Combine column from multiple rows into single row

This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:

DECLARE @x TABLE 
(
  id INT, 
  row_num INT, 
  customer_code VARCHAR(32), 
  comments VARCHAR(32)
);

INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';

SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments 
    FROM @x AS x2 WHERE id = x.id
     ORDER BY row_num
     FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;

If you have a case where the data in comments could contain unsafe-for-XML characters (>, <, &), you should change this:

     FOR XML PATH('')), 1, 1, '')

To this more elaborate approach:

     FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')

(Be sure to use the right destination data type, varchar or nvarchar, and the right length, and prefix all string literals with N if using nvarchar.)


If you're allowed to use CLR in your environment, this is a tailor-made case for a user-defined aggregate.

In particular, this is probably the way to go if the source data is non-trivially large and/or you need to do this type of thing a lot in your application. I strongly suspect the query plan for Aaron's solution will not scale well as the input size grows. (I tried adding an index to the temp table, but that didn't help.)

This solution, like many other things, is a tradeoff:

  • Politics/policy for even using CLR Integration in your, or your client's, environment.
  • CLR function is likely faster, and will scale better given a real set of data.
  • CLR function will be reusable in other queries, and you won't have to duplicate (and debug) a complex subquery every time you need to do this type of thing.
  • Straight T-SQL is simpler than writing and managing a piece of external code.
  • Perhaps you don't know how to program in C# or VB.
  • etc.

EDIT: Well, I went to try to see if this actually was better, and it turns out the requirement that the comments be in a specific order is currently not possible to satisfy using an aggregate function. :(

See SqlUserDefinedAggregateAttribute.IsInvariantToOrder. Basically, what you need to do is OVER(PARTITION BY customer_code ORDER BY row_num) but ORDER BY is not supported in the OVER clause when aggregating. I'm assuming adding this functionality to SQL Server opens a can of worms, because what would need to be changed in the execution plan is trivial. The aforementioned link says this is reserved for future use, so this could be implemented in the future (on 2005 you're probably out of luck, though).

This could still be accomplished by packing and parsing the row_num value into the aggregated string, and then doing the sort within the CLR object... which seems pretty hackish.

In any event, below is the code I used in case anyone else finds this useful even with the limitation. I'll leave the hacking part as an exercise for the reader. Note that I used AdventureWorks (2005) for test data.

Aggregate assembly:

using System;
using System.IO;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

namespace MyCompany.SqlServer
{
    [Serializable]
    [SqlUserDefinedAggregate
    (
        Format.UserDefined,
        IsNullIfEmpty = false,
        IsInvariantToDuplicates = false,
        IsInvariantToNulls = true,
        IsInvariantToOrder = false,
        MaxByteSize = -1
    )]
    public class StringConcatAggregate : IBinarySerialize
    {
        private string _accum;
        private bool _isEmpty;

        public void Init()
        {
            _accum = string.Empty;
            _isEmpty = true;
        }

        public void Accumulate(SqlString value)
        {
            if (!value.IsNull)
            {
                if (!_isEmpty)
                    _accum += ' ';
                else
                    _isEmpty = false;

                _accum += value.Value;
            }
        }

        public void Merge(StringConcatAggregate value)
        {
            Accumulate(value.Terminate());
        }

        public SqlString Terminate()
        {
            return new SqlString(_accum);
        }

        public void Read(BinaryReader r)
        {
            this.Init();

            _accum = r.ReadString();
            _isEmpty = _accum.Length == 0;
        }

        public void Write(BinaryWriter w)
        {
            w.Write(_accum);
        }
    }
}

T-SQL for testing (CREATE ASSEMBLY, and sp_configure to enable CLR omitted):

CREATE TABLE [dbo].[Comments]
(
    CustomerCode int NOT NULL,
    RowNum int NOT NULL,
    Comments nvarchar(25) NOT NULL
)

INSERT INTO [dbo].[Comments](CustomerCode, RowNum, Comments)
    SELECT
        DENSE_RANK() OVER(ORDER BY FirstName),
        ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY ContactID),
        Phone
        FROM [AdventureWorks].[Person].[Contact]
GO

CREATE AGGREGATE [dbo].[StringConcatAggregate]
(
    @input nvarchar(MAX)
)
RETURNS nvarchar(MAX)
EXTERNAL NAME StringConcatAggregate.[MyCompany.SqlServer.StringConcatAggregate]
GO


SELECT
    CustomerCode,
    [dbo].[StringConcatAggregate](Comments) AS AllComments
    FROM [dbo].[Comments]
    GROUP BY CustomerCode