Can pandas groupby aggregate into a list, rather than sum, mean, etc?

I used the following

grouped = df.groupby('A')

df = grouped.aggregate(lambda x: tuple(x))

df['grouped'] = df['B'] + df['C']

I am answering the question as stated in its title and first sentence: the following aggregates values to lists:

df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist())

Below this is demonstrated in a simple example:

import pandas as pd

df = pd.DataFrame( {'A' : [1, 1, 1, 1, 2, 2, 3], 'B' : [10, 12, 11, 10, 11, 12, 14], 'C' : [22, 20,     8, 10, 13, 10, 0]})
print df

df2 = df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist())
print df2
# Old version:
# df2=df.groupby(['A']).apply(lambda tdf: pd.Series(  dict([[vv,tdf[vv].unique().tolist()] for vv in tdf if vv not in ['A']])  )) 

The output is as follows:

In [3]: run tmp
   A   B   C
0  1  10  22
1  1  12  20
2  1  11   8
3  1  10  10
4  2  11  13
5  2  12  10
6  3  14   0

[7 rows x 3 columns]
              B                C
A                               
1  [10, 12, 11]  [22, 20, 8, 10]
2      [11, 12]         [13, 10]
3          [14]              [0]

[3 rows x 2 columns]

Similar solution, but fairly transparent (I think). you can get full list or unique lists.

df = pd.DataFrame({'A':[1,1,2,2,2,3,3,3,4,5], 
                   'B':[6,7, 8,8,9, 9,9,10,11,12], 
                   'C':['foo']*10})

df
Out[24]: 
   A   B    C
0  1   6  foo
1  1   7  foo
2  2   8  foo
3  2   8  foo
4  2   9  foo
5  3   9  foo
6  3   9  foo
7  3  10  foo
8  4  11  foo
9  5  12  foo

list_agg = df.groupby(by='A').agg({'B':lambda x: list(x), 
                                   'C':lambda x: tuple(x)})

list_agg
Out[26]: 
                 C           B
A                             
1       (foo, foo)      [6, 7]
2  (foo, foo, foo)   [8, 8, 9]
3  (foo, foo, foo)  [9, 9, 10]
4           (foo,)        [11]
5           (foo,)        [12]

unique_list_agg = df.groupby(by='A').agg({'B':lambda x: list(pd.unique(x)), 
                                          'C':lambda x: tuple(pd.unique(x))})

unique_list_agg
Out[28]: 
        C        B
A                 
1  (foo,)   [6, 7]
2  (foo,)   [8, 9]
3  (foo,)  [9, 10]
4  (foo,)     [11]
5  (foo,)     [12]

Here is a one liner

# if list of unique items is desired, use set
df.groupby('A',as_index=False)['B'].aggregate(lambda x: set(x))

# if duplicate items are okay, use list
df.groupby('A',as_index=False)['B'].aggregate(lambda x: list(x))

Tags:

Python

Pandas