pandas groupby and join lists

df.groupby('column_a').agg(sum)

This works because of operator overloading: sum concatenates the lists together. The index of the resulting df will be the values from column_a.


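The object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta, so each cell in the column is stored as a Python list. convert_objects tries to convert a column to one of those dtypes (it has since been deprecated and removed from pandas; infer_objects is the closest current method).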
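
To see the catch-all dtype in practice, here is a minimal sketch. The frame is a made-up example rather than data from the question, and the variable names b_dtype and first_type are only for illustration.

```python
import pandas as pd

# Made-up frame for illustration: column b holds Python lists.
df = pd.DataFrame({"a": [1, 1, 2], "b": [[1, 2, 3], [2, 5], [5, 6]]})

b_dtype = df["b"].dtype             # pandas reports the catch-all object dtype
first_type = type(df["b"].iloc[0])  # each cell is still an ordinary Python list

print(b_dtype)
print(first_type)
```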

You want:

In [63]: df
Out[63]: 
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]: 
         c                b
a                          
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]

This groups the DataFrame by the values in column a. See the pandas groupby documentation for more.
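
The session above can also be run as a standalone script. This is a sketch that rebuilds the same small frame by hand; the variable name result is an assumption added for illustration, and the column order of the printed output may differ from the transcript depending on the Python and pandas versions.

```python
import pandas as pd

# Same toy frame as in the session above.
df = pd.DataFrame(
    {
        "a": [1, 1, 2],
        "b": [[1, 2, 3], [2, 5], [5, 6]],
        "c": ["foo", "bar", "baz"],
    }
)

# One aggregation per column: concatenate the lists in b,
# join the strings in c with a space.
result = df.groupby("a").agg({"b": "sum", "c": lambda x: " ".join(x)})

print(result)
```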

This is doing a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], which gives [1, 2, 3, 2, 5].
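
The same concatenation can be checked in plain Python, without pandas. This is only a sketch of the underlying list behaviour; the names lists, plus, via_sum and via_reduce are made up for the example. Note that the built-in sum needs an empty list as its start value, because its default start of 0 cannot be added to a list.

```python
from functools import reduce
import operator

lists = [[1, 2, 3], [2, 5]]

plus = [1, 2, 3] + [2, 5]                  # + on lists concatenates
via_sum = sum(lists, [])                   # built-in sum with [] as the start value
via_reduce = reduce(operator.add, lists)   # repeated + across all the lists

print(plus)
print(via_sum)
print(via_reduce)
```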


The approach proposed above, df.groupby('column_a').agg(sum), definitely works. However, you have to make sure that your lists contain only integers; otherwise the output will not be the same.

If you want to convert all of the list items into integers, you can use:

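# column_b is a placeholder for the column that holds the lists (column_a is the grouping key above)
df['column_b'] = df['column_b'].apply(lambda x: list(map(int, x)))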
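
A small sketch of that conversion on a hypothetical frame where the list items arrived as strings. The frame, its column names and the variable converted are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame: the list items are strings, e.g. after parsing text input.
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "values": [["1", "2"], ["3"], ["4", "5"]],
    }
)

# Convert every item of every list to int, one row at a time.
df["values"] = df["values"].apply(lambda x: list(map(int, x)))

converted = df["values"].tolist()
print(converted)
```

After this step each cell is still a list, so the groupby aggregation shown earlier applies unchanged.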

Tags:

Python

Pandas