itertools.groupby() not grouping correctly

itertools.groupby collects together contiguous items with the same key. If you want all items with the same key, you have to sort self.data first.

for mid, group in itertools.groupby(
    sorted(self.data,key=operator.itemgetter(1)), key=operator.itemgetter(1)):

Below "fixes" several annoyances with Python's itertools.groupby.

def groupby2(l, key=lambda x:x, val=lambda x:x, agg=lambda x:x, sort=True):
    if sort:
        l = sorted(l, key=key)
    return ((k, agg((val(x) for x in v))) \
        for k,v in itertools.groupby(l, key=key))

Specifically,

  1. It doesn't require that you sort your data.
  2. It doesn't require that you must use key as named parameter only.
  3. The output is clean generator of tuple(key, grouped_values) where values are specified by 3rd parameter.
  4. Ability to apply aggregation functions like sum or avg easily.

Example Usage

import itertools
from operator import itemgetter
from statistics import *

t = [('a',1), ('b',2), ('a',3)]
for k,v in groupby2(t, itemgetter(0), itemgetter(1), sum):
  print(k, v)

This prints,

a 4
b 2

Play with this code


Variant without sorting (via dictionary). Should be better performance-wise.

def full_group_by(l, key=lambda x: x):
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()