Number of occurrence of pair of value in dataframe

For performance implications of the below solutions, see Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series. They are presented below with best performance first.

`GroupBy.size`

You can create a series of counts with (Name, Surname) tuple indices using GroupBy.size:

res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)

By sorting these values, we can easily extract the most common:

most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist()  # handles duplicate top counts

`value_counts`

Another way is to construct a series of tuples, then apply pd.Series.value_counts:

res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

The result will be a series of counts indexed by Name-Surname combinations, sorted from most common to least.

name, surname = res.index[0]  # return most common
most_common_dups = res[res == res.max()].index.tolist()

`collections.Counter`

If you wish to create a dictionary of (name, surname): counts entries, you can do so via collections.Counter:

from collections import Counter

zipper = zip(df.Name, df.Surname)
c = Counter(zipper)

Counter has useful methods such as most_common, which you can use to extract your result.

Seems like a good use case for the performant Counter:

from collections import Counter
popular_names = Counter(zip(df.Name, df.Surname)).most_common(10)

Number of occurrence of pair of value in dataframe

`GroupBy.size`

`value_counts`

`collections.Counter`

Tags:

Python

Pandas

Series

Related

Recent Posts