How to make a histogram from a list of strings in Python?

As @notconfusing pointed above this can be solved with Pandas and Counter. If for any reason you need to not use Pandas you can get by with only matplotlib using the function in the following code:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)

def plot_bar_from_counter(counter, ax=None):
    """"
    This function creates a bar plot from a counter.

    :param counter: This is a counter object, a dictionary with the item as the key
     and the frequency as the value
    :param ax: an axis of matplotlib
    :return: the axis wit the object in it
    """

    if ax is None:
        fig = plt.figure()
        ax = fig.add_subplot(111)

    frequencies = counter.values()
    names = counter.keys()

    x_coordinates = np.arange(len(counter))
    ax.bar(x_coordinates, frequencies, align='center')

    ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
    ax.xaxis.set_major_formatter(plt.FixedFormatter(names))

    return ax

plot_bar_from_counter(letter_counts)
plt.show()

Which will produce enter image description here


Rather than use groupby() (which requires your input to be sorted), use collections.Counter(); this doesn't have to create intermediary lists just to count inputs:

from collections import Counter

counts = Counter(a)

You haven't really specified what you consider to be a 'histogram'. Lets assume you wanted to do this on the terminal:

width = 120  # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)

for key, size in sorted(counts.items()):
    print('{}: {}'.format(key, int(size * scale) * '*'))

Demo:

>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120  # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
...     print('{}: {}'.format(key, int(size * scale) * '*'))
... 
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************

More sophisticated tools are found in the numpy.histogram() and matplotlib.pyplot.hist() functions. These do the tallying for you, with matplotlib.pyplot.hist() also providing you with graph output.


here's a concise all-pandas approach:

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
pd.Series(a).value_counts(sort=False).plot(kind='bar')

barplot of counts


Very easy with Pandas.

import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')

Notice that Counter is making a frequency count, so our plot type is 'bar' not 'hist'.

histogram of letter counts