Elegant way to match a string to a random color matplotlib

Choose a color map, such as viridis:

cmap = plt.get_cmap('viridis')

The colormap, cmap, is a function which can take an array of values from 0 to 1 and map them to RGBA colors. np.linspace(0, 1, len(names)) produces an array of equally spaced numbers from 0 to 1 of length len(names). Thus,

colors = cmap(np.linspace(0, 1, len(names)))

selects equally-spaced colors from the viridis color map.

Note that this does not use the value of the string; it only uses the ordinal position of the string in the list to select a color. Note also that these are not random colors. This is just an easy way to generate unique colors from an arbitrary list of strings.


So:

import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('viridis')
names = ["bob", "joe", "andrew", "pete"]
colors = cmap(np.linspace(0, 1, len(names)))
print(colors)
# [[ 0.267004  0.004874  0.329415  1.      ]
#  [ 0.190631  0.407061  0.556089  1.      ]
#  [ 0.20803   0.718701  0.472873  1.      ]
#  [ 0.993248  0.906157  0.143936  1.      ]]

x = np.linspace(0, np.pi*2, 100)
for i, (name, color) in enumerate(zip(names, colors), 1):
    plt.plot(x, np.sin(x)/i, label=name, c=color)
plt.legend()
plt.show()



The problem with

clr = {names[i]: colors[i] for i in range(len(names))}
ax.scatter(x, y, z, c=clr)

is that the c parameter of ax.scatter expects a sequence of RGB(A) values of the same length as x or a single color. clr is a dict, not a sequence. So if colors is the same length as x then you could use

ax.scatter(x, y, z, c=colors)
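If each point carries a label rather than its own color, the dict can still be useful: look up one color per point and pass the resulting sequence to c. The following is a minimal sketch under that assumption; the labels, x and y values are made up for illustration, and it uses a 2D scatter to stay short.

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "andrew", "pete"]
cmap = plt.get_cmap("viridis")
colors = cmap(np.linspace(0, 1, len(names)))
clr = dict(zip(names, colors))  # name -> RGBA row

# Hypothetical data: one label per point
labels = ["bob", "pete", "joe", "bob", "andrew"]
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# One RGBA value per point, in the same order as x and y
point_colors = np.array([clr[label] for label in labels])

fig, ax = plt.subplots()
ax.scatter(x, y, c=point_colors)
# plt.show()  # uncomment to display the figure
```

Points that share a label get the same color, and the sequence passed to c has the same length as x.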

I use the hash function to get numbers between 0 and 1. This works even when you don't know all the labels in advance:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
labels = ["a", "a", "b", "b", "a"]
y = [1, 2, 3, 4, 5]

# Map each label to a number in [0, 1) via its hash
colors = [float(hash(s) % 256) / 256 for s in labels]

plt.scatter(x, y, c=colors, cmap="jet")
plt.show()
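One caveat: in Python 3 the built-in hash of a string is randomized per process unless PYTHONHASHSEED is set, so the colors above can change from run to run. A possible variation, not part of the original answer, is to derive the number from a hashlib digest instead. The helper name stable_unit_value is hypothetical.

```python
import hashlib
import matplotlib.pyplot as plt

def stable_unit_value(label, buckets=256):
    """Hypothetical helper: map a string to a float in [0, 1), identical on every run."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return (int.from_bytes(digest[:4], "big") % buckets) / buckets

x = [1, 2, 3, 4, 5]
labels = ["a", "a", "b", "b", "a"]
y = [1, 2, 3, 4, 5]

colors = [stable_unit_value(s) for s in labels]

# vmin/vmax pin the color scale to [0, 1]; otherwise scatter rescales it to the data range
fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, cmap="jet", vmin=0, vmax=1)
# plt.show()  # uncomment to display the figure
```

Two different labels can still land in the same bucket, so this only makes the colors reproducible, not unique.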

This has upset me so much that I wrote get_cmap_string, which returns a function that works exactly like cmap but also accepts strings.

data = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_string(palette='viridis', domain=data)
cmap("joe")
# (0.20803, 0.718701, 0.472873, 1.0)
cmap("joe", alpha=0.5)
# (0.20803, 0.718701, 0.472873, 0.5)

1. Implementation

The basic idea, as mentioned in the other answers, is that we need a hash table: a unique mapping from our strings to integers. In Python this is just a dictionary.

The reason hash(str) won't work is that two different strings can end up with the same color. matplotlib's cmap accepts any integer, but a colormap created with get_cmap(name=palette, lut=10) has only 10 entries, and integers outside 0..9 are clipped to the first or last color. For example, if hash(str1) is 12 and hash(str2) is 18, then cmap(hash(str1)) will be the same as cmap(hash(str2)). Reducing the hash modulo 10 does not help either: 8 and 18 would then collide.
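The collision is easy to check directly. The sketch below assumes a viridis colormap resampled to 10 entries and uses made-up integers in place of real hash values.

```python
import matplotlib.pyplot as plt

cmap = plt.get_cmap("viridis", 10)  # lookup table with 10 entries (indices 0..9)

# Integers past the end of the table are clipped to the last entry
same_when_clipped = cmap(12) == cmap(18) == cmap(9)

# Reducing the values modulo the table size makes 8 and 18 collide as well
same_when_wrapped = cmap(8 % 10) == cmap(18 % 10)

print(same_when_clipped, same_when_wrapped)
```

Either way, distinct integers end up on the same color, which is why an explicit string-to-index dictionary is used instead.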

Code

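import numpy as np
import matplotlib.pyplot as plt

def get_cmap_string(palette, domain):
    # Sorted unique labels; each one gets its own integer index
    domain_unique = np.unique(domain)
    hash_table = {key: i_str for i_str, key in enumerate(domain_unique)}
    # One colormap entry per unique label.
    # (matplotlib.cm.get_cmap was removed in matplotlib 3.9; plt.get_cmap still accepts lut.)
    mpl_cmap = plt.get_cmap(palette, lut=len(domain_unique))

    def cmap_out(X, **kwargs):
        # Look up the label's index, then ask the colormap for that entry
        return mpl_cmap(hash_table[X], **kwargs)

    return cmap_out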

2. Usage

Example as in other answers, but now note that the name pete appears twice.

import matplotlib.pyplot as plt

# data
names = ["bob", "joe", "pete", "andrew", "pete"]

# color map for the data
cmap = get_cmap_string(palette='viridis', domain=names)

# example usage
x = np.linspace(0, np.pi*2, 100)
for i_name, name in enumerate(names, 1):  # start at 1 to avoid dividing by zero
    plt.plot(x, np.sin(x)/i_name, label=name, c=cmap(name))
plt.legend()
plt.show()


You can see that the entries in the legend are duplicated. Solving this is another challenge; see here. Alternatively, use a custom legend as explained here.
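One common way to drop the duplicate legend entries, sketched here as an assumption rather than as the approach from the linked answers, is to collect the handles and labels from the axes and keep one handle per label. The palette dict below is a stand-in for the string colormap so that the block runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "pete", "andrew", "pete"]
unique_names = sorted(set(names))
# Stand-in palette: one viridis color per unique name
palette = dict(zip(unique_names,
                   plt.get_cmap("viridis")(np.linspace(0, 1, len(unique_names)))))

x = np.linspace(0, np.pi * 2, 100)
fig, ax = plt.subplots()
for i, name in enumerate(names, 1):
    ax.plot(x, np.sin(x) / i, label=name, c=palette[name])

# A dict keeps a single handle per label, so repeated names collapse to one entry
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
legend = ax.legend(list(by_label.values()), list(by_label.keys()))
# plt.show()  # uncomment to display the figure
```

Because lines with the same name share a color, it does not matter which of the duplicate handles is kept.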

3. Alternatives

As far as the discussion among the matplotlib devs goes, they recommend using Seaborn. See the discussion here and an example usage here.