How to make Matplotlib scatterplots transparent as a group?

Interesting question, I think any use of transparency will result in the stacking effect you want to avoid. You could manually set a transparency type colour to get closer to the results you want,

import matplotlib.pyplot as plt
import numpy as np

def points(n=100):
    x = np.random.uniform(size=n)
    y = np.random.uniform(size=n)
    return x, y
x1, y1 = points()
x2, y2 = points()
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, title="Test scatter")
alpha = 0.5
ax.scatter(x1, y1, s=100, lw = 0, color=[1., alpha, alpha])
ax.scatter(x2, y2, s=100, lw = 0, color=[alpha, alpha, 1.])
plt.show()

The overlap between the different colours are not included in this way but you get,

enter image description here


This is a terrible, terrible hack, but it works.

You see while Matplotlib plots data points as separate objects that can overlap, it plots the line between them as a single object - even if that line is broken into several pieces by NaNs in the data.

With that in mind, you can do this:

import numpy as np
from matplotlib import pyplot as plt

plt.rcParams['lines.solid_capstyle'] = 'round'

def expand(x, y, gap=1e-4):
    add = np.tile([0, gap, np.nan], len(x))
    x1 = np.repeat(x, 3) + add
    y1 = np.repeat(y, 3) + add
    return x1, y1

x1, y1 = points()
x2, y2 = points()
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, title="Test scatter")
ax.plot(*expand(x1, y1), lw=20, color="blue", alpha=0.5)
ax.plot(*expand(x2, y2), lw=20, color="red", alpha=0.5)

fig.savefig("test_scatter.png")
plt.show()

And each color will overlap with the other color but not with itself.

enter image description here

One caveat is that you have to be careful with the spacing between the two points you use to make each circle. If they're two far apart then the separation will be visible on your plot, but if they're too close together, matplotlib doesn't plot the line at all. That means that the separation needs to be chosen based on the range of your data, and if you plan to make an interactive plot then there's a risk of all the data points suddenly vanishing if you zoom out too much, and stretching if you zoom in too much.

As you can see, I found 1e-5 to be a good separation for data with a range of [0,1].


Yes, interesting question. You can get this scatterplot with Shapely. Here is the code :

import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import numpy as np
from shapely.geometry import Point
from shapely.ops import cascaded_union

n = 100
size = 0.02
alpha = 0.5

def points():
    x = np.random.uniform(size=n)
    y = np.random.uniform(size=n)
    return x, y

x1, y1 = points()
x2, y2 = points()
polygons1 = [Point(x1[i], y1[i]).buffer(size) for i in range(n)]
polygons2 = [Point(x2[i], y2[i]).buffer(size) for i in range(n)]
polygons1 = cascaded_union(polygons1)
polygons2 = cascaded_union(polygons2)

fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, title="Test scatter")
for polygon1 in polygons1:
    polygon1 = ptc.Polygon(np.array(polygon1.exterior), facecolor="red", lw=0, alpha=alpha)
    ax.add_patch(polygon1)
for polygon2 in polygons2:
    polygon2 = ptc.Polygon(np.array(polygon2.exterior), facecolor="blue", lw=0, alpha=alpha)
    ax.add_patch(polygon2)
ax.axis([-0.2, 1.2, -0.2, 1.2])

fig.savefig("test_scatter.png")

and the result is :

Test scatter