How to explain the concentration-of-measure phenomenon intuitively?

As I see it, the key intuition is passing from the equator orthogonal to a single vector to looking at a whole orthonormal basis.

Suppose we pick a random unit vector $(x_1,\dots,x_n)$. What we want to know is why $x_1$ is probably near zero, since that is equivalent to the vector lying near the equator orthogonal to the first basis vector. But this feels intuitively obvious to me: all the coordinates have the same distribution, and they surely can't all be large, so they had better all be small.

To be a little more precise, we have $x_1^2+\dots+x_n^2=1$, and each coordinate has the same distribution, so taking expectations gives $n\,\mathbb{E}[x_1^2]=1$, i.e. the expected value of $x_1^2$ is $1/n$. Now we can just apply Markov's inequality to $x_1^2$. For example, the probability that $|x_1|$ is at least $1/n^{1/4}$ (equivalently, $x_1^2\ge 1/n^{1/2}$) must be at most $1/n^{1/2}$, since otherwise the expected value of $x_1^2$ would exceed $(1/n^{1/2})\cdot(1/n^{1/2})=1/n$.
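A quick Monte Carlo check of this Markov bound, as a minimal sketch in Python. It relies on the standard fact that normalizing a vector of iid standard Gaussians gives a uniformly random point on the unit sphere; the helper names `random_unit_vector` and `markov_check` are hypothetical, chosen only for illustration.

```python
import math
import random


def random_unit_vector(n, rng):
    # Normalizing iid standard Gaussians gives a uniform point on the unit sphere in R^n.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]


def markov_check(n, trials=20000, seed=0):
    """Estimate E[x_1^2] and P(|x_1| >= n**-0.25) for a random unit vector in R^n."""
    rng = random.Random(seed)
    threshold = n ** -0.25
    sum_sq = 0.0
    hits = 0
    for _ in range(trials):
        x1 = random_unit_vector(n, rng)[0]
        sum_sq += x1 * x1
        if abs(x1) >= threshold:
            hits += 1
    # Returns (empirical mean of x_1^2, empirical tail probability, Markov bound n^{-1/2}).
    return sum_sq / trials, hits / trials, n ** -0.5


if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean_sq, tail, bound = markov_check(n, trials=2000)
        print(f"n={n:5d}  E[x1^2]~{mean_sq:.5f} (1/n={1 / n:.5f})  "
              f"P(|x1|>=n^-1/4)~{tail:.4f} <= {bound:.4f}")
```

For each $n$ the empirical tail should fall well below the Markov bound $n^{-1/2}$; Markov's inequality is crude here, which only strengthens the point.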

(This is not so different from Bjørn and Dustin's answers, but expressed in a less sophisticated way.)


Visualize first, for comparison, the 2-dimensional unit sphere in 3-dimensional Euclidean space (something that I can visualize!), and imagine it cut, by circles of latitude (perpendicular to the $z$-axis), into narrow zones. Of course, the zones closer to the poles have smaller radii, and therefore smaller circumferences, than the zones near the equator. It is well known (this is Archimedes' hat-box theorem) that this decrease in circumference as you approach the poles exactly compensates for the increasing "tilt" of the zones, so that a zone's area is proportional to its height measured in the $z$-direction.
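To see the 2-dimensional fact numerically, the following sketch samples uniform points on the unit sphere in $\mathbb{R}^3$ (again by normalizing Gaussian vectors) and counts how many land in each of several zones of equal height. If zone area is proportional to height, the fractions should come out roughly equal. The function name `zone_fractions` and its default parameters are illustrative assumptions.

```python
import math
import random


def zone_fractions(num_zones=10, samples=200_000, seed=1):
    """Fraction of uniform random points on S^2 in each of num_zones equal-height zones."""
    rng = random.Random(seed)
    counts = [0] * num_zones
    for _ in range(samples):
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        zn = z / r  # height of the normalized point, in [-1, 1]
        # Map [-1, 1] onto zone indices 0..num_zones-1; min() handles zn == 1.
        k = min(int((zn + 1.0) / 2.0 * num_zones), num_zones - 1)
        counts[k] += 1
    return [c / samples for c in counts]


if __name__ == "__main__":
    # Each entry should be close to 1/num_zones = 0.1.
    print([round(f, 3) for f in zone_fractions()])
```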

Now let's "look" at the $d$-dimensional unit sphere in $(d+1)$-dimensional Euclidean space. Cut it into zones similarly, at the same $z$-coordinates as before. What has changed? The radii of the zones and their tilt are the same as in the 2-dimensional case, but the "circumferences" have become $(d-1)$-dimensional volumes. The $(d-1)$-dimensional volume of a sphere of radius $r$ scales as $r^{d-1}$, so it depends on $r$ far more violently than a 1-dimensional circumference does. For large $d$, $r^{d-1}$ is almost exactly zero while $r$ is substantially less than $1$; it becomes respectable only when $r$ is very close to $1$, that is, near the equator. So the zones near the poles are far smaller, relative to the equatorial zones, in the high-dimensional case than in the 2-dimensional case.
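The comparison can be made quantitative. Writing $\sigma_{d-1}$ (notation introduced here) for the $(d-1)$-dimensional volume of the unit $(d-1)$-sphere, the zone at height $z$ has radius $r=\sqrt{1-z^2}$, and the tilt stretches a height increment $dz$ into a width $dz/r$ along the sphere. A sketch of the resulting area element:

$$
dA \;=\; \underbrace{\sigma_{d-1}\, r^{d-1}}_{\text{"circumference"}} \cdot \underbrace{\frac{dz}{r}}_{\text{tilt}} \;=\; \sigma_{d-1}\,(1-z^2)^{(d-2)/2}\,dz .
$$

For $d=2$ the exponent is $0$ and $dA=2\pi\,dz$, which is the 2-dimensional statement above. For large $d$, using $1-u\le e^{-u}$, the density is at most $e^{-(d-2)z^2/2}$, so almost all of the area lies in a band of height of order $1/\sqrt{d}$ around the equator.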


Take iid Gaussian random variables $X_1,\ldots,X_d$ with mean $0$ and variance $1/d$. Normalizing the vector $X=(X_1,\ldots,X_d)$ will produce a uniformly random point on the unit sphere, but since $\mathbb{E}\|X\|^2=1$ it is already close to having unit norm when $d$ is large, so for the sake of intuition we will skip the normalization. For each unit vector $v$, there is an equator given by

$$\{x\in S^{d-1}:\langle x,v\rangle=0\}$$

Observe that $\langle X,v\rangle$ is Gaussian with mean $0$ and variance $1/d$. Thus, $X$ typically lies within a distance of order $1/\sqrt{d}$ of $v$'s equator, which is close when $d$ is large. This is because $X$ has only about one unit of energy to spread across $d$ dimensions, so the amount of energy in each dimension must vanish.
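A minimal simulation of this Gaussian picture, in Python. The fixed unit vector $v$ (all coordinates equal to $1/\sqrt{d}$) is an arbitrary illustrative choice; by rotation invariance any unit vector gives the same distribution for $\langle X,v\rangle$. The function name `equator_stats` is hypothetical.

```python
import math
import random


def equator_stats(d, trials=2000, seed=2):
    """Sample X with iid N(0, 1/d) coordinates; return mean ||X||^2 and empirical Var<X, v>."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(d)
    # Arbitrary fixed unit vector v for illustration: all coordinates equal to 1/sqrt(d).
    v = [1.0 / math.sqrt(d)] * d
    norm_sq_total = 0.0
    proj_sq_total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sd) for _ in range(d)]
        norm_sq_total += sum(t * t for t in x)
        proj = sum(a * b for a, b in zip(x, v))
        proj_sq_total += proj * proj
    # <X, v> has mean 0, so the mean of its square estimates its variance.
    return norm_sq_total / trials, proj_sq_total / trials


if __name__ == "__main__":
    for d in (10, 100, 1000):
        mean_norm_sq, var_proj = equator_stats(d)
        print(f"d={d:5d}  E||X||^2~{mean_norm_sq:.3f}  "
              f"Var<X,v>~{var_proj:.5f}  (1/d={1 / d:.5f})")
```

The printed $\mathbb{E}\|X\|^2$ should stay near $1$ while the variance of $\langle X,v\rangle$ shrinks like $1/d$.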