Distance metric between two sample distributions (histograms)

Total variation and Hellinger distance are two standard ways to measure this.

Kullback-Leibler divergence is another standard choice, as are general $f$-divergences.
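For concreteness, here is a minimal sketch (mine, not part of the answer; the sample data, number of bins, and smoothing constant are arbitrary illustrative choices) of how these distances can be estimated from two samples after binning on a common grid:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)   # illustrative samples
y = rng.normal(0.5, 1.2, size=2000)

# Bin both samples on a common grid so the histograms are comparable.
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=50)
p = np.histogram(x, bins=bins)[0] / x.size
q = np.histogram(y, bins=bins)[0] / y.size

tv = 0.5 * np.abs(p - q).sum()                          # total variation (some texts drop the 1/2)
hellinger = np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum())
eps = 1e-12                                             # crude smoothing: KL blows up on empty bins
kl = np.sum(p * np.log((p + eps) / (q + eps)))          # KL(p || q)
print(tv, hellinger, kl)
```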

The Earth-Mover's distance (also called the Wasserstein metric) is another option.
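On the real line this one needs no binning at all; a minimal sketch, assuming SciPy is available (`scipy.stats.wasserstein_distance`) and using made-up sample data:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)   # illustrative samples
y = rng.normal(0.5, 1.2, size=2000)

# 1-Wasserstein (Earth-Mover's) distance computed directly from the raw samples.
print(wasserstein_distance(x, y))
```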

Bear in mind that your data gives an empirical CDF, so you can apply any of the standard metrics for probability distributions even though you have a data sample in hand rather than a closed-form distribution.
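As a small illustration (mine, not part of the answer), the empirical CDFs, and the Kolmogorov-Smirnov distance between them, can be computed directly from the samples:

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    s = np.sort(sample)
    return np.searchsorted(s, grid, side="right") / s.size

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)   # illustrative samples
y = rng.normal(0.5, 1.2, size=2000)

grid = np.sort(np.concatenate([x, y]))            # the sup is attained at a data point
ks = np.max(np.abs(ecdf(x, grid) - ecdf(y, grid)))
print(ks)                                         # the two-sample KS statistic
```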

(Wiki has decent entries for all of these, which I may link to later.)


First of all, the answer depends on the nature of your data (e.g., numerical continuous, numerical discrete, nominal, etc.); in each of these cases the empirical measures on the range of your data have to be compared with the corresponding specific methods. I presume that your values are real numbers (since you evoke the KS distance). In this case the most natural topology on the space of measures for your problem is the weak topology, and by no means the norm (total variation) one, since the empirical measures you deal with will typically be mutually singular.

The weak topology can indeed be metrized by the "ad hoc" Lévy-Prokhorov metric; however, the transportation metric (which also metrizes the weak topology) is far more canonical and appropriate in this situation (it was evoked in R Hahn's answer under the names Earth-Mover's distance and Wasserstein metric). For instance, the KS distance between two distinct $\delta$-measures is always 1 and their total variation distance is 2, whereas the transportation distance between them is equal to the distance between the corresponding points, so it correctly reflects their similarity. Another advantage is that, unlike the LP metric, the transportation metric on the line can easily be computed explicitly (by using its dual description in terms of Lipschitz functions); see the sketch below.
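Here is a minimal sketch (mine, with illustrative data) of the explicit computation on the line: the transportation distance between two empirical measures is the area between their CDFs, which for samples of equal size reduces to the mean absolute difference of the sorted samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # illustrative samples of equal size
y = rng.normal(0.5, 1.2, size=1000)

# Order-statistics form (valid for equal sample sizes).
w1_sorted = np.mean(np.abs(np.sort(x) - np.sort(y)))

# Equivalent CDF form: integral of |F - G| over the pooled grid.
grid = np.sort(np.concatenate([x, y]))
F = np.searchsorted(np.sort(x), grid, side="right") / x.size
G = np.searchsorted(np.sort(y), grid, side="right") / y.size
w1_cdf = np.sum(np.abs(F - G)[:-1] * np.diff(grid))

# Two delta measures (one-point samples): the same formula gives just |x - y|.
print(w1_sorted, w1_cdf)              # the two forms agree up to floating point
```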

The metrics used in statistics generally serve a purpose different from yours and do not make much sense in your situation of comparing two mutually singular empirical distributions. Most statistical distances (such as the Kullback-Leibler divergence) are only meaningful for equivalent measures. Of course, you can always discretize your data by using bins; however, you will then lose information by making the data "coarser" (which is fine for typical statistical purposes, but not necessarily for you). I do not see any reason to do that if one can work efficiently with the original data by metrizing the weak topology.
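A tiny sketch (mine) of the mutual-singularity point: two samples of a continuous variable almost surely share no values, so a "raw" KL divergence between the empirical measures is infinite, and only binning makes it finite (and bin-dependent).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)    # same underlying distribution...
y = rng.normal(0.0, 1.0, size=500)    # ...yet the empirical measures are mutually singular

print(np.intersect1d(x, y).size)      # 0: the supports are disjoint, so raw KL is infinite

bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=20)
p = np.histogram(x, bins=bins)[0] / x.size
q = np.histogram(y, bins=bins)[0] / y.size
mask = p > 0
kl = np.inf if np.any(q[mask] == 0) else np.sum(p[mask] * np.log(p[mask] / q[mask]))
print(kl)                             # finite only because of the binning
```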


For particle distributions, you can, e.g., use the Wasserstein distance or OSPA (its counterpart for densities with different numbers of particles); see [1]. These two are based on the linear assignment problem (a one-to-one assignment of the particles, if the numbers are equal) and therefore do not have a continuous gradient. If you don't want a hard assignment of the particles, you can use the LCD distance from [2]. (A sketch of the assignment-based computation is given after the references below.)

[1] J. R. Hoffman and R. P. S. Mahler. Multitarget Miss Distance and its Applications. In Proceedings of the Fifth International Conference on Information Fusion (Fusion 2002), July 2002.

[2] U. D. Hanebeck. Optimal Reduction of Multivariate Dirac Mixture Densities. at – Automatisierungstechnik, 63(4), 2015.
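A minimal sketch (mine, not taken from [1] or [2]) of the assignment-based computation for two particle sets of equal size; the full OSPA metric additionally applies a cutoff and a cardinality penalty for unequal particle counts (see the references above for the exact definitions).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
particles_a = rng.normal(0.0, 1.0, size=(100, 2))   # illustrative 2-D particle sets
particles_b = rng.normal(0.3, 1.0, size=(100, 2))   # (equal cardinality assumed here)

cost = cdist(particles_a, particles_b)              # pairwise Euclidean distances
rows, cols = linear_sum_assignment(cost)            # optimal one-to-one assignment
print(cost[rows, cols].mean())                      # average distance under the matching
```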