Way to aggregate percentiles

There is no math for meaningfully aggregating percentiles. Once you've summarized things as percentiles (and discarded the raw data or the histogram distribution behind them), there is no way to aggregate the summarized percentiles into anything useful for the same percentile levels. And yes, this means that those "average percentile" legend numbers that show up in various percentile monitoring charts are completely bogus.

A simple way to demonstrate why any attempt at aggregating percentiles by averaging them (weighted or not) is useless is to try it with a percentile that's easy to reason about: the 100%'ile (the max).

E.g., suppose I had the following 100%'iles reported for each one-minute interval, each with the same overall event count: [1, 0, 3, 1, 601, 4, 2, 8, 0, 3, 3, 1, 1, 0, 2]

The (weighted or not) average of this sequence is 42, and it has as much relation to the overall 100%'ile as the phase of the moon does. No amount of fancy averaging (weighted or not) will produce a correct answer to "what is the 100%'ile of the overall 15 minute period?". There is only one correct answer: 601 was the 100%'ile seen during the 15 minute period.
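To make the arithmetic concrete, here is the same example in a couple of lines of Python (just an illustration of the numbers above):

    maxes = [1, 0, 3, 1, 601, 4, 2, 8, 0, 3, 3, 1, 1, 0, 2]  # per-minute 100%'iles

    print(sum(maxes) / len(maxes))  # 42.0 -- the "averaged percentile", meaningless
    print(max(maxes))               # 601  -- the actual 100%'ile for the 15 minutes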

There are only two percentiles for which you can actually find math that works for accurate aggregation across intervals:

- the 100%'ile (for which the answer is "the max is the max of the maxes")
- the 0%'ile (for which the answer is "the min is the min of the mins")

For all other percentiles, the only correct answer is "the aggregate N%'ile is somewhere between the lowest and the highest N%'ile seen in any interval in the aggregate time period". That's not a very useful answer, especially since the range can cover the entire spectrum; in many real-world data sets it often amounts to something close to "it's somewhere between the overall min and the overall max".

For more ranting on this subject:

http://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-q-whats-wrong-with_21.html
http://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-you-cant-average.html


It would be very nice to have data for more percentiles. If you had the $90, 91, 92, \dots, 99$ percentiles for each block of time you could do very well. Say you had this data, along with the number of requests, for each of the $168$ hours of a week, and you want to report the aggregate $95$th percentile for the week. Average all the $95$th percentiles as a starting point. Say that time turns out to be the $93$rd percentile for the first hour; then $7\%$ of the requests that hour took longer. Total up the number of requests that took longer across all hours and compare it to the total number of requests in the week. Say you find this time is really the $96$th percentile overall: shorten it a bit and try again. You can use bisection to guide your search, as sketched below.
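A rough Python sketch of that bisection, assuming linear interpolation between the stored per-hour percentiles (the function names and data layout here are illustrative, not from any particular library):

    from bisect import bisect_right

    PCTS = list(range(90, 100))  # the stored percentiles: 90, 91, ..., 99

    def percentile_of(t, pct_values):
        """Estimate which percentile latency t falls at within one hour,
        given that hour's stored [p90, p91, ..., p99] values (non-decreasing)."""
        if t <= pct_values[0]:
            return 90.0            # can't see below the lowest stored percentile
        if t >= pct_values[-1]:
            return 99.0            # can't see above the highest stored percentile
        i = bisect_right(pct_values, t) - 1
        lo, hi = pct_values[i], pct_values[i + 1]
        frac = (t - lo) / (hi - lo) if hi > lo else 0.0
        return PCTS[i] + frac      # linear interpolation between stored percentiles

    def aggregate_percentile(hours, target=95.0, iters=50):
        """hours: list of (request_count, [p90, ..., p99]) tuples, one per hour.
        Bisect on latency until the share of slower requests matches the target."""
        total = sum(n for n, _ in hours)
        lo = min(vals[0] for _, vals in hours)
        hi = max(vals[-1] for _, vals in hours)
        for _ in range(iters):
            mid = (lo + hi) / 2
            # How many requests in the whole week were slower than `mid`?
            slower = sum(n * (100.0 - percentile_of(mid, vals)) / 100.0
                         for n, vals in hours)
            overall = 100.0 * (1.0 - slower / total)  # percentile `mid` lands at
            if overall < target:
                lo = mid           # too many slower requests: raise the candidate
            else:
                hi = mid           # at or above the target: shorten it
        return (lo + hi) / 2

The interpolation is the weak point: below the $90$th or above the $99$th percentile of an hour the estimate is clamped, so the answer is only as good as the percentile coverage you stored.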

If you don't have the additional data, I don't see what you can do better than averaging. If the need is just to report a number, that will be good enough. It will probably also be good enough to spot trends, but it may not be an accurate $95$th percentile. Do you need that?


No, there is no exact way to aggregate percentiles -- you need all data, or at least a histogram of all data.

Yes, there are approximations; Q-digest seems very promising.

In my spare time I hacked up a histogram of quantised log response times, with the domain limited to some arbitrary min and max. That answers my queries in log(N) space; considering the imposed limits, it's actually O(1). Roughly, the histogram looks like this:

buckets = 1ms|2ms|4ms|...|1h|2h|4h
counts = 0,0,12,145,...,1,0,0,0

In my case, I used the series [2**(0.1*i) for i in range(100)], which gives nice ~7% increments from 1 to about 1000; 1 being considered impossibly fast and 1000 unbearably slow.

Faster lookup is achieved by keeping a cumulative histogram, so binary search can be used.
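A minimal Python sketch of that kind of histogram, assuming the bucket series mentioned above (the class and method names are illustrative):

    from bisect import bisect_left, bisect_right
    from itertools import accumulate

    # Bucket boundaries: the series from above, roughly 7% apart, from 1 to ~1000.
    BOUNDS = [2 ** (0.1 * i) for i in range(100)]

    class LogHistogram:
        def __init__(self):
            self.counts = [0] * len(BOUNDS)

        def record(self, response_time):
            """Clamp to the domain and bump the bucket the value falls into."""
            i = bisect_right(BOUNDS, response_time) - 1
            self.counts[max(0, min(i, len(BOUNDS) - 1))] += 1

        def percentile(self, p):
            """Return the bucket boundary at or below which roughly p% of the
            samples fall, via binary search over the cumulative counts."""
            cumulative = list(accumulate(self.counts))
            total = cumulative[-1]
            rank = p / 100.0 * total
            i = bisect_left(cumulative, rank)
            return BOUNDS[min(i, len(BOUNDS) - 1)]

The precision is one bucket (about 7% of the value), which is usually plenty for latency reporting, and because the per-interval bucket counts can simply be summed, this is exactly the kind of data that does aggregate correctly across intervals.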
