Calculating percentile rank in MySQL

This is a relatively ugly answer, and I feel guilty saying it. That said, it might help you with your issue.

One way to determine the percentage would be to count all of the rows, and count the number of rows that are greater than the number you provided. You can calculate either greater or less than and take the inverse as necessary.

Create an index on your number. total = select count(); less_equal = select count() where value > indexed_number;

The percentage would be something like: less_equal / total or (total - less_equal)/total

Make sure that both of them are using the index that you created. If they are not, tweak them until they are. The explain query should have "using index" in the right hand column. In the case of the select count(*) it should be using index for InnoDB and something like const for MyISAM. MyISAM will know this value at any time without having to calculate it.

If you needed to have the percentage stored in the database, you can use the setup from above for performance and then calculate the value for each row by using the second query as an inner select. The first query's value can be set as a constant.

Does this help?

Jacob


SELECT 
    c.id, c.score, ROUND(((@rank - rank) / @rank) * 100, 2) AS percentile_rank
FROM
    (SELECT 
    *,
        @prev:=@curr,
        @curr:=a.score,
        @rank:=IF(@prev = @curr, @rank, @rank + 1) AS rank
    FROM
        (SELECT id, score FROM mytable) AS a,
        (SELECT @curr:= null, @prev:= null, @rank:= 0) AS b
ORDER BY score DESC) AS c;

Here's a different approach that doesn't require a join. In my case (a table with 15,000+) rows, it runs in about 3 seconds. (The JOIN method takes an order of magnitude longer).

In the sample, assume that measure is the column on which you're calculating the percent rank, and id is just a row identifier (not required):

SELECT
    id,
    @prev := @curr as prev,
    @curr := measure as curr,
    @rank := IF(@prev > @curr, @rank+@ties, @rank) AS rank,
    @ties := IF(@prev = @curr, @ties+1, 1) AS ties,
    (1-@rank/@total) as percentrank
FROM
    mytable,
    (SELECT
        @curr := null,
        @prev := null,
        @rank := 0,
        @ties := 1,
        @total := count(*) from mytable where measure is not null
    ) b
WHERE
    measure is not null
ORDER BY
    measure DESC

Credit for this method goes to Shlomi Noach. He writes about it in detail here:

http://code.openark.org/blog/mysql/sql-ranking-without-self-join

I've tested this in MySQL and it works great; no idea about Oracle, SQLServer, etc.


there is no easy way to do this. see http://rpbouman.blogspot.com/2008/07/calculating-nth-percentile-in-mysql.html