Calculate percentile of value in column
Probably very late, but still:
df['column_name'].describe()
will give you the regular 25th, 50th and 75th percentiles along with some additional statistics. If you want specific percentiles instead, pass them explicitly:
df['column_name'].describe(percentiles=[0.1, 0.2, 0.3, 0.5])
This will give you the 10th, 20th, 30th and 50th percentiles. You can pass as many values as you want, each between 0 and 1.
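A minimal sketch of reading individual percentiles back out of the result. The data (the integers 1 to 100) is made up for illustration; only the column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical data: the integers 1..100 in a column named 'column_name'.
df = pd.DataFrame({"column_name": range(1, 101)})

# describe() returns a Series whose index holds labels such as '10%' and '50%'.
summary = df["column_name"].describe(percentiles=[0.1, 0.2, 0.3, 0.5])
p10 = summary["10%"]
p50 = summary["50%"]

# Series.quantile() returns only the requested percentiles,
# without the count/mean/std rows that describe() adds.
quantiles = df["column_name"].quantile([0.1, 0.2, 0.3, 0.5])
print(summary)
print(quantiles)
```

If you only need the numbers and not the rest of the summary, quantile() is probably the more direct call.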
To find the percentile of a value relative to an array (or in your case a dataframe column), use the scipy function stats.percentileofscore().
For example, if we have a value x (the other numerical value, not in the dataframe) and a reference array arr (the column from the dataframe), we can find the percentile of x relative to arr with:
from scipy import stats
percentile = stats.percentileofscore(arr, x)
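A runnable sketch of the same call against a dataframe column, for the case where the value is not itself in the column. The dataframe, the column name 'a' and the value 3.5 are assumptions chosen for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical reference column and an outside value to score against it.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
x = 3.5

# Three of the five values are below 3.5, so x sits at the 60th percentile.
percentile = stats.percentileofscore(df["a"], x)
print(percentile)
```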
Note that stats.percentileofscore() takes a third parameter, kind, that has a significant impact on the resulting percentile. You can choose from rank, weak, strict, and mean. See the docs for more information.
For an example of the difference:
>>> df
   a
0  1
1  2
2  3
3  4
4  5
>>> stats.percentileofscore(df['a'], 4, kind='rank')
80.0
>>> stats.percentileofscore(df['a'], 4, kind='weak')
80.0
>>> stats.percentileofscore(df['a'], 4, kind='strict')
60.0
>>> stats.percentileofscore(df['a'], 4, kind='mean')
70.0
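To see where those numbers come from, here is a rough sketch that reproduces three of the kinds with plain counting. The helper name percentile_of_value is hypothetical and is not part of scipy. It follows the definitions as I understand them from the scipy docs: weak counts values less than or equal to the score, strict counts values strictly less, and mean averages the two. The rank kind is left out because it handles ties differently.

```python
def percentile_of_value(values, x, kind="weak"):
    """Hypothetical helper: percentile of x within values, by simple counting."""
    n = len(values)
    below = sum(v < x for v in values)         # values strictly less than x
    at_or_below = sum(v <= x for v in values)  # values less than or equal to x
    if kind == "strict":
        return 100.0 * below / n
    if kind == "weak":
        return 100.0 * at_or_below / n
    if kind == "mean":
        # Average of the strict and weak percentages.
        return 100.0 * (below + at_or_below) / (2 * n)
    raise ValueError("kind must be 'weak', 'strict' or 'mean'")


column = [1, 2, 3, 4, 5]  # same values as df['a'] in the example
weak = percentile_of_value(column, 4, "weak")
strict = percentile_of_value(column, 4, "strict")
mean = percentile_of_value(column, 4, "mean")
print(weak, strict, mean)
```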
As a final note, if you have a value that is greater than 80% of the other values in the column, it is in the 80th percentile, not the 20th (see the example above for how the kind parameter shifts the final score somewhat). See this Wikipedia article for more information.