How to perform rank based selection in a genetic algorithm?

I was also a bit confused by various sources on how the probabilities are being computed when using the Linear Ranking Selection, sometimes also called "Rank Selection" as is referred to here. At least I hope these two are referring to the same thing.

The part which was elusive to me is the sum of the ranks which seems to have been omitted or at least not explicitly stated in most of the sources. Here I present a short, yet a verbose Python example of how the probability distribution is computed (those nice charts you often see).

Assuming these are some example individual fintesses: 10, 9, 3, 15, 85, 7.

Once sorted, assign the ranks in ascending order: 1st: 3, 2nd: 7, 3rd: 9, 4th: 10, 5th: 15, 6th: 85

The sum of all ranks is 1+2+3+4+5+6 or using the gauss formula (6+1)*6/2 = 21.

Hence, we compute the probabilities as: 1/21, 2/21, 3/21, 4/21, 5/21, 6/21, which you can then express as percentages:

Probability distribution of selecting an individual using Rank Selection

Take note that this is not what is used in actual implementations of genetic algorithms, only a helper script to give you better intuition.

You can fetch this script with:

curl -o ranksel.py https://gist.githubusercontent.com/kburnik/3fe766b65f7f7427d3423d233d02cd39/raw/5c2e569189eca48212c34b3ea8a8328cb8d07ea5/ranksel.py

#!/usr/bin/env python

"""
Assumed name of script: ranksel.py

Sample program to estimate individual's selection probability using the Linear
Ranking Selection algorithm - a selection method in the field of Genetic
Algorithms. This should work with Python 2.7 and 3.5+.

Usage:

./ranksel.py f1 f2 ... fN

Where fK is the scalar fitness of the Kth individual. Any ordering is accepted.

Example:

$ python -u ranksel.py 10 9 3 15 85 7
Rank Fitness Sel.prob.
   1    3.00     4.76%
   2    7.00     9.52%
   3    9.00    14.29%
   4   10.00    19.05%
   5   15.00    23.81%
   6   85.00    28.57%

"""

from __future__ import print_function
import sys

def compute_sel_prob(population_fitness):
  """Computes and generates tuples of (rank, individual_fitness,
     selection_probability) for each individual's fitness, using the Linear
     Ranking Selection algorithm."""
  # Get the number of individuals in the population.
  n = len(population_fitness)

  # Use the gauss formula to get the sum of all ranks (sum of integers 1 to N).
  rank_sum = n * (n + 1) / 2

  # Sort and go through all individual fitnesses; enumerate ranks from 1.
  for rank, ind_fitness in enumerate(sorted(population_fitness), 1):
    yield rank, ind_fitness, float(rank) / rank_sum


if __name__ == "__main__":
  # Read the fitnesses from the command line arguments.
  population_fitness = list(map(float, sys.argv[1:]))

  print ("Rank Fitness Sel.prob.")
  # Iterate through the computed tuples and print the table rows.
  for rank, ind_fitness, sel_prob in compute_sel_prob(population_fitness):
    print("%4d %7.2f %8.2f%%" % (rank, ind_fitness, sel_prob * 100))

What you've described there is roulette wheel selection, not rank selection. To do rank selection, rather than weighting each candidate by its fitness score, you weight it by its "rank" (that is, best, second best, third best, etc.).

For instance, you might give the first one a weighting of 1/2, the second a weighting of 1/3, the third a weighting of 1/4, etc. Or the worst gets weighting 1, the second worst gets weighting 2, etc.

The important point is that absolute or relative fitness scores are not considered, only the rankings. So the best is more likely to be chosen than the second best, but the two have the same probabilities of being chosen whether the best had ten times the score of the second best, or only had a slightly greater score.


What have you described is simply Roulette wheel selection.In Roulette wheel selection:

  • Parents are selected according to their fitness
  • The better the chromosomes are, the more chances to be selected they have.

Imagine a roulette wheel where all chromosomes in the population are placed, each chromosome has its place big accordingly to its fitness function, like on the following pictureenter image description here.
This selection will have problems when the finesses differs very much. Outstanding individuals will introduce a bias in the beginning of the search that may cause a premature convergence and a loss of diversity.
For Example:

if an initial population contains one or two very fit but not the best individuals and the rest of the population are not good, then these fit individuals will quickly dominate the whole population and prevent the population from exploring other potentially better individuals. Such a strong domination causes a very high loss of genetic diversity which is definitely not advantageous for the optimization process.

But in Rank Selection:

  1. Rank selection first ranks the population and then every chromosome receives fitness from this ranking.
  2. The worst will have fitness 1, second worst 2 etc. and the best will have fitness N (number of chromosomes in population). enter image description here

  3. After this all the chromosomes have a chance to be selected.

  4. Rank-based selection schemes can avoid premature convergence.
  5. But can be computationally expensive because it sorts the populations based on fitness value.
  6. But this method can lead to slower convergence, because the best chromosomes do not differ so much from other ones.

So the process will be:

  1. First sort the Fitness value of the Population.

  2. Then if the Population number is 10 then give the probability of selection to the Population like 0.1,0.2,0.3,...,1.0 .

  3. Then calculate cumulative Fitness and make roulette wheel.
  4. And the next steps is same as roulette wheel.

My Implementation of Rank Selection in Matlab:

NewFitness=sort(Fitness);
        NewPop=round(rand(PopLength,IndLength));

        for i=1:PopLength
            for j=1:PopLength
                if(NewFitness(i)==Fitness(j))
                    NewPop(i,1:IndLength)=CurrentPop(j,1:IndLength);
                    break;
                end
            end
        end
        CurrentPop=NewPop;

        ProbSelection=zeros(PopLength,1);
        CumProb=zeros(PopLength,1);

        for i=1:PopLength
            ProbSelection(i)=i/PopLength;
            if i==1
                CumProb(i)=ProbSelection(i);
            else
                CumProb(i)=CumProb(i-1)+ProbSelection(i);
            end
        end

        SelectInd=rand(PopLength,1);

        for i=1:PopLength
            flag=0;
            for j=1:PopLength
                if(CumProb(j)<SelectInd(i) && CumProb(j+1)>=SelectInd(i))
                    SelectedPop(i,1:IndLength)=CurrentPop(j+1,1:IndLength);
                    flag=1;
                    break;
                end
            end
            if(flag==0)
                SelectedPop(i,1:IndLength)=CurrentPop(1,1:IndLength);
            end
        end

Note: you can also see my question regarding rank selection here in this link and my article here.