Pathological Sorting

It's been a pretty long time, but I remember back in Algorithms 101 we were taught some sorting algorithm that used randomization. I wasn't a very good student so I don't really remember how it went or why it worked quickly on average.

Nevertheless, I've decided that this problem calls for a solution that uses randomization, which will hopefully work in my favor on average.

import random

def arrayIsSorted(arr):
    # True when no element is greater than its successor
    for i in range(len(arr) - 1):
        if arr[i] > arr[i + 1]:
            return False
    return True

def rSort(arr):
    random.seed(42)  # fixed seed: every call replays the same shuffle sequence
    counter = 0
    while not arrayIsSorted(arr):
        random.shuffle(arr)
        counter += 1
    print("Sorted in %d iterations." % counter)
    return arr

Since true randomization is important, I make sure to seed the RNG with the answer to Life, the Universe and Everything. After a bit of testing, it turns out that was a smart move! Check out how fast these two completely arbitrary lists get sorted:

rSort ([6,1,4,2,3,7,5])
rSort ([8,9,6,1,4,7,2,3,5])

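Both of these get sorted in only 1 iteration (at least with this seed on the Python version I tested; the shuffle sequence for a given seed can differ between Python versions). You couldn't possibly ask for a faster function than that!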

Now, admittedly, some other lists produce slightly worse results...

rSort ([5,1,4,2,3,7,6])
rSort ([8,9,6,1,4,7,2,5,3])

These get sorted in 4,176 and 94,523 iterations respectively, which actually takes more than a second... but let's just keep that fact to ourselves so as not to distract anyone from how amazing this algorithm is!
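For context, here is a quick back-of-the-envelope check (not part of the original code). For a list of n distinct values, a uniform shuffle lands on the sorted order with probability 1/n!, so an honest run needs about n! shuffles on average. The helper name expected_shuffles is made up for this illustration.

```python
import math

def expected_shuffles(n):
    """Average number of uniform shuffles before n distinct items come out sorted."""
    # Each shuffle succeeds with probability 1/n!, so the number of
    # attempts is geometrically distributed with mean n!.
    return math.factorial(n)

for n in (7, 9):
    print("%d items: about %d shuffles on average" % (n, expected_shuffles(n)))
```

Against those averages (5,040 for 7 items and 362,880 for 9), the 4,176 and 94,523 iterations reported above are nothing unusual. A one-iteration result is the lucky outlier, and the fixed seed is what makes it repeatable.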

Edit:

I've been asked to prove my algorithm's efficiency on a list of 100 items, so here you go:

rSort ([70, 6, 52, 97, 85, 61, 62, 48, 30, 3, 11, 88, 39, 91, 98, 8, 54, 92, 44, 65, 69, 21, 58, 41, 60, 76, 27, 82, 93, 81, 20, 94, 22, 29, 49, 95, 40, 19, 55, 42, 43, 1, 0, 67, 35, 15, 51, 31, 16, 25, 5, 53, 37, 74, 86, 12, 13, 72, 56, 32, 47, 46, 59, 33, 80, 4, 45, 63, 57, 89, 7, 77, 14, 10, 34, 87, 18, 79, 9, 66, 24, 99, 64, 26, 78, 38, 90, 28, 83, 75, 68, 2, 17, 73, 96, 71, 23, 84, 36, 50])

Even this long and completely arbitrary list gets sorted instantly! Truly I must have stumbled upon the best sorting algorithm in the world!
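The post doesn't say how these lists were found, so the following is only a guess at one way to construct them. With a fixed seed, the first shuffle always applies the same rearrangement of positions to a list of a given length, so you can record that rearrangement once and lay out the input so the shuffle undoes it. The names first_shuffle_permutation and craft_input are hypothetical.

```python
import random

def first_shuffle_permutation(n, seed=42):
    """Return perm where perm[j] is the original index of the item that the
    first shuffle after seeding moves to position j."""
    random.seed(seed)
    perm = list(range(n))
    random.shuffle(perm)
    return perm

def craft_input(sorted_values, seed=42):
    """Arrange sorted_values so that one seeded shuffle puts them in order."""
    perm = first_shuffle_permutation(len(sorted_values), seed)
    crafted = [None] * len(sorted_values)
    for position, source in enumerate(perm):
        # The shuffle will move whatever sits at index `source` to `position`
        crafted[source] = sorted_values[position]
    return crafted

crafted = craft_input(list(range(100)))
random.seed(42)
random.shuffle(crafted)
print(crafted == list(range(100)))
```

Because the shuffle depends only on the list length and the RNG state, not on the values, the final comparison prints True on whichever Python version builds and shuffles the list. The crafted list itself will differ from one version to another.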


If you can create your own data, then it's pretty straightforward: use data that looks random but embeds a key that allows faster sorting. Any other data falls back to the original sorting method, so the average time still improves.

One easy way is to make sure each data item has a unique key, and then just hash the keys. Take for example a list of 10,000 sequential numbers (0 to 9,999), each multiplied by 16 and with a random number from 0-15 added to it (see fillArray() below). They will look random, but each one has a unique sequential key. For sorting, divide by 16 (in C the >>4 is very fast) and then just place the number into an array using the resulting key as the index. One pass and you're done. In testing, I found quicksort was 30 times slower on ten million numbers.

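#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <windows.h> // GetTickCount (Windows only)

// Key i goes in the high bits, 4 random bits go in the low bits.
void fillArray(int *a,int len)
{
  for (int i=0;i<len;++i)
    a[i]=(i<<4)|(rand()&0xF);
  // shuffle later
}
// One pass: the key (value>>4) is the element's final index.
// Assumes every key is unique and lies in 0..len-1.
void sortArray(int *a,int len)
{
  int key=0;
  int *r=new int[len];
  for (int i=0;i<len;++i)
  {
    key=a[i]>>4;
    r[key]=a[i];
  }
  memcpy(a,r,len*sizeof(int));
  delete[] r;
}
// Fisher-Yates shuffle (rand()%n is slightly biased, which is fine for a demo).
void shuffleArray(int *a,int len)
{
  int swap=0, k=0;
  for (int i=len-1;i>0;--i)
  {
    k=rand()%(i+1);
    swap=a[k];
    a[k]=a[i];
    a[i]=swap;
  }
}
int qCompare(const void*a,const void*b)
{
  int x=*((const int*)a), y=*((const int*)b);
  return (x>y)-(x<y); // avoids the overflow risk of x-y
}
int main()
{
  // At this size both timings may fall below GetTickCount's resolution.
  int aLen=10000;
  int *a=new int[aLen];
  srand ((unsigned)time(NULL));
  fillArray(a,aLen);
  // time them
  long t0=0, d0=0, d1=0;
  // qsort
  shuffleArray(a,aLen);
  t0=::GetTickCount();
  qsort(a,aLen,sizeof(int),&qCompare);
  d0=::GetTickCount()-t0;
  // oursort
  shuffleArray(a,aLen);
  t0=::GetTickCount();
  sortArray(a,aLen);
  d1=::GetTickCount()-t0;
  printf("qsort: %ld ms, oursort: %ld ms\n",d0,d1);
  delete[] a;
  return 0;
}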
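The "all other data uses the original sorting method" part isn't shown in the C++ above, so here is a minimal Python sketch of how that fallback might look. It assumes integer values and uses the built-in sorted() as a stand-in for the original method. The name key_sort and the shift parameter are made up for this example.

```python
import random

def key_sort(values, shift=4):
    """Place each value at index (value >> shift); fall back to sorted()
    as soon as the data does not fit the expected pattern."""
    n = len(values)
    slots = [None] * n
    for v in values:
        key = v >> shift
        # Key out of range or already taken: this is not our special data
        if not 0 <= key < n or slots[key] is not None:
            return sorted(values)
        slots[key] = v
    return slots

# Data shaped like fillArray() produces: sequential key, 4 random low bits
data = [(i << 4) | random.randrange(16) for i in range(10000)]
random.shuffle(data)
print(key_sort(data) == sorted(data))

# Ordinary data still gets sorted, just by the fallback path
print(key_sort([5, 3, 5, 1]))
```

The fast path only costs one extra scan when it fails, which is why the trick can hide inside an otherwise ordinary sort routine.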

Anything that has a unique key can be sorted this way - if you have the memory to store it, of course. For example, many databases use a unique numeric customer id - if the list is small/sequential enough, it could be held in memory. Any other way of translating a record into a unique number works too. For more info, research Hash Sorts, since that's what this is...
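As a rough illustration of the customer id case, here is a Python sketch that sorts records by a unique integer id using a few linear passes. The function name, the get_id callback and the density threshold of 4 are all assumptions made for this example. When the ids are too sparse or turn out not to be unique, it simply falls back to an ordinary sort.

```python
_EMPTY = object()  # marker for unused slots

def sort_by_unique_id(records, get_id, max_sparsity=4):
    """Sort records by a unique integer id by placing each one at index
    (id - smallest id). Falls back to sorted() when that is not practical."""
    if not records:
        return []
    ids = [get_id(r) for r in records]
    low = min(ids)
    span = max(ids) - low + 1
    # Memory guard: skip the table if the id range dwarfs the record count
    if span > max_sparsity * len(records):
        return sorted(records, key=get_id)
    slots = [_EMPTY] * span
    for rec, rid in zip(records, ids):
        if slots[rid - low] is not _EMPTY:  # duplicate id, keys are not unique
            return sorted(records, key=get_id)
        slots[rid - low] = rec
    # Drop the gaps left by ids that do not occur
    return [r for r in slots if r is not _EMPTY]

customers = [(1042, "Ada"), (1040, "Grace"), (1045, "Alan"), (1041, "Edsger")]
for customer in sort_by_unique_id(customers, lambda c: c[0]):
    print(customer)
```

The memory cost is proportional to the spread of the ids rather than to the number of records, which is the "if you have the memory to store it" caveat in practice.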