Why does the CFS scheduler use a red-black tree?

Another interesting point: when a task (process or thread) changes state from runnable to blocked (waiting for I/O or a network resource), it has to be removed from the runqueue. The complexity of that removal is:

  • O(log(n)) for a red-black tree
  • O(n) for a heap, because the task first has to be found by scanning the array

Removing an arbitrary element from a heap is slow, and that's why the red-black tree is the better fit.
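
To make the O(n) cost concrete, here is a minimal Python sketch of removing a blocked task from an array-based heap. It is an illustration only, not kernel code: the runqueue is modelled as a heapq list of (vruntime, task_id) pairs, and remove_task is a hypothetical helper name.

```python
import heapq


def remove_task(heap, task_id):
    """Remove the (vruntime, task_id) entry for task_id from a heapq list."""
    # A plain binary heap keeps no map from task to array slot,
    # so the entry has to be found with a linear scan: O(n).
    for i, (_, tid) in enumerate(heap):
        if tid == task_id:
            break
    else:
        raise KeyError(task_id)

    removed = heap[i]
    last = heap.pop()            # drop the final array slot
    if i < len(heap):
        heap[i] = last           # fill the hole with the former last entry
        heapq.heapify(heap)      # restore the heap invariant (O(n), kept simple)
    return removed


runqueue = [(30, "a"), (10, "b"), (20, "c"), (40, "d")]
heapq.heapify(runqueue)
blocked = remove_task(runqueue, "c")   # task "c" goes to sleep
```

Even if heapify were replaced by an O(log(n)) sift, the scan that locates the task would still dominate.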

Getting the min vruntime from a heap is also not really O(1). It is O(1) only if you read the root node without removing it. But in CFS, we need to:

  • Remove it (which requires re-heapifying, O(log(n)))
  • Update its vruntime and insert it back into the runqueue, which is O(log(n)) too
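
These two steps can be sketched with a heap as follows. This is an illustration under assumptions, not how the kernel does it: run_next and the fixed slice_ns charge are hypothetical, and real CFS weights the runtime it charges by task priority.

```python
import heapq


def run_next(runqueue, slice_ns):
    """Pick the task with the smallest vruntime, charge it, and requeue it."""
    vruntime, task = heapq.heappop(runqueue)      # remove the min: O(log n)
    vruntime += slice_ns                          # assumed fixed charge per run
    heapq.heappush(runqueue, (vruntime, task))    # insert it back: O(log n)
    return task


runqueue = [(0, "a"), (5, "b")]
heapq.heapify(runqueue)
first = run_next(runqueue, 10)    # "a" runs, its vruntime becomes 10
second = run_next(runqueue, 10)   # "b" now has the smallest vruntime
```

Only peeking at runqueue[0] would be O(1); every actual scheduling decision pays for the pop and the push.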

There is also a memory reason: heaps are array-based and hence require contiguous memory in kernel space. This follows from the way heaps are implemented in Linux. See the files lib/prio_heap.c and include/linux/prio_heap.h, and you'll note that the heap is kmalloc'd in heap_init. Once the number of tasks becomes huge, maintaining thousands of struct sched_entity requires a lot of contiguous space (it spans several pages). From a time and performance point of view, one would prefer a heap, since the heapify operation can run in the background once the min vruntime is picked, but it is the space requirement that becomes the bottleneck.
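
As a rough back-of-the-envelope sketch of the space argument, the snippet below estimates how many contiguous pages a heap array would need. The numbers are assumptions, not taken from the kernel sources: one 8-byte pointer per task and 4 KiB pages, as is typical on 64-bit systems. contiguous_pages is a hypothetical helper name.

```python
def contiguous_pages(num_tasks, entry_size=8, page_size=4096):
    """Pages needed for one contiguous array with one entry per task."""
    total_bytes = num_tasks * entry_size
    # Ceiling division: a partially used page still has to be allocated.
    return (total_bytes + page_size - 1) // page_size


pages = contiguous_pages(10_000)   # 80,000 bytes of pointers
```

A tree built from nodes linked by pointers has no such requirement, since each node can live wherever its task structure was allocated.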

As rbtree is readily available, the kernel developers didn't consider implementing a pointer-based heap; in fact, one isn't needed.