Binary Trees, arrays vs linked

Arrays cannot efficiently represent arbitrarily-shaped binary trees, only complete trees. A complete binary tree is one in which all levels are full, OR all levels except the deepest level are full and the deepest level has all of its nodes as far to the left as possible. (You can imagine that the levels are filled with nodes from left to right, and one level has to be filled before the next level can begin.)

Heaps are, by definition, complete binary trees - hence the array implementation is used due to its superior memory efficiency. On the other hand, binary search trees that must support insertion and removal at arbitrary locations (and thus may not be complete trees) cannot use the array implementation.


First of all, it is a legitimate question: binary trees can indeed be embedded in arrays. phari's answer is incorrect: it is possible with some effort to embed trees of arbitrary shapes into arrays (as long as you have enough memory). A straightforward representation would involve defining a Node as a variant type: either it is Filled or Empty, where Filled contains the key and the auxiliary data, and Empty is analogous to Nil (aka the null pointer). If the only operation you need to support is delete, then you're all set up: just implement the build operation to return a complete tree and then implement the normal binary tree delete operation. No balancing required to achieve O(log n) complexity bound (where n is the initial number of items the tree).

It is also possible with a significant effort to implement the insert operation by maintaining the tree in a balanced shape. Simplifying a bit, you maintain a nearly-complete tree with storage size no more than 2n (where n is the current number of items in the tree). When a new item is inserted, you check where the appropriate array cell to insert it is: if it is inside the allocated storage, you just write it to that cell. Otherwise, you go up the tree starting from the parent of that cell until you find a subtree whose storage has enough room to fit all items including the new one; if no such subtree exists, you double the storage. After finding that subtree, you rebuild it into a nearly-complete shape and insert the new item into the correct array cell (which is now guaranteed to be within the allocated storage). All this can be done in amortized O(log^2(n)) time, or in amortized O(log(n)) time with even more effort.

The above algorithm sketch is based on "Cache Oblivious Search Trees via Binary Trees of Small Height".

I have implemented a data structure called TeardownTree, which uses this kind of embedding. I support build, delete, delete_range, query_range, iter operations on the master branch, and an experimental insert operation on the insert branch. The project page also contains some benchmarks that show the data structure is definitely viable at least for some usages.

You might also be interested in this blog post explaining how to build trees in constant auxiliary space (a very fast method in practice).

Tags:

Binary Tree