Why Doesn't Java's TreeMap Allow an Initial Size?

Unlike HashMap, which reallocates its internal table as new entries get inserted, TreeMap does not generally reallocate anything when new nodes are added. The difference can be very loosely illustrated as that between an ArrayList and a LinkedList: the first reallocates to resize, while the second does not. That is why setting the initial size of a TreeMap is roughly as meaningless as trying to set the initial size of a LinkedList.
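To make the constructor difference concrete, here is a minimal sketch. The class name MapConstructionSketch, the helper fill, and the sizes are made up for illustration; only the HashMap(int) and TreeMap() constructors come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class MapConstructionSketch {
    // Hypothetical helper: puts keys 0..n-1 into the given map and returns it.
    static Map<Integer, String> fill(Map<Integer, String> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        int expected = 1_000; // illustrative size, not a recommendation

        // HashMap accepts a capacity hint, so the bucket array can be sized up front.
        Map<Integer, String> hash = fill(new HashMap<>(2 * expected), expected);

        // TreeMap has no capacity constructor: each put of a new key links in one node.
        Map<Integer, String> tree = fill(new TreeMap<>(), expected);

        System.out.println(hash.size() + " entries in HashMap, " + tree.size() + " in TreeMap");
    }
}
```

Both maps end up holding the same entries; the only thing the hint changes is how often the HashMap has to grow its table on the way there.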

The speed difference is due to the different time complexity of the two containers: inserting N entries into a HashMap is O(N) on average, while for a TreeMap it is O(N*log N), which for 1,000,000 entries is roughly a 20-fold asymptotic difference (log2 of 1,000,000 is about 20). Although the difference in asymptotic complexity does not translate directly into a timing difference, because the constants dictated by the individual algorithms differ, it serves as a good way to decide which algorithm is going to be faster on very large inputs.
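The factor of about 20 is just log2(1,000,000). A rough way to see the N*log N behaviour without timing anything is to count comparator calls while filling a TreeMap. This is only a sketch: InsertCostSketch and its helper names are invented for the example, and the exact count depends on insertion order and on the JDK's implementation.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class InsertCostSketch {
    // Inserts keys 0..n-1 into a TreeMap and returns how many comparisons were made.
    static long comparisonsForInserts(int n) {
        long[] count = {0};
        Comparator<Integer> counting = (a, b) -> {
            count[0]++;                      // one call per node visited on the way down
            return Integer.compare(a, b);
        };
        TreeMap<Integer, Integer> map = new TreeMap<>(counting);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return count[0];
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.printf("log2(%d) = %.2f%n", n, log2(n));
        System.out.println("comparisons for " + n + " inserts: " + comparisonsForInserts(n));
    }
}
```

The comparison count grows a little faster than N itself, which is the log N factor showing up.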


Am I wrong to assume a TreeMap's array's initial size should be able to be set?

Yes, that assumption is incorrect. A TreeMap doesn't have an array. It is a red-black tree built from linked nodes, each with at most 2 children.
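For illustration, a linked binary tree looks roughly like the sketch below. Node, insert and size are hypothetical names, and this is a plain unbalanced binary search tree; java.util.TreeMap's real entries also carry a parent link and a colour and are rebalanced on insert. The point is only that there is no array anywhere to pre-size.

```java
final class TreeNodeSketch {
    // Hypothetical node type: a key, a value, and links to at most two children.
    static final class Node {
        final int key;
        String value;
        Node left, right;

        Node(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Plain BST insert (no rebalancing): follows links and allocates at most one node.
    static Node insert(Node root, int key, String value) {
        if (root == null) {
            return new Node(key, value);
        }
        if (key < root.key) {
            root.left = insert(root.left, key, value);
        } else if (key > root.key) {
            root.right = insert(root.right, key, value);
        } else {
            root.value = value; // existing key: overwrite, no allocation
        }
        return root;
    }

    static int size(Node root) {
        return root == null ? 0 : 1 + size(root.left) + size(root.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int key : new int[] {5, 3, 8, 3}) {
            root = insert(root, key, "v" + key);
        }
        System.out.println("nodes allocated: " + size(root));
    }
}
```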

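If you are suggesting that the number of children in a tree node should be a parameter, then you need to figure out how that impacts search time. And I think that it leaves the search time essentially where it was: a tree with M children per node has about logM(N) levels, and picking the right child at each level costs about log2(M) comparisons with a binary search, which multiplies out to about log2(N) again, where N is the number of elements and M is the average number of node children. (And I'm making some optimistic assumptions ...) That's not a "win".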

Is there a different reason that it is so slow?

Yes. The reason that a (large) TreeMap is slow relative to a (large) HashMap under optimal circumstances is that a lookup in a balanced binary tree with N entries requires looking at roughly log2N tree nodes. By contrast, in an optimal HashMap a lookup involves one hash code calculation and looking at O(1) hash chain nodes.
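You can observe the "roughly log2N tree nodes" part directly by counting comparator calls per get. This is a sketch with made-up names (LookupCostSketch, maxComparisonsPerLookup); it assumes one comparator call per node visited, which is how a comparator-based tree lookup normally works.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class LookupCostSketch {
    // Builds a TreeMap with keys 0..n-1 and returns the largest number of
    // comparisons any single get() needed.
    static long maxComparisonsPerLookup(int n) {
        long[] count = {0};
        Comparator<Integer> counting = (a, b) -> {
            count[0]++;
            return Integer.compare(a, b);
        };
        TreeMap<Integer, Integer> map = new TreeMap<>(counting);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        long max = 0;
        for (int i = 0; i < n; i++) {
            count[0] = 0;          // reset before each lookup
            map.get(i);
            max = Math.max(max, count[0]);
        }
        return max;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            System.out.println(n + " entries: worst lookup used "
                    + maxComparisonsPerLookup(n) + " comparisons");
        }
    }
}
```

Going from a thousand to a million entries should add comparisons per lookup rather than multiply them, which is what logarithmic growth looks like.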

Notes:

  1. TreeMap uses a red-black tree, a binary tree organization that keeps the tree balanced, so O(log2N) is the worst case lookup time.
  2. HashMap performance depends on the collision rate of the hash function and key space. In the worst case where all keys end up on the same hash chain, a HashMap has O(N) lookup.
  3. In theory, HashMap performance becomes O(N) when you reach the maximum possible hash array size (2^30 buckets in java.util.HashMap) and keep adding entries, because the chains can then only get longer. But if you have a HashMap that large, you should probably be looking at an alternative map implementation with better memory usage and garbage collection characteristics.
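The collision case in note 2 can be provoked on purpose with a key type whose hashCode is constant. BadKey and CollisionSketch are hypothetical names for this sketch. The map is kept tiny deliberately, because recent JDKs convert long collision chains into small trees, which changes the numbers for bigger maps.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionSketch {
    static int equalsCalls = 0;

    // Hypothetical key type: every instance lands in the same bucket.
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately constant, the worst possible hash function
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Stores n colliding keys, then looks each one up with a fresh equal key
    // and returns the total number of equals() calls the lookups needed.
    static int equalsCallsForLookups(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        equalsCalls = 0;
        for (int i = 0; i < n; i++) {
            if (map.get(new BadKey(i)) == null) {
                throw new IllegalStateException("lost key " + i);
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println(n + " lookups needed " + equalsCallsForLookups(n) + " equals() calls");
    }
}
```

With a decent hash function you would expect about one equals() call per lookup; here each lookup has to walk past the other colliding keys first.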

A TreeMap is always kept balanced and sorted. Every time you add a node, the tree places it according to the provided comparator (or the keys' natural ordering) and rebalances itself if needed. There is no size to specify because a TreeMap is a linked structure of sorted nodes that grows one node at a time and is designed to be traversed in order easily.
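A small sketch of what the comparator buys you: whatever order the keys go in, iteration comes back sorted. SortedOrderSketch and keysInOrder are made-up names, and the values stored (string lengths) are arbitrary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class SortedOrderSketch {
    // Inserts the keys in the order given and returns them in the map's iteration order.
    static List<String> keysInOrder(Comparator<String> order, String... keys) {
        TreeMap<String, Integer> map = new TreeMap<>(order);
        for (String key : keys) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInOrder(Comparator.naturalOrder(), "pear", "apple", "fig"));
        System.out.println(keysInOrder(Comparator.reverseOrder(), "pear", "apple", "fig"));
    }
}
```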

A HashMap needs a sizable amount of free space beyond the entries that you store in it. My professor has always told me that it needs 5 times the space taken by the objects you are storing, although java.util.HashMap's default load factor of 0.75 means its table is only resized once it is three-quarters full. So specifying the size when you create the HashMap improves its speed. Otherwise, if more objects go into the HashMap than you planned for, it has to "size up", which means allocating a larger table and rehashing the existing entries.
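If you know roughly how many entries are coming, one common way to pick the initial capacity is to divide by the load factor. This is a sketch assuming java.util.HashMap's documented default load factor of 0.75; PresizeSketch and capacityFor are hypothetical names, not JDK API.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeSketch {
    // Assumption: the map is created with HashMap's default load factor.
    static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Smallest capacity hint that should hold the expected entries without a resize.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int expected = 1_000; // illustrative
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key-" + i, i);
        }
        System.out.println("capacity hint " + capacityFor(expected)
                + " for " + map.size() + " entries");
    }
}
```

That works out to a hint of about 1.33 times the expected entry count, rather than 5 times.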

