Why do we use linear probing in hash tables when there is separate chaining with linked lists?

I'm surprised that you found chained hashing to be faster than linear probing - in practice, linear probing is typically significantly faster than chaining. This is primarily due to locality of reference, since the accesses performed in linear probing tend to be closer together in memory than the accesses performed in chained hashing.

There are other wins for linear probing. For example, insertions into a linear probing hash table don't require any new allocations (unless you're rehashing the table), so in applications like network routers where memory is scarce, it's nice to know that once the table is set up, elements can be placed into it with no risk of a malloc failure.
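To make the "no allocation on insert" point concrete, here's a minimal linear-probing table sketch (my own illustrative code, not from the answer; in a systems language the slot array would be a fixed block of memory allocated once up front):

```python
class LinearProbingTable:
    """Open addressing with linear probing; table storage is allocated once."""
    _EMPTY = object()  # sentinel distinguishing "never used" from real entries

    def __init__(self, capacity=8):
        # The only table allocation happens here, up front.
        self._slots = [self._EMPTY] * capacity

    def insert(self, key, value):
        n = len(self._slots)
        i = hash(key) % n
        for _ in range(n):
            slot = self._slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self._slots[i] = (key, value)  # no new table storage needed
                return True
            i = (i + 1) % n  # probe the next slot
        return False  # table full; a real implementation would rehash here

    def lookup(self, key):
        n = len(self._slots)
        i = hash(key) % n
        for _ in range(n):
            slot = self._slots[i]
            if slot is self._EMPTY:
                return None  # hit an empty slot: key can't be further along
            if slot[0] == key:
                return slot[1]
            i = (i + 1) % n
        return None
```

Compare this with chaining, where each insertion typically allocates a fresh list node.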

One weakness of linear probing is that, with a bad choice of hash function, primary clustering can cause the performance of the table to degrade significantly. While chained hashing can still suffer from bad hash functions, it's less sensitive to elements with nearby hash codes, which don't adversely impact the runtime in a chained table. Theoretically, linear probing only gives expected O(1) lookups if the hash function is 5-independent or if there's sufficient entropy in the keys. There are many ways to address this, such as using the Robin Hood hashing technique or hopscotch hashing, both of which have significantly better worst cases than vanilla linear probing.
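A small simulation (illustrative only) shows how badly clustering can hurt: when every key hashes to the same region, each insertion has to walk the whole run, whereas well-spread hashes find a free slot immediately.

```python
def probe_count(hashes, capacity):
    """Insert raw hash values into an open-addressed table via linear probing;
    return the number of probes each insertion needed."""
    slots = [None] * capacity
    counts = []
    for h in hashes:
        i = h % capacity
        probes = 1
        while slots[i] is not None:  # walk the cluster until a free slot
            i = (i + 1) % capacity
            probes += 1
        slots[i] = h
        counts.append(probes)
    return counts

clustered = probe_count([0] * 32, 64)        # 32 keys, all hashing to slot 0
spread = probe_count(range(0, 64, 2), 64)    # 32 keys with distinct slots
print(max(clustered), max(spread))
```

With the degenerate hash function the last insertion probes 32 slots (linear in the number of elements), while the well-spread case always succeeds on the first probe.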

The other weakness of linear probing is that its performance significantly degrades as the load factor approaches 1. You can address this either by rehashing periodically or by using the Robin Hood hashing technique described above.
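For a flavor of how Robin Hood hashing works, here's a sketch of its insertion rule (illustrative code; keys stand in for their own hash values to keep it short): an incoming element that has probed further from its home slot than the resident element evicts the resident, which then continues probing. This evens out probe lengths across the table.

```python
def robin_hood_insert(slots, key):
    """Insert key (treated as its own hash) with Robin Hood displacement."""
    n = len(slots)
    i = key % n
    dist = 0  # how far the element being placed is from its home slot
    for _ in range(n):
        if slots[i] is None:
            slots[i] = key
            return
        resident_dist = (i - slots[i] % n) % n  # resident's displacement
        if resident_dist < dist:
            # The resident is "richer" (closer to home): take its slot,
            # and let it continue probing instead.
            slots[i], key = key, slots[i]
            dist = resident_dist
        i = (i + 1) % n
        dist += 1
    raise RuntimeError("table full")
```

Lookups can then stop early: once the probe distance exceeds the resident's displacement, the key is known to be absent, which bounds the worst-case probe length.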

Hope this helps!


Linear probing is actually more memory efficient when the hash table is close to full.

Historically, one had very, very little memory, so every byte mattered (and there are still some cases where memory is very limited).

Why does it use less memory?

Consider what the tables look like: (separate chaining variations as per Wikipedia - there are other variations too, but they typically use more memory)

Linear             Separate chaining #1    Separate chaining #2
probing            List head in table      Pointer in table
|------|           |------|---|            |---|    |------|---|
|Object|           |Object|Ptr|            |Ptr| -> |Object|Ptr|
|------|           |------|---|            |---|    |------|---|
|Object|           |Object|Ptr|            |Ptr| -> |Object|Ptr|
|------|           |------|---|            |---|    |------|---|
| NULL |           | NULL |Ptr|            |Ptr|
|------|           |------|---|            |---|
 .                  .                       .
 .                  .                       .
 .                  .                       .

(Ptr stands for "pointer" - any pointer not pointing to something can be considered NULL)

Separate chaining #1 clearly uses more memory than linear probing (always), as every element in the table is bigger by the size of the pointer.

Separate chaining #2 might have an advantage when there isn't much in the table, but when it gets full, it's going to have roughly an additional 2 pointers floating around for every element.
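The arithmetic behind this can be sketched for a full table (hypothetical sizes: a 16-byte object and 8-byte pointers, matching the three diagrams above):

```python
OBJ, PTR = 16, 8  # assumed sizes in bytes, for illustration only

def linear_probing(n):
    return n * OBJ               # n slots, each holding an object directly

def chaining_head_in_table(n):
    return n * (OBJ + PTR)       # each slot: object plus a next-pointer

def chaining_ptr_in_table(n):
    return n * PTR + n * (OBJ + PTR)  # slot pointer, plus a node per element

n = 1000
print(linear_probing(n), chaining_head_in_table(n), chaining_ptr_in_table(n))
```

At full load, separate chaining #2 costs exactly two extra pointers per element relative to linear probing, and #1 costs one.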


templatetypedef is probably right about linear probing typically being faster (he's rarely wrong), but it's typically taught that separate chaining is faster, and you see it in major APIs (Java's implementations, for example). That may be because of this belief, or to avoid the cases where linear probing is much slower (with a few well-selected values, you can quickly get to O(n) performance with linear probing, while separate chaining would still have been O(1)), or perhaps for some other reason.