How can a pathological input exist for an std::unordered_set?

The input file you've provided consists of successive integers congruent to 1 modulo 107897. So what is most likely happening is that, at some point when the load factor crosses a threshold, the particular library implementation you're using resizes to a table with 107897 buckets, so that a key with hash value h is mapped to bucket h % 107897. Since the hash of an integer is the integer itself in this implementation, every integer already in the table is suddenly mapped to the same bucket. The resizing itself only takes linear time. However, every subsequent insertion must traverse a linked list containing all the existing values to check that the new key is not already present, so each insertion takes linear time until the next time the table is resized.

In principle, the unordered_set implementation could avoid this issue by also resizing the table when any one bucket becomes too long. However, that raises the question of whether a long bucket reflects a genuine collision under a reasonable hash function (in which case a resize helps), or whether the user simply hashed every key to the same value (in which case the problem persists regardless of the table size). Perhaps that is why this particular library implementation doesn't do it.

See also https://codeforces.com/blog/entry/62393 (an application of this phenomenon to get points on Codeforces contests).


Your program works absolutely fine. There is nothing wrong with the hash algorithm, collisions, or anything of the like.

The throttling you are seeing comes from console I/O when you attempt to paste 200000 numbers into the window. That's why it chokes. Redirect input from a file and it runs fine, returning the result almost instantly:

C:\Users\selbie\source\repos\ConsoleApplication126\Debug>ConsoleApplication126.exe  < d:/test.txt
200000

All the numbers in your test input file are unique, so the output is 200000.