MADALGO Theory Seminar
MADALGO Seminar: Mathias Bæk Tejs Knudsen (DIKU): "Linear Hashing os Awesome"
Title
Linear Hashing is Awesome
Abstract
The most classic textbook hash function, e.g. taught in CLRS [MIT Press '09], ish(x) = ((ax+b) mod p) mod m where a,b \in {0,1,...,p-1} are chosen uniformly at random. It is known that (*) is 2-independent and almost uniform provided p >> m. This implies that when using (*)
to build a hash table with chaining, the expected query time is O(1) and the expected length of the longest chain is O(sqrt{n}). This result holds for any 2-independent hash function. No hash function can improve on the expected query time, but the upper bound on the expected length of the longest chain is not known to be tight for (*). Partially addressing this problem, Alon et al. [STOC '97] proved the existence of a class of linear hash functions such that the expected length of the longest chain is Omega(sqrt{n}) and leave as an open problem to decide which non-trivial properties (*) has. We make the first progress on this fundamental problem showing that the expected length of the longest chain is at most n^{1/3+o(1)}, showing that the performance of (*) is similar to that of a 3-independent hash function for which we can prove an upper bound of O(n^{1/3}).
As a lemma we show that within a fixed set of integers there are few pairs such that the height of the ratio of the pairs are small. This is proved using a mixture of techniques from additive combinatorics and number theory, and we believe that the result might be of independent interest.
For a natural variation of (*) where we look at the most significant instead of the least significant bits of (ax+b) mod p to create the hash value, we show that it is possible to apply second order moment bounds even when a hash value is fixed. As a consequence:
- For min-wise hashing it was known that any key from a set of n keys have the smallest hash value with probability O(1/sqrt{n}). We improve this to n^{-1+o(1)}.
- For linear probing it was known that the worst case expected query time is O(sqrt{n}). We improve this to n^{o(1)}.
Hosts
Thomas Dueholm Hansen and Kasper Green Larsen