On Mon, 29 Nov 2010 09:57:18 +0000 David Laight <da...@l8s.co.uk> wrote:
> What about the sequence prime, prime*2, prime*3 etc. > Unless you are going to search for a factor that works, no particular > value is any better than any other if you allow pathaogical data > to be selected. > Only that keys which are multiples of some even number occur much more frequently. Yes you need to have a knowledge of what type of data you are hashing, but in most cases hash tables work very well. > Even searching for a factor may not help. > I looked at the distribution for the elf symbol table in libc. > The current 'first prime below the power of 2' was no different > from the '2^n-1' value - except that it is a lot slower to compute. > Yes when hashing strings, power of 2 hash tables work just as well. With integers you have to be more careful. > I would always look for a deterministic algorythm. Hashing to linked > lists is still o(n), with a hash table with 'h' entries it is just > 'h' times faster, and then only if the linked lists are equal length. > Trying to get hash chains of length 1 or 2 requires a lot of knowledge > of the data in order to generate a suitable hash function. > And then calculating the hash might be more expensive than a few > comparisons. If you are hashing integer values, you can tune the hash table so that every bucket will have no more than 1 entry. Usually you simply make hash table size a prime number. As for calculating the hash, modern hardware has fast modulo instructions. A very simple test on Intel Atom D510 processor, seems to indicate a single 'interger % prime' is roughly equal to two integer operations.