On Sun, 07 Mar 2004 02:46:52 +1300, Sidney Markowitz <[EMAIL PROTECTED]> writes:
> Daniel Quinlan wrote: > > I suspect if you used the first 32 bits or first 64 bits of > > the SHA1 you'd get equally good (perhaps better vs. CRC32) > > collision rates with the same size. > > Some theoretical results: The Birthday Paradox says that you can > expect to look at about 2 billion (2^31) hashes before you see any > collision with a good random 32 bit hash. Birthday paradox is sqrt(count) not count/2. So you'd expect a hit after 64k inserts. > So you should not see the 14 collisions that are shown for CRC32 if > you used a good cryptographic hash, such as something based on > cooking MD5 or SHA-1 down to 32 bits. CRC32 is a good hash. The reason that the other hashes have fewer collisions is that they have more bits. MD5 truncated to 32 bits would be just as bad. CRC32 has another property that may be a problem. It is deterministic, so an attacker can deliberately compute collisions. (I don't see how this produces anything exploitable.) I'd probably do Jenkin's keyed hash, which generates 64 bits, then, depending on what is needed, truncate to an appropriate or convenient number of bits. Using a trick from Quinlan, it should be possible to design a 2^i token database with 10 bits each for spam&ham counts, 16 bit circular space for last-use time, and the equivalent of 20+i bits of hash at a fixed 8 bytes/token. Scott
