On Sat, Mar 06, 2004 at 02:10:32AM -0800, Dan Quinlan wrote:
> Just the same, SHA1 wasn't too bad. The extra time for even a SHA1 is
> perhaps negligible. I suspect if you used the first 32 bits or first 64
> bits of the SHA1 you'd get equally good (perhaps better vs. CRC32)
> collision rates with the same size.
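(For concreteness, the truncated-key idea would look something like
this; just a sketch, untested, and hashed_key is a made-up name:)

    use Digest::SHA1 qw(sha1);

    sub hashed_key {
        my ($token) = @_;
        # keep the first 8 bytes (64 bits) of the 20-byte binary digest
        return substr(sha1($token), 0, 8);
    }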
I was thinking something like that, just because we already use SHA1 in
our code, so we may as well use it. But it does take around 2x the CPU
cycles to do the same calculation, which isn't so great when we're
talking about throughput speed.

> I disagree. I believe using a fixed length key would enable faster and
> much more space efficient DB hashing when using a DB capable of using
> that to its advantage. Probably not with DB_File, of course, but other
> DBs have options for fixed length keys, and we could even use a custom
> DB.

Well, ok, but I was talking about using hashed tokens in the code we
have now. For 3.0 we're not going to be replacing DB_File, and we're
not going to write our own DB module (frankly, I don't think we should
do that at all...)

BTW: I did a little more testing... I took my 440k-token bayes db and
ran through it with DB_File in a while (... = each ...) loop; that took
11.4 seconds. I then converted the db to use crc64-hashed keys, with
everything else exactly the same, and ran the same read-only loop:
11.25 seconds. The loop I used is sketched below.

So the hashed keys only save about 0.15 seconds of read time, which
doesn't cover the extra CPU time the hashing function costs; net, we
come out about 0.2 seconds behind, so it's still not worthwhile given
the current code.
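Roughly the loop I timed, from memory (a sketch; 'bayes_toks' here
stands in for whatever the db file is actually called, and I've left
out everything but the walk itself):

    use DB_File;
    use Fcntl;
    use Time::HiRes qw(time);

    my %db;
    tie %db, 'DB_File', 'bayes_toks', O_RDONLY, 0600, $DB_HASH
        or die "tie failed: $!";

    my $start = time;
    my $count = 0;
    while (my ($tok, $val) = each %db) {
        $count++;    # just walk every key/value pair read-only
    }
    printf "%d tokens in %.2f seconds\n", $count, time - $start;

    untie %db;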
-- 
Randomly Generated Tagline:
"I'm nothing ... I'm navel lint ..." - From the movie True Lies
