On Sat, Mar 06, 2004 at 02:10:32AM -0800, Dan Quinlan wrote:
> Just the same, SHA1 wasn't too bad. The extra time for even a SHA1 is
> perhaps negligible. I suspect if you used the first 32 bits or first 64
> bits of the SHA1 you'd get equally good (perhaps better vs. CRC32)
> collision rates with the same size.
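(For concreteness, the truncated-key idea would look something like
this; just a sketch, untested, and hashed_key is a made-up name:)

    use Digest::SHA1 qw(sha1);

    sub hashed_key {
        my ($token) = @_;
        # keep the first 8 bytes (64 bits) of the 20-byte binary digest
        return substr(sha1($token), 0, 8);
    }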
I was thinking something like that, just because we already use SHA1 in
our code, so we may as well use it. But it does take around 2x the CPU
cycles to do the same calculation, which isn't so great when we're
talking about throughput speed.

> I disagree. I believe using a fixed length key would enable faster and
> much more space efficient DB hashing when using a DB capable of using
> that to its advantage. Probably not with DB_File, of course, but other
> DBs have options for fixed length keys, and we could even use a custom
> DB.

Well, ok, but I was talking about using hashed tokens in the code we
have now. For 3.0 we're not going to be replacing DB_File, and we're
not going to write our own DB module (frankly, I don't think we should
do that at all...)

BTW: I did a little more testing... I took my 440k-token bayes db and
ran through it with DB_File in a while (... = each ...) loop; that took
11.4 seconds. I then converted the db to use crc64-hashed keys, with
everything else exactly the same, and ran the same read-only loop:
11.25 seconds. The loop I used is sketched below.

So the hashed keys only save about 0.15 seconds of read time, which
doesn't cover the extra CPU time the hashing function costs; net, we
come out about 0.2 seconds behind, so it's still not worthwhile given
the current code.
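Roughly the loop I timed, from memory (a sketch; 'bayes_toks' here
stands in for whatever the db file is actually called, and I've left
out everything but the walk itself):

    use DB_File;
    use Fcntl;
    use Time::HiRes qw(time);

    my %db;
    tie %db, 'DB_File', 'bayes_toks', O_RDONLY, 0600, $DB_HASH
        or die "tie failed: $!";

    my $start = time;
    my $count = 0;
    while (my ($tok, $val) = each %db) {
        $count++;    # just walk every key/value pair read-only
    }
    printf "%d tokens in %.2f seconds\n", $count, time - $start;

    untie %db;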
-- 
Randomly Generated Tagline:
"I'm nothing ... I'm navel lint ..." - From the movie True Lies
