http://bugzilla.spamassassin.org/show_bug.cgi?id=3331





------- Additional Comments From [EMAIL PROTECTED]  2004-08-09 12:24 -------
Subject: Re:  [review] Bayes option to keep original token as db data (not key).

On Sun, Aug 08, 2004 at 10:45:02PM -0700, [EMAIL PROTECTED] wrote:
> 
> Is there a limit on the size of original tokens?
> 

Not currently.  I'm contemplating a limit, at least on the SQL side, of
128 or 200.  Opinions?
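
Roughly what I have in mind (a sketch only, not the actual patch; the
raw_token name and the 128-byte figure are placeholders):

  use strict;
  use warnings;

  # Sketch: cap the copy of the original token text before it is stored
  # in the (hypothetical) raw_token column.  128 is a placeholder limit.
  use constant MAX_RAW_TOKEN_LEN => 128;

  sub truncate_raw_token {
    my ($raw) = @_;
    return '' unless defined $raw;
    return length($raw) <= MAX_RAW_TOKEN_LEN
      ? $raw
      : substr($raw, 0, MAX_RAW_TOKEN_LEN);
  }

  print truncate_raw_token('x' x 300), "\n";   # prints 128 x's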

> How much does this slow down DBM when the option is turned off vs. when
> the patch has not been applied?
> 

Hmmm... anywhere from 0-7%; it was actually faster for the spamd
phase.  I'm inclined to run the test again; something seems fishy, but
it's possible I've got an inner-loop if statement that is causing a
problem.
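
If it does turn out to be an inner-loop if, the likely fix is just
deciding once before the loop rather than per token.  A rough sketch of
that idea, with a made-up option name and a stub standing in for the
real storage call:

  use strict;
  use warnings;

  # Sketch: decide once, before the per-token loop, whether raw text is
  # kept, so the loop itself carries no extra branch.  The option name
  # and store_token() are stand-ins, not the real code.
  my %conf   = (bayes_store_raw_tokens => 0);   # hypothetical option
  my @tokens = qw(viagra mortgage refinance);

  my $writer = $conf{bayes_store_raw_tokens}
    ? sub { store_token($_[0], $_[0]) }         # keep original text
    : sub { store_token($_[0], '')    };        # blank raw value

  $writer->($_) for @tokens;

  sub store_token {
    my ($token, $raw) = @_;
    print "store '$token' with raw_token='$raw'\n";
  }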

> If you turn the option off, are original tokens removed over time?
> 

Yes.  As tokens are updated, they will be rewritten with blank
raw_token values.
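
To illustrate what I mean (a toy sketch, not the real pack format or
the real update path; the field names are placeholders):

  use strict;
  use warnings;

  # Toy sketch of the ageing-out behaviour: with the option off, each
  # time a token's counts are rewritten the stored raw text is replaced
  # with ''.  Real code packs counts/atime into a string; this doesn't.
  my $keep_raw = 0;                                  # option turned off
  my %db = ( 'a1b2c' => { spam => 3, ham => 1, raw => 'viagra' } );

  sub touch_token {
    my ($key, $add_spam, $add_ham, $raw) = @_;
    my $rec = $db{$key} ||= { spam => 0, ham => 0, raw => '' };
    $rec->{spam} += $add_spam;
    $rec->{ham}  += $add_ham;
    $rec->{raw}   = $keep_raw ? $raw : '';           # blanked on update
  }

  touch_token('a1b2c', 1, 0, 'viagra');
  print "raw_token is now '$db{'a1b2c'}{raw}'\n";    # -> ''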

> I assume when upgrading from version 2 that the original tokens will be
> stored in this field if the option is turned on.
> 

That's the idea; I haven't tested this yet.
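
Something along these lines, completely untested; that the old-format
keys are the original token strings, and that the new key is the first
5 bytes of the SHA1, are my assumptions here:

  use strict;
  use warnings;
  use Digest::SHA1 qw(sha1);

  # Untested sketch of the conversion idea: old-format keys are the
  # original token strings, so while converting each entry we can keep
  # that string as the raw value if the option is on.  The 5-byte SHA1
  # key and the field names are assumptions, not the real patch.
  my $keep_raw = 1;
  my %old_db   = ( viagra => 'packed-1', mortgage => 'packed-2' );
  my %new_db;

  while (my ($token, $packed) = each %old_db) {
    my $hashed = substr(sha1($token), 0, 5);         # new-style key
    $new_db{$hashed} = { data      => $packed,
                         raw_token => $keep_raw ? $token : '' };
  }

  printf "converted %d tokens\n", scalar keys %new_db;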

> Did you try ignoring hapaxes (single count tokens) to reduce the size
> impact?
> 

I had forgotten about this.  It's very easy to do; just a matter of
running some benchmarks.
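
The check itself would be trivial, something like this (names are
placeholders, and the real code packs things differently):

  use strict;
  use warnings;

  # Sketch: skip the raw copy for hapaxes (total count of 1) so they
  # don't bloat the DB.  Names are placeholders, not the actual patch.
  sub raw_value_for {
    my ($token, $spam_count, $ham_count) = @_;
    return '' if ($spam_count + $ham_count) <= 1;    # hapax: store nothing
    return $token;
  }

  print "'", raw_value_for('viagra', 1, 0), "'\n";   # hapax  -> ''
  print "'", raw_value_for('viagra', 5, 2), "'\n";   # common -> 'viagra'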

I'm still running benchmarks; each one takes 50 min to 1 hr to run.
Three runs of each type of test, times 3-4 tests per storage engine,
takes a little while.

Michael




