http://bugzilla.spamassassin.org/show_bug.cgi?id=3331
------- Additional Comments From [EMAIL PROTECTED] 2004-08-09 12:24 -------
Subject: Re: [review] Bayes option to keep original token as db data (not key).

On Sun, Aug 08, 2004 at 10:45:02PM -0700, [EMAIL PROTECTED] wrote:
> Is there a limit on the size of original tokens?

Not currently. I'm contemplating a limit, at least on the SQL side, of 128
or 200. Opinions?

> How much does this slow down DBM when the option is turned off vs. when
> the patch has not been applied?

Hmmm... anywhere from 0-7%; it was actually faster for the spamd phase. I'm
inclined to run the test again. Something seems fishy, but it's possible
I've got some inner-loop if statement that is causing a problem.

> If you turn the option off, are original tokens removed over time?

Yes, as tokens are updated they will be updated with blank raw_token values.

> I assume when upgrading from version 2 that the original tokens will be
> stored in this field if the option is turned on.

That's the idea; I haven't tested this yet.

> Did you try ignoring hapaxes (single count tokens) to reduce the size
> impact?

I forgot about this. It's very easy to do, just a matter of running some
benchmarks.

I'm still running benchmarks; each one takes 50 min to 1 hr. Three runs of
each type of test, times 3-4 tests per storage engine, takes a little while.

Michael

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
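The decisions discussed above (cap the raw token at 128 or 200 bytes, store a blank raw_token when the option is off, and optionally skip hapaxes) can be sketched as a single storage-policy function. This is an illustrative Python sketch, not SpamAssassin's actual code; the function name, parameters, and the 128 limit are assumptions drawn from the discussion.

```python
# Hypothetical sketch of the raw-token storage policy discussed in this bug.
# All names are illustrative; SpamAssassin's real Bayes code differs.

MAX_RAW_TOKEN_LEN = 128  # the 128-or-200 cap being contemplated for SQL


def raw_token_to_store(token, spam_count, ham_count,
                       keep_raw=True, skip_hapaxes=False):
    """Return the raw-token string to write alongside the hashed key.

    An empty string means "store a blank raw_token", which is also how
    old values get cleared over time once the option is turned off.
    """
    if not keep_raw:
        return ""  # option off: blank out raw_token as the row is updated
    if skip_hapaxes and (spam_count + ham_count) <= 1:
        return ""  # hapax (single-count token): not worth the space
    return token[:MAX_RAW_TOKEN_LEN]  # enforce the size cap


print(raw_token_to_store("viagra", 5, 0))                      # stored as-is
print(len(raw_token_to_store("x" * 300, 5, 0)))                # capped at 128
print(raw_token_to_store("rare", 1, 0, skip_hapaxes=True))     # hapax: blank
print(raw_token_to_store("viagra", 5, 0, keep_raw=False))      # option off: blank
```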
