http://bugzilla.spamassassin.org/show_bug.cgi?id=3331





------- Additional Comments From [EMAIL PROTECTED]  2004-08-08 23:18 -------
> Did you try ignoring hapaxes (single count tokens)
> to reduce the size impact?

That's an interesting possibility. Since the hash can be treated as uniquely
identifying the token, we could skip storing the token text when creating a new
entry, then add it the second time the token appears.
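A minimal Python sketch of the deferred-storage idea, assuming a store keyed by a
truncated token hash. `token_hash`, `Entry` and `TokenStore` are hypothetical
names for illustration, not SpamAssassin's actual (Perl) code, and the 5-byte
SHA-1 key is an assumption:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def token_hash(token: str) -> bytes:
    # Hypothetical key: last 5 bytes of SHA-1 stand in for whatever
    # hash the real Bayes store uses to identify a token.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


@dataclass
class Entry:
    count: int = 0
    text: Optional[str] = None  # raw token text, filled in lazily


class TokenStore:
    """Entries keyed by hash; the raw token text is stored only once the
    token has been seen at least twice (i.e. it is no longer a hapax)."""

    def __init__(self) -> None:
        self.entries: Dict[bytes, Entry] = {}

    def learn(self, token: str) -> None:
        key = token_hash(token)
        entry = self.entries.get(key)
        if entry is None:
            # First sighting: the hash alone identifies the token,
            # so don't spend space on the text yet.
            self.entries[key] = Entry(count=1)
            return
        entry.count += 1
        if entry.text is None:
            # Second sighting: the token is no longer a hapax, keep its text.
            entry.text = token

    def text_of(self, token: str) -> Optional[str]:
        entry = self.entries.get(token_hash(token))
        return entry.text if entry else None
```

After one `learn("foo")`, `text_of("foo")` is still `None`; after a second
`learn("foo")` it returns `"foo"`. Hapaxes, which are typically the bulk of
the database, never pay for their text.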

I think that should be an option if we do it, so people can choose the
tradeoff between completeness of information about the tokens and disk space:
minimum space with no token information; medium space with information on
non-hapaxes; maximum space with information on all tokens.
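One way such an option might be expressed, as a sketch: a three-valued setting
consulted whenever an entry is written. `TokenTextMode`, `should_store_text` and
`stored_text_count` are hypothetical names, not an existing SpamAssassin
configuration option:

```python
from enum import Enum
from typing import Dict


class TokenTextMode(Enum):
    # Hypothetical option values for the three trade-off levels.
    MINIMUM = "none"       # hashes only, no token text
    MEDIUM = "non_hapax"   # text once a token's count reaches 2
    MAXIMUM = "all"        # text for every token


def should_store_text(mode: TokenTextMode, count: int) -> bool:
    """Whether the raw token text belongs in an entry, given the
    token's count after the current update."""
    if mode is TokenTextMode.MINIMUM:
        return False
    if mode is TokenTextMode.MEDIUM:
        return count >= 2
    return True


def stored_text_count(mode: TokenTextMode, counts: Dict[str, int]) -> int:
    """Number of entries that would carry token text under `mode`."""
    return sum(1 for c in counts.values() if should_store_text(mode, c))
```

For counts `{"a": 1, "b": 1, "c": 3}`, the three modes store text for 0, 1 and
3 entries respectively, which is the space/information tradeoff the user would
be choosing between.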



