http://bugzilla.spamassassin.org/show_bug.cgi?id=2129
------- Additional Comments From [EMAIL PROTECTED] 2004-03-14 10:07 -------

No, I've been looking at performance issues in Bayes. I'm not talking about sweeping useful tokens out of the database, although to the degree that performance is affected by the number of unique tokens in the database, that could be an issue too.

What I'm concerned about is the per-message performance if a single message had 20,000 or more unique tokens in it. My email has averaged 262 tokens per message lately. 20,000 random four-character sequences add only about 100 Kbytes to the length of a message, which is below the typical 256 Kbyte limit on what we will process in SpamAssassin, and would increase the number of tokens to look up in the database by nearly two orders of magnitude.

This is not the same as the "Bayes poisoning" that we have seen so far. I don't think we can ignore the possibility when considering whether to make use of I*tokens.
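
To put rough numbers on that scenario, here is a minimal Python sketch. It is not SpamAssassin code: `build_padding` and `unique_tokens` are hypothetical helpers, and the whitespace split is only a stand-in for the real Bayes tokenizer, which may produce a different token count for the same text. The constants are the figures quoted in the comment.

```python
import random
import string

AVG_TOKENS_PER_MESSAGE = 262   # average quoted in the comment
INJECTED_TOKENS = 20_000       # attack size quoted in the comment
TOKEN_LENGTH = 4               # "four character sequences"
SIZE_LIMIT_BYTES = 256 * 1024  # the "typical 256Kbyte limit" from the comment


def build_padding(count, length, seed=0):
    """Return `count` distinct random lowercase tokens joined by spaces."""
    rng = random.Random(seed)
    tokens = set()
    # Keep drawing until we have enough distinct tokens; random draws collide.
    while len(tokens) < count:
        tokens.add("".join(rng.choices(string.ascii_lowercase, k=length)))
    return " ".join(sorted(tokens))


def unique_tokens(text):
    """Assumed stand-in tokenizer: split on whitespace and deduplicate."""
    return set(text.split())


padding = build_padding(INJECTED_TOKENS, TOKEN_LENGTH)
padding_bytes = len(padding.encode("ascii"))

# Database lookups for an average message with the padding appended.
lookups = AVG_TOKENS_PER_MESSAGE + len(unique_tokens(padding))
growth = lookups / AVG_TOKENS_PER_MESSAGE

print(f"padding size: {padding_bytes} bytes (limit {SIZE_LIMIT_BYTES})")
print(f"token lookups: {lookups} vs {AVG_TOKENS_PER_MESSAGE} (x{growth:.0f})")
```

Under these assumptions the padding comes to just under 100,000 bytes, well inside the size limit, while the number of lookups for an otherwise average message grows from 262 to 20,262, or roughly 77 times.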
