On Fri, 13 Feb 2004, Jason Crowe wrote:

> I also use bogofilter and I wonder if it will be more accurate long term.
> Spamassassin Bayes filter only allows a message(token ?) to be read once,

That's not quite true. SA only allows the same *instance* of a message to be learned once -- where an instance is (last I checked) determined by the message-id. So if you get the "same" spam seven times with a different message-id each time, SA will learn it every time.

(I hope the use of message-id for this falls by the wayside soon, before spammers get the bright idea of stealing old message-id headers from nonspam Usenet or list archives and inserting them into newly generated spam.)

> if a token can be read multiple times won't that allow bayes work
> through this type of poison as long as it's continuality trained?

Yes. Also note that these sorts of attacks are mostly effective against systems that score a message by the ratio of spammy tokens to total tokens in it; that is, systems where never-before-seen tokens can reduce the spamminess. As I understand it, SA does not work that way -- a new token is not given any weight one way or the other during SA's classification. (Someone will doubtless correct me on that.)

Tokens seen in roughly equal amounts in both ham and spam tend to be ignored by the classifier as well, so unless the poisoner manages to include tokens that previously appeared mostly in ham, the poison simply vanishes into the noise.
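To make the message-id point concrete, here is a minimal sketch of a learner that remembers which Message-IDs it has already learned and skips repeats. `ToyBayesLearner` and its methods are hypothetical, written only for illustration; SA's real bookkeeping differs in detail. It shows both halves of the argument: identical spam bodies with fresh IDs get learned every time, and a spam carrying a reused (stolen) ID is silently skipped.

```python
from collections import Counter


class ToyBayesLearner:
    """Hypothetical learner: learns each Message-ID at most once."""

    def __init__(self):
        self.seen_ids = set()         # Message-IDs already learned
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it

    def learn(self, message_id, tokens, is_spam):
        """Learn a message unless its Message-ID was seen; return True if learned."""
        if message_id in self.seen_ids:
            return False              # same "instance": skip it
        self.seen_ids.add(message_id)
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(set(tokens))    # count each token once per message
        return True


learner = ToyBayesLearner()
body = "cheap meds buy now".split()
for i in range(7):  # the "same" spam, but a different Message-ID each time
    learner.learn(f"<spam{i}@example.invalid>", body, is_spam=True)
print(learner.spam_counts["cheap"])  # learned seven times
```

The flip side is the worry above: once `<list123@example.invalid>` has been learned as ham, a new spam carrying that same ID would return `False` from `learn` and teach the filter nothing.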

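The difference between a ratio-based scorer and one that ignores unseen and neutral tokens can be sketched in a few lines. This is a toy under stated assumptions, not SA's code: the training counts are invented, `min_dev` is a hypothetical cutoff for "roughly equal in ham and spam", and the simple naive-Bayes product used here stands in for SA's actual combining method, which differs (it uses a chi-square scheme).

```python
import math

# Hypothetical training data: number of spam/ham messages containing each token.
NSPAM, NHAM = 10, 10
SPAM_COUNTS = {"viagra": 9, "cheap": 8, "the": 5}
HAM_COUNTS = {"meeting": 9, "the": 5, "viagra": 1, "cheap": 2}
SPAMMY = {"viagra", "cheap"}  # tokens the naive ratio scorer treats as spammy


def ratio_score(tokens, spammy=SPAMMY):
    """Naive scorer: fraction of the message's tokens that are spammy.
    Every unseen filler token enlarges the denominator and dilutes the score."""
    return sum(t in spammy for t in tokens) / len(tokens)


def token_prob(tok):
    """Per-token spam probability, or None if the token was never trained."""
    s, h = SPAM_COUNTS.get(tok, 0), HAM_COUNTS.get(tok, 0)
    if s + h == 0:
        return None
    spam_freq, ham_freq = s / NSPAM, h / NHAM
    return spam_freq / (spam_freq + ham_freq)


def bayes_score(tokens, min_dev=0.1):
    """Naive-Bayes combination that skips unseen tokens and tokens whose
    probability is within min_dev of 0.5 (seen about equally in ham and spam)."""
    probs = []
    for tok in sorted(set(tokens)):  # sorted for a deterministic product order
        p = token_prob(tok)
        if p is None or abs(p - 0.5) < min_dev:
            continue
        probs.append(min(max(p, 0.01), 0.99))  # clamp so no token forces 0 or 1
    if not probs:
        return 0.5  # no usable evidence either way
    spamminess = math.prod(probs)
    hamminess = math.prod(1 - p for p in probs)
    return spamminess / (spamminess + hamminess)


clean = ["cheap", "viagra", "the"]
padded = clean + [f"junk{i}" for i in range(20)]  # never-before-seen filler
print(ratio_score(clean), ratio_score(padded))    # drops from about 0.67 to about 0.09
print(bayes_score(clean), bayes_score(padded))    # identical, about 0.97
print(bayes_score(clean + ["meeting"]))           # a ham-heavy token pulls it to about 0.27
```

The padding wrecks the ratio scorer but does nothing to the second one, because unseen tokens are dropped before combining and "the" (a 50/50 token) is dropped as noise. Only a token with real ham history, like "meeting" here, moves the score.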