On Fri, 13 Feb 2004, Jason Crowe wrote:

> I also use bogofilter and I wonder if it will be more accurate long term.
> Spamassassin Bayes filter only allows a message(token ?) to be read once,

That's not quite true.  SA only allows the same *instance* of a message to
be learned once -- where an instance is (last I checked) determined by the
message-id.  So if you get the "same" spam seven times with a different
message-id each time, SA will learn it every time.
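
To make the distinction concrete, here is a minimal Python sketch of "learn once per instance" keyed on the Message-ID header. It is not SA's code: the ToyLearner class, its fields, and the whitespace tokenizer are all made up for illustration, and the only assumption taken from above is that the duplicate check uses the message-id.

```python
from email import message_from_string


class ToyLearner:
    """Hypothetical learner: skips a message whose Message-ID was already learned."""

    def __init__(self):
        self.seen_ids = set()    # Message-IDs already learned
        self.spam_counts = {}    # token -> number of spam messages containing it

    def learn_spam(self, raw):
        msg = message_from_string(raw)
        msg_id = msg.get("Message-ID")
        # Same instance (same Message-ID) is refused the second time.
        if msg_id is not None and msg_id in self.seen_ids:
            return False
        if msg_id is not None:
            self.seen_ids.add(msg_id)
        # Crude whitespace tokenizer, assumed only for this sketch.
        for token in set(msg.get_payload().split()):
            self.spam_counts[token] = self.spam_counts.get(token, 0) + 1
        return True


learner = ToyLearner()
first = "Message-ID: <a@example.invalid>\n\nbuy pills now\n"
second = "Message-ID: <b@example.invalid>\n\nbuy pills now\n"
learner.learn_spam(first)    # learned
learner.learn_spam(first)    # refused: same Message-ID
learner.learn_spam(second)   # learned again: same body, new Message-ID
```

After those three calls the token counts reflect two learned messages, not one and not three.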

(I hope the use of message-id for this goes by the wayside soon, before
spammers get the bright idea to steal old message-id headers from nonspam
usenet or list archives and insert them into newly generated spam.)

> if a token can be read multiple times won't that allow bayes work
> through this type of poison as long as it's continuality trained?

Yes.

Also note that these sorts of attacks are mostly effective against systems
that use a ratio of spammy tokens to total tokens in the given message;
that is, where never-before-seen tokens can reduce the spamminess.  As I
understand it, SA does not work that way -- a new token is not given any
weight one way or the other during SA's classification.  (Someone will
doubtless correct me on that.)  Tokens seen in approximately equal amounts
in both ham and spam tend to be ignored by the classifier as well -- so
unless the poisoner manages to include tokens that previously appeared
mostly in ham, the poison simply vanishes into the noise.
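
A rough Python sketch of that behaviour, under my own assumptions rather than SA's actual code: the function names, the 0.1 "strength" cutoff, and the plain average used to combine token probabilities are all invented for illustration. The only points it tries to show are that never-seen tokens contribute nothing and that tokens seen about equally in ham and spam are dropped.

```python
def token_spamminess(token, spam_counts, ham_counts, n_spam, n_ham):
    """Return an estimated P(spam | token), or None for a never-seen token."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h == 0:
        return None  # new token: no weight either way
    spam_rate = s / n_spam
    ham_rate = h / n_ham
    return spam_rate / (spam_rate + ham_rate)


def score(tokens, spam_counts, ham_counts, n_spam, n_ham, min_strength=0.1):
    """Average the probabilities of the tokens that carry real evidence."""
    probs = []
    for token in set(tokens):
        p = token_spamminess(token, spam_counts, ham_counts, n_spam, n_ham)
        # Skip unseen tokens and tokens too close to neutral (0.5).
        if p is None or abs(p - 0.5) < min_strength:
            continue
        probs.append(p)
    if not probs:
        return 0.5  # nothing informative: stay neutral
    # A plain average stands in for whatever combining method a real
    # classifier uses; it is a simplification for this sketch.
    return sum(probs) / len(probs)


# Toy training data: 10 spam and 10 ham messages.
spam_counts = {"viagra": 9, "the": 5}
ham_counts = {"viagra": 1, "the": 5, "meeting": 9}

plain = score(["viagra"], spam_counts, ham_counts, 10, 10)
random_poison = score(["viagra", "xyzzy", "qwerty", "the"],
                      spam_counts, ham_counts, 10, 10)
hammy_poison = score(["viagra", "meeting"], spam_counts, ham_counts, 10, 10)
```

With this toy data, padding the spam with unseen or neutral words leaves the score unchanged, while adding a word that previously appeared mostly in ham pulls it down.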
