Hi all,

I’m new to SA and I’ve been evaluating how it performs on my inbox.

I’m using Bayes and have been training it for a couple of months now, but I 
haven’t been seeing the kind of success I’d hoped for. Basically, messages 
very similar to ones I’ve repeatedly trained as spam are still getting 
through Bayes with relatively low scores (e.g. BAYES_50), so I’ve been 
investigating a bit to try to figure out why.

One pattern I’ve noticed slipping through is multipart messages with a block 
of Bayes-poisoning text in the text/plain part and the real spam payload in 
the text/html part.  What I’m seeing is that the text/plain block hits a few 
of my hammy tokens, which tempers the Bayes score enough for the message to 
slip through. Of course, I then train it as spam, but given the random nature 
of this text block, it seems this just inserts noise into the Bayes DB. I 
guess it would eventually average out, but still...
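
To illustrate what I mean by the score being tempered, here is a toy Python 
sketch. It uses a plain naive-Bayes style combination, which is only a 
simplification and not SA's actual combining code, and the per-token 
probabilities are made-up numbers, not values from my DB:

```python
from math import prod


def combined_spam_probability(token_probs):
    """Naive-Bayes style combination of per-token spam probabilities.

    Simplified illustration only; this is NOT how SpamAssassin combines
    token probabilities internally.
    """
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical token probabilities, chosen purely for illustration.
payload_tokens = [0.95, 0.95, 0.95]  # spammy tokens from the real payload
poison_tokens = [0.05, 0.05, 0.05]   # hammy tokens hit by the random text

# Payload alone: very close to 1.
print(combined_spam_probability(payload_tokens))

# Payload plus poison: the hammy tokens pull the result back to about 0.5.
print(combined_spam_probability(payload_tokens + poison_tokens))
```

With these invented numbers, three hammy hits are enough to cancel three 
spammy ones, which is roughly the effect I think I'm seeing.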

So I’m wondering: given that most e-mail clients nowadays don’t show the 
text/plain part if there is a text/html part, why not have SA’s Bayes filter 
ignore the text/plain part when a text/html part is present and focus on the 
HTML instead? After all, the text/plain part is only being used for noise.
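
To make the idea concrete, here is a rough Python sketch of the selection 
policy I have in mind, using the standard library email module. This is not 
SA code and says nothing about how SA picks parts today; the function name 
bayes_text_parts is made up, and for simplicity it drops text/plain whenever 
any text/html part exists anywhere in the message:

```python
from email import message_from_string
from email.message import Message


def bayes_text_parts(msg: Message) -> list[str]:
    """Return the text bodies a tokenizer would see under the proposed policy.

    Hypothetical helper: if the message has any text/html part, only the
    HTML bodies are returned; otherwise the text/plain bodies are returned.
    """
    plain, html = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text of their own
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back rather than fail.
            text = payload.decode("utf-8", errors="replace")
        ctype = part.get_content_type()
        if ctype == "text/html":
            html.append(text)
        elif ctype == "text/plain":
            plain.append(text)
    return html if html else plain


if __name__ == "__main__":
    raw = (
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/alternative; boundary="b"\n'
        "\n"
        "--b\n"
        "Content-Type: text/plain\n"
        "\n"
        "random poisoning words\n"
        "--b\n"
        "Content-Type: text/html\n"
        "\n"
        "<p>buy now</p>\n"
        "--b--\n"
    )
    # Only the HTML body is kept; the poisoning text is never tokenized.
    print(bayes_text_parts(message_from_string(raw)))
```

A real implementation would presumably still need to strip the HTML markup 
before tokenizing, which this sketch leaves out.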

Of course, the counterargument would be that spammers would then just stop 
using multipart and dump the poisoning block into the text/html part instead, 
so maybe this is just a stupid suggestion :)

Has this been discussed before? What are people’s thoughts?

Cheers,

        Mark

PS: These messages aren’t triggering the MPART_ALT_DIFF rule.
