Hi all, I’m new to SA and I’ve been evaluating how it performs on my inbox.
I’m using Bayes and I’ve been training it for a couple of months now, but I haven’t been seeing the kind of success I’d hoped for. Basically, messages very similar to ones I’ve already trained as spam several times are still getting through Bayes with relatively low scores (e.g. BAYES_50), so I’ve been investigating to try to figure out why.

One pattern I’ve noticed slipping through is multipart messages that have a block of Bayes-poisoning text in the text/plain part and the real spam payload in the text/html part. What I’m seeing is that the text/plain block manages to hit a few of my hammy tokens, which tempers the Bayes score enough for the message to slip through. Of course, I then train it as spam, but given the random nature of the text block, this just seems to be inserting noise into the Bayes DB. I guess it would eventually average out, but still...

So I’m wondering: given that most e-mail clients nowadays don’t show the text/plain part if there is a text/html part, why not have SA’s Bayes filter ignore the text/plain part whenever there is a text/html part, and focus on the HTML instead? The plain part is only being used for noise, after all.

Of course, the counter-argument is that spammers would then just stop using multipart and dump the poisoning block into the text/html part instead, so maybe this is just a stupid suggestion :)

Has this been discussed before? What are people’s thoughts?

Cheers,
Mark

PS: These messages aren’t triggering the MPART_ALT_DIFF rule.
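
PPS: To make the suggestion concrete, here is a rough Python sketch of the part-selection rule I have in mind. It is not SA code (SA is written in Perl) and says nothing about how SA's tokenizer actually works; `parts_for_bayes` and `SAMPLE` are names I made up for illustration, and it only uses Python's standard `email` module.

```python
from email import message_from_string
from email.message import Message


def parts_for_bayes(msg: Message) -> list[str]:
    """Hypothetical rule: if any text/html part exists, return only the
    text/html bodies; otherwise fall back to the text/plain bodies."""
    html, plain = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        ctype = part.get_content_type()
        if ctype not in ("text/html", "text/plain"):
            continue  # skip attachments, images, etc.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        (html if ctype == "text/html" else plain).append(text)
    return html if html else plain


# Made-up sample: hammy filler in text/plain, spam payload in text/html.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XX"

--XX
Content-Type: text/plain; charset="utf-8"

meeting agenda quarterly report invoice attached
--XX
Content-Type: text/html; charset="utf-8"

<html><body>Buy cheap pills now</body></html>
--XX--
"""

if __name__ == "__main__":
    for body in parts_for_bayes(message_from_string(SAMPLE)):
        print(body)
```

With this rule the filler in the text/plain part never reaches the classifier. It also shows the weakness I mentioned: a spammer only has to move the filler into the text/html part to get it counted again.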