Re: Bayes learning differences: v3.3.2 to v3.4.0

Kevin A. McGrail Wed, 05 Nov 2014 04:41:05 -0800

On 11/4/2014 6:06 PM, John Woods wrote:

Everyone,
We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and the amount of spam being let through (false negatives) is higher as well. After looking through the wiki and the code, I'm pretty sure this change is related to the rule that says you must have 3 "body only" points and 3 "header only" points, which are hardcoded values in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like the body points equal the head points, whereas in 3.4.0 they differ.

You are correct. There were changes to the logic, and bugs found in it were resolved in 3.4.0. See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503
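To make the rule discussed above concrete, here is a minimal Python sketch of the body/header gate. It is a simplified model, not the plugin's actual code: the function name `may_autolearn_spam` is hypothetical, the 3-point minimums are the hardcoded values mentioned in the question, and the 12.0 default is assumed to mirror the default of `bayes_auto_learn_threshold_spam` (check the docs for your version).

```python
# A simplified, hypothetical model of the auto-learn gate discussed above.
# This is NOT the actual Mail::SpamAssassin::Plugin::AutoLearnThreshold code.

REQUIRED_BODY_POINTS = 3.0  # hardcoded minimum mentioned in the thread
REQUIRED_HEAD_POINTS = 3.0  # hardcoded minimum mentioned in the thread


def may_autolearn_spam(score: float, body_points: float, head_points: float,
                       spam_threshold: float = 12.0) -> bool:
    """Return True if this model would auto-learn the message as spam.

    spam_threshold stands in for bayes_auto_learn_threshold_spam
    (assumed default 12.0).
    """
    # The overall score must reach the spam auto-learn threshold first.
    if score < spam_threshold:
        return False
    # Body rules and header rules must each contribute enough points on their own.
    return body_points >= REQUIRED_BODY_POINTS and head_points >= REQUIRED_HEAD_POINTS


if __name__ == "__main__":
    print(may_autolearn_spam(15.0, body_points=5.0, head_points=10.0))  # True
    print(may_autolearn_spam(15.0, body_points=2.0, head_points=13.0))  # False
```

Under this model, a message that scores 15 mostly on header rules (13 header points, 2 body points) is not learned as spam even though it clears the overall threshold, which is one way a drop in auto-learning like the one described could arise.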

I've got a few questions:
1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.

There is a test_type flag. It was sometimes lost when messages were parsed in previous versions.
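As a rough illustration of the answer above, the following Python sketch builds per-area totals by summing the scores of rule hits grouped by their test type, which is why the body-only figure need not match any single test's score. The `RuleHit` class, the `split_points` function, the two type labels, and the rule names are all hypothetical; SpamAssassin has more rule types than these two, and how each one maps to an area is not modeled here.

```python
# Hypothetical illustration of deriving "body only" and "head only" totals
# by grouping rule hits on their test type. Rule names and scores are made up.
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class RuleHit:
    name: str
    test_type: Optional[str]  # "body" or "header" in this sketch; None if lost
    score: float


def split_points(hits: Iterable[RuleHit]) -> Tuple[float, float]:
    """Sum scores per area; return (body_only_points, head_only_points)."""
    body = 0.0
    head = 0.0
    for hit in hits:
        if hit.test_type == "body":
            body += hit.score
        elif hit.test_type == "header":
            head += hit.score
        # Hits with any other (or missing) type count toward neither total here.
    return body, head


if __name__ == "__main__":
    hits = [
        RuleHit("HYPOTHETICAL_BODY_RULE", "body", 2.0),
        RuleHit("HYPOTHETICAL_HEADER_RULE_A", "header", 1.5),
        RuleHit("HYPOTHETICAL_HEADER_RULE_B", "header", 2.0),
    ]
    print(split_points(hits))  # (2.0, 3.5)
```

In this sketch, a hit whose type is missing simply drops out of both totals; that is only meant to show why a reliable type flag matters, not how the older versions actually behaved.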

2) How can we affect the configuration to increase the number of spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how we're using Bayes?

I will leave Bayes comments to other experts, but in general, I believe you will find that some sort of NON-automated learning will produce better results. My concern with auto-learning is that you are just self-perpetuating any flaws in the current classification, not really helping to stop new and different spam. I will likely start a flamewar if I continue discussing Bayes.

Perhaps you can buy a six-pack for AXB and convince him to add his $0.04 on Bayes. He's the resident expert.


regards,
KAM
