On 11/4/2014 6:06 PM, John Woods wrote:
Everyone,

We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and the amount of spam being let through (false negatives) is higher as well. After looking through the wiki and the code, I'm pretty sure this change is related to the rule that says you must have 3 "body only" points and 3 "header only" points, which are hardcoded values in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like the body points equal the head points, and in 3.4.0 they differ.

You are correct. Bugs were found in that logic and fixed in 3.4.0. See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503

I've got a few questions:

1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.
Each rule has a test_type flag. In earlier versions it was sometimes lost while messages were being parsed.
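
To make the idea concrete, here is a rough sketch in Python (not the plugin's actual Perl code) of how points could be summed per test type and checked against the 3 "body only" / 3 "header only" requirement mentioned above. The RuleHit class, the rule names, and the 12.0 spam threshold are assumptions for illustration only; the real plugin also applies other exclusions that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class RuleHit:
    name: str       # rule name (hypothetical examples only)
    score: float    # points this rule contributed
    test_type: str  # "head" or "body"; stands in for the test_type flag


SPAM_THRESHOLD = 12.0   # assumed auto-learn spam threshold
MIN_HEAD_POINTS = 3.0   # "header only" minimum from the question above
MIN_BODY_POINTS = 3.0   # "body only" minimum from the question above


def points_by_type(hits):
    """Sum the scores separately for header tests and body tests."""
    totals = {"head": 0.0, "body": 0.0}
    for hit in hits:
        if hit.test_type in totals:
            totals[hit.test_type] += hit.score
    return totals


def would_autolearn_as_spam(hits):
    """True only if the total and both per-type minimums are met."""
    totals = points_by_type(hits)
    total_score = sum(hit.score for hit in hits)
    return (
        total_score >= SPAM_THRESHOLD
        and totals["head"] >= MIN_HEAD_POINTS
        and totals["body"] >= MIN_BODY_POINTS
    )
```

In this sketch, a message that scores well above the threshold on header rules alone is still not learned, because its body points stay under 3. If the flag is lost, a rule's points land in neither bucket, which is one way the per-type sums could stop matching the individual test scores.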

2) How can we affect the configuration, to increase the number of spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how we're using Bayes?
I will leave the Bayes comments to other experts, but in general I believe you will find that some form of non-automated learning produces better results. My concern with auto-learning is that it just perpetuates any flaws in the current classification rather than helping to stop new and different spam. I will likely set off a flame war if I continue discussing Bayes.

Perhaps you can buy a six pack for AXB and convince him to add his $0.04 on Bayes. He's the resident expert.

regards,
KAM
