On 11/4/2014 6:06 PM, John Woods wrote:
Everyone,
We're having problems with auto learning on v3.4.0 that we aren't
having on v3.3.2. The number of spam e-mails being auto-learned has
dropped significantly, and the amount of spam being let through (false
negatives) is higher as well. After looking through the wiki and
the code, I'm pretty sure this change is related to the rule that says
you must have 3 "body only" points and 3 "header only" points, which
are hardcoded values in
Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks
like body-points equals the head-points, and in 3.4.0, they are changed.
You are correct. Bugs were found in that logic, and the fixes and
related changes went into 3.4.0. See
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503
I've got a few questions:
1) How does SpamAssassin derive and sum the "body_only" and
"head_only" points? It doesn't look like the body_only points
correspond to any scores from individual tests.
Each rule has a test_type flag, which is how its points get attributed
to the header or the body. Earlier versions sometimes lost that flag
while parsing messages.
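As a rough illustration of the idea only (this is not the plugin's code), the check can be sketched in Python. The RuleHit type, the "head"/"body"/"other" labels, and the autolearn_as_spam function are hypothetical names made up for this example. The 3.0 minimums are the hardcoded values mentioned above, and the 12.0 total is assumed to be the spam auto-learn threshold.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RuleHit:
    """Hypothetical record of one rule that fired on a message."""
    name: str
    score: float
    test_type: str  # "head", "body", or "other" (for example, a meta rule)


MIN_HEAD_POINTS = 3.0  # hardcoded "header only" minimum discussed above
MIN_BODY_POINTS = 3.0  # hardcoded "body only" minimum discussed above


def autolearn_as_spam(hits: Iterable[RuleHit], spam_threshold: float = 12.0) -> bool:
    """Return True only if the total and both per-part minimums are met.

    spam_threshold is an assumed value for the spam auto-learn threshold.
    """
    hits = list(hits)
    head = sum(h.score for h in hits if h.test_type == "head")
    body = sum(h.score for h in hits if h.test_type == "body")
    total = sum(h.score for h in hits)
    return (
        total >= spam_threshold
        and head >= MIN_HEAD_POINTS
        and body >= MIN_BODY_POINTS
    )
```

In this sketch a message can score well above the total threshold and still not be learned if nearly all of its points come from header rules. A rule whose type flag is lost contributes to neither sum, which shifts the outcome.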
2) How can we affect the configuration, to increase the number of
spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how
we're using Bayes?
I will leave Bayes comments to other experts, but in general I believe
you will find that some form of non-automated learning produces
better results. My concern with auto-learning is that you are just
perpetuating any flaws in the current classification rather than
helping to stop new and different spam. I will likely set off a flame war
if I continue discussing Bayes.
Perhaps you can buy a six pack for AXB and convince him to add his $0.04
on Bayes. He's the resident expert.
regards,
KAM