Everyone,
We're having problems with auto-learning on v3.4.0 that we aren't
having on v3.3.2. The number of spam e-mails being auto-learned has
dropped significantly, and the amount of spam being let through (false
negatives) is higher as well.
For reference, here is a snippet of "spamassassin -D" output for a spam
e-mail, under version 3.3.2, on Solaris 10 x86:
Nov 4 13:50:47.844 [28558] dbg: plugin:
Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8c62360)
implements 'autolearn_discriminator', priority 0
Nov 4 13:50:47.844 [28558] dbg: learn: auto-learn: currently using
scoreset 3, recomputing score based on scoreset 1
Nov 4 13:50:47.844 [28558] dbg: learn: auto-learn: message score:
15.696, computed score for autolearn: 11.022
Nov 4 13:50:47.844 [28558] dbg: learn: auto-learn? ham=0, spam=6.5,
*body-points=11.022, head-points=11.022*, learned-points=0.8
Nov 4 13:50:47.844 [28558] dbg: learn: auto-learn? yes, spam
(11.022 > 6.5)
Here is the same e-mail, under version 3.4.0, on Solaris 11.1 x86:
Nov 4 13:56:20.901 [1554] dbg: plugin:
Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8e32700)
implements 'autolearn_discriminator', priority 0
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: currently using
scoreset 3, recomputing score based on scoreset 1
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
head or body scores: 3.558
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
head or body scores: 1.644
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.001
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.001
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.724
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.342
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
points 1.323
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
points 1.274
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.005
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 1.886
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.123
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.141
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: message score:
16.896, computed score for autolearn: 11.022
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn? ham=0, spam=6.5,
*body-points=1.068, head-points=4.752*, learned-points=2
*Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: autolearn_force
not flagged for a rule. Body Only Points: 1.068 (3 req'd) / Head
Only Points: 4.752 (3 req'd)*
*Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn? no: scored as
spam but too few body points (1.068 < 3)*
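As a sanity check on the 3.4.0 trace, the per-rule lines can be totaled to see how they relate to the summary line. The sketch below is only an illustration: the TRACE string is the trace above with the mail client's line wrapping undone and timestamps dropped, and the names TRACE, PATTERN and bucket_totals are made up for this example, not part of SpamAssassin.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Per-rule lines from the 3.4.0 debug output above (unwrapped, timestamps dropped).
TRACE = """\
learn: auto-learn: not considered head or body scores: 3.558
learn: auto-learn: not considered head or body scores: 1.644
learn: auto-learn: adding body_only points 0.001
learn: auto-learn: adding body_only points 0.001
learn: auto-learn: adding body_only points 0.724
learn: auto-learn: adding body_only points 0.342
learn: auto-learn: adding head_only points 1.323
learn: auto-learn: adding head_only points 1.274
learn: auto-learn: adding head_only points 0.005
learn: auto-learn: adding head_only points 1.886
learn: auto-learn: adding head_only points 0.123
learn: auto-learn: adding head_only points 0.141
"""

# Matches the three kinds of per-rule lines seen in the trace.
PATTERN = re.compile(
    r"auto-learn: (?:adding (body_only|head_only) points"
    r"|(not considered) head or body scores:) (\d+(?:\.\d+)?)"
)

def bucket_totals(trace):
    """Sum the logged points per bucket, using Decimal to avoid float noise."""
    totals = defaultdict(Decimal)
    for line in trace.splitlines():
        m = PATTERN.search(line)
        if m:
            bucket = m.group(1) or "not_considered"
            totals[bucket] += Decimal(m.group(3))
    return dict(totals)

totals = bucket_totals(TRACE)
for bucket, points in totals.items():
    print(bucket, points)
print("all buckets:", sum(totals.values()))
```

The body_only and head_only totals match the body-points=1.068 and head-points=4.752 values in the summary line, and the three buckets together add up to the "computed score for autolearn" of 11.022. So the two "not considered" scores (3.558 and 1.644) appear to count toward the overall score but toward neither the body nor the head requirement.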
After looking through the wiki and the code, I'm pretty sure this change
is related to the rule that says you must have 3 "body only" points and
3 "header only" points, which are hardcoded values in
Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like
body-points equals head-points (both 11.022), and in 3.4.0, the two
differ (1.068 vs. 4.752).
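To make the suspected rule concrete, here is a simplified model of the decision as I understand it from the debug output. It is not the plugin's actual code: the function name, the reason strings, and the handling of a score exactly at the threshold are assumptions, and the ham and learned-points checks are left out.

```python
def autolearn_as_spam(score, body_points, head_points,
                      spam_threshold=6.5, min_body=3.0, min_head=3.0):
    """Simplified model of the spam auto-learn decision (assumed, not SpamAssassin code).

    Returns (learn, reason). The 6.5 threshold and the 3-point minimums are
    the values shown in the debug output above.
    """
    if score <= spam_threshold:
        return False, "score not above spam threshold"
    if body_points < min_body:
        return False, "too few body points"
    if head_points < min_head:
        return False, "too few head points"
    return True, "spam"

# 3.3.2 trace: body-points and head-points are both 11.022.
print(autolearn_as_spam(11.022, 11.022, 11.022))

# 3.4.0 trace: same computed score, but body-points=1.068, head-points=4.752.
print(autolearn_as_spam(11.022, 1.068, 4.752))
```

Under this model the same computed score of 11.022 is learned with the 3.3.2 point values and rejected with the 3.4.0 values, which matches the "yes, spam" and "too few body points" lines in the two traces.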
I've got a few questions:
1) How does SpamAssassin derive and sum the "body_only" and
"head_only" points? It doesn't look like the body_only points correspond
to any scores from individual tests.
2) How can we change the configuration to increase the number of
spam e-mails being auto-learned?
3) Or do we need to completely change our strategy for how we're
using Bayes?
Thanks,
John