Everyone,

We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and more spam is being let through (false negatives) as well.

For reference, here is a snippet from a "spamassassin -D" on a spam e-mail, under version 3.3.2, on Solaris 10 x86:

   Nov  4 13:50:47.844 [28558] dbg: plugin:
   Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8c62360)
   implements 'autolearn_discriminator', priority 0
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn: currently using
   scoreset 3, recomputing score based on scoreset 1
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn: message score:
   15.696, computed score for autolearn: 11.022
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn? ham=0, spam=6.5,
   *body-points=11.022, head-points=11.022*, learned-points=0.8
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn? yes, spam
   (11.022 > 6.5)

Here is the same e-mail, under version 3.4.0, on Solaris 11.1 x86:

   Nov  4 13:56:20.901 [1554] dbg: plugin:
   Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8e32700)
   implements 'autolearn_discriminator', priority 0
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: currently using
   scoreset 3, recomputing score based on scoreset 1
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
   head or body scores: 3.558
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
   head or body scores: 1.644
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.001
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.001
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.724
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.342
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
   points 1.323
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
   points 1.274
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.005
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 1.886
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.123
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.141
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: message score:
   16.896, computed score for autolearn: 11.022
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn? ham=0, spam=6.5,
   *body-points=1.068, head-points=4.752*, learned-points=2
   *Nov  4 13:56:20.902 [1554] dbg: learn: auto-learn: autolearn_force
   not flagged for a rule. Body Only Points: 1.068 (3 req'd) / Head
   Only Points: 4.752 (3 req'd)*
   *Nov  4 13:56:20.902 [1554] dbg: learn: auto-learn? no: scored as
   spam but too few body points (1.068 < 3)*
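
As a sanity check on the 3.4.0 output, the "adding body_only" and "adding head_only" lines can be totalled with a short script. This is only a sketch in Python, not anything taken from SpamAssassin; the function name sum_autolearn_points and the regular expression are my own, and the pattern assumes the debug lines look exactly like the ones quoted above.

```python
import re
from collections import defaultdict

# Matches 3.4.0 debug lines such as "auto-learn: adding body_only points 0.724".
# The pattern is an assumption based on the log excerpt, not a documented format.
POINTS_RE = re.compile(r"auto-learn: adding (body_only|head_only) points ([0-9.]+)")


def sum_autolearn_points(debug_output):
    """Total the 'adding body_only/head_only points' lines per category."""
    # The mail client wrapped the log lines, so collapse all whitespace first.
    flat = " ".join(debug_output.split())
    totals = defaultdict(float)
    for category, points in POINTS_RE.findall(flat):
        totals[category] += float(points)
    # Round to the three decimals that the debug output prints.
    return {category: round(total, 3) for category, total in totals.items()}


# The ten "adding ..." lines from the 3.4.0 run, wrapped as in the e-mail.
SAMPLE = """
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.001
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.001
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.724
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
points 0.342
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
points 1.323
Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
points 1.274
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.005
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 1.886
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.123
Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
points 0.141
"""

if __name__ == "__main__":
    print(sum_autolearn_points(SAMPLE))
```

The totals agree with the body-points=1.068 and head-points=4.752 figures in the log, so the reported values do appear to be plain sums of those lines. The two "not considered head or body scores" entries (3.558 and 1.644) are not included in either total.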


After looking through the wiki and the code, I'm fairly sure this change is related to the rule that requires at least 3 "body only" points and 3 "header only" points before a message is auto-learned as spam; both minimums are hardcoded in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, body-points equals head-points (both 11.022, the same as the computed autolearn score), whereas in 3.4.0 they are tallied separately (1.068 and 4.752).
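
To make my reading of the rule concrete, here is a rough sketch of the decision as I understand it from the debug output. It is written in Python for illustration and is not the plugin's actual Perl code; the function name, argument names, and the order of the checks are assumptions, and it ignores learned-points and autolearn_force entirely.

```python
def would_autolearn_as_spam(score, body_points, head_points,
                            spam_threshold=6.5, min_body=3.0, min_head=3.0):
    """Simplified guess at the spam auto-learn decision seen in the logs.

    Returns (decision, reason). The defaults mirror the values shown in the
    debug output (spam=6.5, 3 body points and 3 head points required).
    """
    if score <= spam_threshold:
        return False, "score does not exceed the spam threshold"
    if body_points < min_body:
        return False, f"too few body points ({body_points} < {min_body:g})"
    if head_points < min_head:
        return False, f"too few head points ({head_points} < {min_head:g})"
    return True, f"spam ({score} > {spam_threshold})"


if __name__ == "__main__":
    # 3.3.2 run: body-points and head-points both reported as 11.022.
    print(would_autolearn_as_spam(11.022, 11.022, 11.022))
    # 3.4.0 run: same computed score, but body-points is only 1.068.
    print(would_autolearn_as_spam(11.022, 1.068, 4.752))
```

With the 3.3.2 numbers the sketch says yes, and with the 3.4.0 numbers it stops at the body-points check, which matches the two outcomes in the logs above.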

I've got a few questions:

1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.

2) How can we change the configuration to increase the number of spam e-mails being auto-learned?

3) Alternatively, do we need to completely change our strategy for how we're using Bayes?

Thanks,
John
