Bayes learning differences: v3.3.2 to v3.4.0

John Woods Tue, 04 Nov 2014 15:17:11 -0800

Everyone,

We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and the amount of spam being let through (false negatives) is higher as well.

For reference, here is a snippet from a "spamassassin -D" on a spam e-mail, under version 3.3.2, on Solaris 10 x86:


   Nov  4 13:50:47.844 [28558] dbg: plugin:
   Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8c62360)
   implements 'autolearn_discriminator', priority 0
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn: currently using
   scoreset 3, recomputing score based on scoreset 1
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn: message score:
   15.696, computed score for autolearn: 11.022
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn? ham=0, spam=6.5,
   *body-points=11.022, head-points=11.022*, learned-points=0.8
   Nov  4 13:50:47.844 [28558] dbg: learn: auto-learn? yes, spam
   (11.022 > 6.5)

    Here is the same e-mail, under version 3.4.0, on Solaris 11.1 x86:

   Nov  4 13:56:20.901 [1554] dbg: plugin:
   Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x8e32700)
   implements 'autolearn_discriminator', priority 0
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: currently using
   scoreset 3, recomputing score based on scoreset 1
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
   head or body scores: 3.558
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: not considered
   head or body scores: 1.644
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.001
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.001
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.724
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding body_only
   points 0.342
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
   points 1.323
   Nov 4 13:56:20.901 [1554] dbg: learn: auto-learn: adding head_only
   points 1.274
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.005
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 1.886
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.123
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: adding head_only
   points 0.141
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn: message score:
   16.896, computed score for autolearn: 11.022
   Nov 4 13:56:20.902 [1554] dbg: learn: auto-learn? ham=0, spam=6.5,
   *body-points=1.068, head-points=4.752*, learned-points=2
   *Nov  4 13:56:20.902 [1554] dbg: learn: auto-learn: autolearn_force
   not flagged for a rule. Body Only Points: 1.068 (3 req'd) / Head
   Only Points: 4.752 (3 req'd)*
   *Nov  4 13:56:20.902 [1554] dbg: learn: auto-learn? no: scored as
   spam but too few body points (1.068 < 3)*
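
As a sanity check on the 3.4.0 output above, here is a small Python sketch that adds up the "adding body_only" and "adding head_only" debug lines. It is not part of SpamAssassin; the function name sum_points and the trimmed log text are only for illustration. The totals do match the body-points and head-points that the final debug line reports.

```python
import re
from collections import defaultdict

# The "adding ..." debug lines from the 3.4.0 run, with timestamps trimmed.
LOG = """\
dbg: learn: auto-learn: adding body_only points 0.001
dbg: learn: auto-learn: adding body_only points 0.001
dbg: learn: auto-learn: adding body_only points 0.724
dbg: learn: auto-learn: adding body_only points 0.342
dbg: learn: auto-learn: adding head_only points 1.323
dbg: learn: auto-learn: adding head_only points 1.274
dbg: learn: auto-learn: adding head_only points 0.005
dbg: learn: auto-learn: adding head_only points 1.886
dbg: learn: auto-learn: adding head_only points 0.123
dbg: learn: auto-learn: adding head_only points 0.141
"""


def sum_points(log_text):
    """Sum the per-rule points by kind (hypothetical helper, not SpamAssassin code)."""
    totals = defaultdict(float)
    # \s+ also matches a newline, so wrapped log lines are handled too.
    pattern = r"adding (body_only|head_only)\s+points\s+([0-9.]+)"
    for kind, value in re.findall(pattern, log_text):
        totals[kind] += float(value)
    return {kind: round(total, 3) for kind, total in totals.items()}


totals = sum_points(LOG)
print(totals)
```

So the sums themselves are consistent; what I can't tell is which rules each of those point values comes from.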

After looking through the wiki and the code, I'm pretty sure this change is related to the rule that says you must have 3 "body only" points and 3 "header only" points, which are hardcoded values in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like body-points equals head-points, and in 3.4.0, they are different.
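
To make my reading of that rule concrete, here is a rough Python sketch of the decision as I understand it from the debug output. This is an assumption on my part, not the plugin's actual code: the function name, argument names, and the order of the checks are made up, and only the two "3 req'd" values and the ham/spam thresholds come from the logs.

```python
# Hardcoded minimums, per the "(3 req'd)" text in the 3.4.0 debug output.
REQUIRED_BODY_POINTS = 3.0
REQUIRED_HEAD_POINTS = 3.0


def autolearn_decision(score, body_points, head_points,
                       ham_threshold, spam_threshold, forced=False):
    """Hypothetical model of the auto-learn decision seen in the logs.

    'forced' stands in for the "autolearn_force" flag mentioned in the
    3.4.0 debug line; how it is actually set is not modeled here.
    """
    if score < ham_threshold:
        return "ham"
    if score < spam_threshold:
        return "no: between thresholds"
    if not forced:
        if body_points < REQUIRED_BODY_POINTS:
            return "no: too few body points"
        if head_points < REQUIRED_HEAD_POINTS:
            return "no: too few head points"
    return "spam"


# Values reported by 3.3.2: body-points and head-points both equal the score.
old = autolearn_decision(11.022, 11.022, 11.022, 0, 6.5)
# Values reported by 3.4.0 for the same message.
new = autolearn_decision(11.022, 1.068, 4.752, 0, 6.5)
print(old, "/", new)
```

With the same computed score of 11.022, the only thing that changes the outcome in this model is the body-points figure, which is why I'm focusing on how that number is derived.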


    I've got a few questions:

1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.
2) How can we affect the configuration, to increase the number of spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how we're using Bayes?


Thanks,
John
