On 02/24/2018 01:05 AM, Amir Caspi wrote:
On Feb 23, 2018, at 11:47 PM, David B Funk <dbf...@engineering.uiowa.edu> wrote:
It could have 20 points from a whole bunch of body rules but if it only hit 2
points via header rules it still will not auto-learn.
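
A minimal sketch of the check described above, for illustration only. The `Hit` type, the `would_autolearn_spam` function, and the header/body labels are hypothetical names, not SpamAssassin's API. The 12.0 total is the documented default of bayes_auto_learn_threshold_spam, and the 3-point header and body minimums are the ones discussed in this thread. Real SpamAssassin applies further conditions that this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    name: str    # rule name, e.g. "MISSING_HEADERS"
    score: float
    area: str    # "header" or "body" (assumed to be known per rule)


def would_autolearn_spam(hits, threshold=12.0, min_header=3.0, min_body=3.0):
    """Return True only if the total AND both per-area minimums are met."""
    header = sum(h.score for h in hits if h.area == "header")
    body = sum(h.score for h in hits if h.area == "body")
    total = header + body
    return total >= threshold and header >= min_header and body >= min_body


# The case from the message above: plenty of body points, too few header points.
lopsided = [Hit("SOME_BODY_RULES", 20.0, "body"), Hit("SOME_HEADER_RULE", 2.0, "header")]
balanced = [Hit("SOME_BODY_RULES", 10.0, "body"), Hit("SOME_HEADER_RULE", 3.0, "header")]

if __name__ == "__main__":
    print(would_autolearn_spam(lopsided))
    print(would_autolearn_spam(balanced))
```

The point of the sketch is that the three conditions are ANDed: a large total does not make up for a shortfall on either side.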

Gotcha. The spam in question that triggered this hit a lot of rules, but it is
hard for me to tell on cursory inspection whether it scores enough header and
body points. It LOOKS like there should be at least 3 points from header rules
(MISSING_HEADERS, FREEMAIL_FORGED_REPLYTO, among others) and certainly 3 from
body rules (MONEY_FRAUD_3 at the very least). The actual spam report is this:

        *  0.0 FSL_CTYPE_WIN1251 Content-Type only seen in 419 spam
        *  0.0 NSL_RCVD_FROM_USER Received from User
        *  1.0 MISSING_HEADERS Missing To: header
        *  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5004]
        *  1.1 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
        *  0.0 FROM_MISSP_MSFT From misspaced + supposed Microsoft tool
        *  0.0 FSL_NEW_HELO_USER Spam's using Helo and User
        *  2.6 MSOE_MID_WRONG_CASE No description available.
        *  0.0 FROM_MISSP_USER From misspaced, from "User"
        *  1.0 RDNS_DYNAMIC Delivered to internal network by host with
        *      dynamic-looking rDNS
        *  0.0 LOTS_OF_MONEY Huge... sums of money
        *  0.0 FROM_MISSP_XPRIO Misspaced FROM + X-Priority
        *  1.6 REPLYTO_WITHOUT_TO_CC No description available.
        *  0.0 AXB_XMAILER_MIMEOLE_OL_024C2 Yet another X header trait
        *  0.0 MSGID_FROM_MTA_HEADER Message-Id was added by a relay
        *  0.0 FSL_BULK_SIG Bulk signature with no Unsubscribe
        *  2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From
        *  1.0 FREEMAIL_REPLYTO Reply-To/From or Reply-To/body contain different
        *      freemails
        *  0.0 TO_NO_BRKTS_FROM_MSSP Multiple header formatting problems
        *  1.9 FORGED_MUA_OUTLOOK Forged mail pretending to be from MS Outlook
        *  1.6 TO_NO_BRKTS_DYNIP To: lacks brackets and dynamic rDNS
        *  0.0 FILL_THIS_FORM Fill in a form with personal information
        *  2.0 TO_NO_BRKTS_MSFT To: lacks brackets and supposed Microsoft tool
        *  2.0 FILL_THIS_FORM_LONG Fill in a form with personal information
        *  3.1 FROM_MISSP_FREEMAIL From misspaced + freemail provider
        *  3.0 MONEY_FRAUD_3 Lots of money and several fraud phrases

But it still didn't autolearn.

(I can post the entire spample if the above seems like it should have 
autolearned.)

Another possible factor: if you have "bayes_auto_learn_on_error" enabled, then
autolearn will be skipped if Bayes already agrees with the condition of the message, i.e.,
if the message is already classified as BAYES_99 then it won't bother auto-learning it as
yet another high-ranking spam.

I do not have that enabled.  Also, as you can see from above, this hit BAYES_50.

Does the above provide an indication as to why it didn't autolearn?

Thanks!

--- Amir



I found the best thing to do is set up a hidden mail server (iRedMail) and split a copy of all mail to it, to sort and filter into a Ham and a Spam folder based on rule hits and scoring. Then I run a nightly sa-learn on the Ham and Spam folders (in that order). The few questionable emails that score in the middle stay in the Inbox, so I just have to drag-n-drop them into the Ham or Spam folder, which takes a few minutes a day. Some that are new phishing campaigns or are from compromised accounts are copied into a Spamcop folder that automatically submits them to my Spamcop account.
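
A rough sketch of what such a nightly job could look like, assuming sa-learn is on the PATH and the folders are directories it can read. The function names and the example paths are hypothetical; only the `--ham` and `--spam` options of sa-learn are taken as given.

```python
import subprocess


def sa_learn_commands(ham_dir, spam_dir):
    """Build the two sa-learn invocations, ham first and then spam."""
    return [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]


def run_nightly(ham_dir, spam_dir, runner=subprocess.run):
    """Run both commands in order; stop on the first failure (check=True)."""
    for cmd in sa_learn_commands(ham_dir, spam_dir):
        runner(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folder locations; adjust to the local mail store layout.
    run_nightly("/path/to/Ham", "/path/to/Spam")
```

Scheduling it from cron (or any other scheduler) once a night would match the workflow described above.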

I also use the Ham and Spam folders for the nightly SA masscheck, which helps get new rules validated and a new 72_scores.cf updated daily via sa-update.

--
David Jones
