Bayes autolearn: how does it resolve whether rules are body or header related?

Bert Van de Poel Sat, 08 May 2021 19:17:33 -0700

Dear fellow SpamAssassin users,

I recently noticed that quite a lot of spam emails with high scores weren't marked for Bayes autolearning. While some senders and receivers were a common match, explaining why autolearn was "no", there was no clear explanation for other cases. I therefore put SpamAssassin in debug mode to check in more detail, and noticed that fairly often autolearn is not used because the minimum score for body tests isn't reached. After looking at some specific cases, however, it seems that several rules are not counted towards either the header rule score or the body rule score for Bayes autolearning. I've always presumed these scores are calculated based on whether the underlying rule performs a regex on a header or on the body, but now I'm not so sure any more. I hope you can help clear up whether this is intended behaviour (and what that behaviour is) or whether I should report this as a bug.

One example I noticed is URI_DEOBFU_INSTR=3.595. This is, if I understand it correctly, a URI test that's performed on the body. Should a test like this be counted towards the body score? Then there's the question of meta rules such as MONEY_NOHTML. If you resolve the different meta levels within this rule, it's a combination of header and body tests, yet it's only counted towards the header score. Finally, it seems as if custom rules I've added in local.cf aren't considered. Is that indeed the case (and if so, is that by design)? I'm also not completely sure whether UNWANTED_BODY_LANGUAGE and tests like Razor, Pyzor and DCC are counted towards the body score.

On a related note, I'm also wondering whether the required minimum scores for body and header can be tweaked and, if so, how. For example, the case below isn't autolearned even though it has a huge score and a vast number of tests firing, but seemingly not enough body-related points. Is that really the intended behaviour?

May 8 10:40:32 mail amavis[4076058]: (4076058-16) header_edits_for_quar: <fine...@dasanart.com> -> <g...@notgoingtoshare.tld>, Yes, score=24.619 tag=-9999 tag2=5 kill=7.5 tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no autolearn_force=no
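
For anyone who wants to check the arithmetic on that log line, here is a minimal Python sketch that pulls the rule names and scores out of the tests=[...] part and sums them. The helper name parse_tests is my own, not part of amavis or SpamAssassin, and the sketch deliberately does not try to sort rules into header or body, since that classification is exactly what I'm asking about.

```python
import re

# The tests=[...] portion of the amavis log line quoted above
# (whitespace normalised; rule names and scores unchanged).
LOG_TESTS = (
    "tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, "
    "BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, "
    "FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, "
    "FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, "
    "FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, "
    "FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, "
    "FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, "
    "MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, "
    "MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, "
    "PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, "
    "SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, TO_NO_BRKTS_MSFT=1.888, "
    "XFER_LOTSA_MONEY=0.001] autolearn=no autolearn_force=no"
)


def parse_tests(log_fragment):
    """Return {rule_name: score} for the tests=[...] part of a log line.

    Hypothetical helper for illustration only; it knows nothing about
    how SpamAssassin classifies rules internally.
    """
    match = re.search(r"tests=\[(.*?)\]", log_fragment, re.S)
    if match is None:
        return {}
    pairs = re.findall(r"([A-Za-z0-9_]+)=(-?\d+(?:\.\d+)?)", match.group(1))
    return {name: float(score) for name, score in pairs}


scores = parse_tests(LOG_TESTS)
total = sum(scores.values())

# The total should agree with the score= field that amavis logged.
print(f"{len(scores)} rules, total {total:.3f}")

# Show the five highest-scoring rules.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{score:6.3f}  {name}")
```

The same dictionary could be split into two sums once it is clear which rules count as header and which as body.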

Thank you in advance for your help. If you need any more examples or would like us to run some tests, feel free to let me know.


Kind regards,
Bert Van de Poel
ULYSSIS
