Re: Really hard-to-filter spam

Sean Greenslade Sat, 05 Aug 2023 00:33:13 -0700

On Fri, Aug 04, 2023 at 08:38:24AM -0500, Thomas Cameron wrote:
> It was a typo, sorry. I have a cron job that uses --spam against the spam
> folder, and --ham against the ham folder. I just copied and pasted poorly.
> This is the actual script for my account:
> 
> [thomas.cameron@mail-east ~]$ cat bin/spamcheck
> #!/bin/bash
> sa-learn --progress --spam --mbox /home/thomas.cameron/mail/INBOX/spam
> sa-learn --progress --ham --mbox /home/thomas.cameron/mail/INBOX/ham
> 
> Bayes tests for other messages, like the one you sent me, look like this:
> 
> ------------------------------------------------------------------
> Return-Path: <s...@redacted.foo>
> X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
>       mail-east.camerontech.com
> X-Spam-Level:
> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,
>       DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,
>       SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
>       autolearn_force=no version=3.4.6
> ------------------------------------------------------------------
> 
> But messages flagged as spam look like this:
> 
> ------------------------------------------------------------------
> Return-Path:
> <usawildseafood_ad-thomas.cameron=camerontech.com@redacted.click>
> X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
>       mail-east.camerontech.com
> X-Spam-Flag: YES
> X-Spam-Level: ************************************
> X-Spam-Status: Yes, score=36.8 required=5.0 tests=BAYES_99,BAYES_999,
>       DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FROM_FMBLA_NEWDOM,
>       FROM_SUSPICIOUS_NTLD,FROM_SUSPICIOUS_NTLD_FP,HTML_IMAGE_ONLY_32,
>       HTML_MESSAGE,PDS_OTHER_BAD_TLD,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
>       RCVD_IN_DNSWL_HI,RDNS_NONE,SH_HELO_DBL,SH_HELO_ZRD_FRESH,
>       SH_ZRD_HEADERS_FRESH,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,
>       URIBL_ABUSE_SURBL,URIBL_BLACK,URIBL_ZRD shortcircuit=no autolearn=spam
>       autolearn_force=no version=3.4.6
> ------------------------------------------------------------------
> 
> The previous email I copied headers from as an example was just a bad
> example. Usually Bayes is /pretty/ accurate on my system. I only used that
> one because it was a message which made it through SpamAssassin. I was
> trying to demonstrate that the checks were not failing, as suggested in an
> earlier comment.
> 
> Thanks for catching that, though. I have made silly mistakes like that so I
> appreciate you checking me.


In that case, I think I can only offer some general suggestions that I
personally follow.

I have the autolearn function completely disabled. In my experience, if
you have a decent training corpus of known ham and known spam, autolearn
doesn't really add anything.
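
For reference, turning autolearn off is a one-line setting. This is a
minimal fragment, assuming it goes in your site-wide SpamAssassin config
(local.cf is the usual place, but the exact file depends on your install):

```
# Stop SpamAssassin from feeding scanned messages back into Bayes
# automatically. The default is 1 (enabled).
bayes_auto_learn 0
```

With this set, the Bayes DB only changes when you run sa-learn yourself.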

Like yours, my bayes results are usually quite accurate. At this point,
I only train messages that are actually false positives or false
negatives. I can't say for sure how effective this is, but my intuition
is that by only training on "hard" messages (meaning ones that the
non-bayes SA rules couldn't take care of on their own), I'm keeping the
bayes engine focused on the most important messages to classify
correctly. Your sample spam above scored so high that my mail server
would have rejected it at SMTP time even if it had triggered BAYES_00.
I wouldn't bother training such a message; the rest of the
rules have it covered.
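
As a rough sketch of that "skip the easy ones" idea, here is one way to
check a message's existing score before training it. The function names
and the threshold of 15 are made up for illustration, and it assumes one
message per file (Maildir-style) that already carries an X-Spam-Status
header. It would not work as-is on an mbox file like yours.

```shell
# Print the score from the first X-Spam-Status header in a message file.
spam_score() {
    sed -n 's/^X-Spam-Status: .*score=\(-\{0,1\}[0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Exit 0 if the message scored below the given threshold, meaning the
# non-Bayes rules did not already catch it decisively.
# $1 = message file, $2 = hypothetical reject threshold
worth_training() {
    _score=$(spam_score "$1")
    # No SpamAssassin header at all: treat it as worth training.
    [ -n "$_score" ] || return 0
    awk -v s="$_score" -v t="$2" 'BEGIN { exit !(s + 0 < t + 0) }'
}

# Example use (hypothetical folder path):
#   for msg in "$HOME"/Maildir/.spam/cur/*; do
#       worth_training "$msg" 15 && sa-learn --spam "$msg"
#   done
```

With a threshold of 15, your 36.8 example would be skipped, while a
spam that slipped through with a low score would still get trained.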

Another thing to note is that spam tends to change over time. Having
really old spams in your bayes DB could be diluting its effectiveness by
having it look for signs that the current crop of spams don't show. It
might be worth starting fresh with an empty bayes db and training just a
few hundred of your most recent hams and spams.
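
If you want to try that, something along these lines should do it. This
is only a sketch: the function name and the mbox and backup paths are
placeholders, and it assumes you have already copied your recent spam
and ham into two separate mbox files.

```shell
# Allow the sa-learn binary to be overridden (handy for a dry run).
SA_LEARN="${SA_LEARN:-sa-learn}"

# Back up the current Bayes DB, wipe it, and retrain from two mbox files.
# $1 = mbox of recent spam, $2 = mbox of recent ham, $3 = backup file
rebuild_bayes() {
    # Keep a copy of the old DB so it can be restored with --restore.
    "$SA_LEARN" --backup > "$3" || return 1
    # Wipe all existing Bayes data.
    "$SA_LEARN" --clear || return 1
    "$SA_LEARN" --spam --mbox "$1" || return 1
    "$SA_LEARN" --ham --mbox "$2" || return 1
    # Show the new message and token counts.
    "$SA_LEARN" --dump magic
}

# Example use (hypothetical paths):
#   rebuild_bayes ~/mail/recent-spam ~/mail/recent-ham ~/bayes-backup.txt
```

Keep in mind that, by default, SpamAssassin does not use Bayes at all
until it has learned at least 200 spams and 200 hams, so the retraining
set needs to be at least that big on each side.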

And finally, if there's something consistent about the messages, don't
be afraid to write a manual rule. I have a few special rules in my
configs that alter the bayes scoring based on other aspects of the
messages.
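
To give a flavor of what such a rule can look like (this is an
illustration, not one of my actual rules; the rule names, the .click
match and the score are all made up), a meta rule can combine a Bayes
result with some other trait of the message:

```
# Hypothetical sub-rule: sender address in a TLD you never get real
# mail from. Names starting with __ carry no score of their own.
header   __LOCAL_FROM_CLICK    From:addr =~ /\.click$/i

# Fires when Bayes thinks the message is clean but the sender's TLD
# says otherwise, clawing back some of the negative Bayes score.
meta     LOCAL_BAYES_LOW_CLICK (BAYES_00 || BAYES_05) && __LOCAL_FROM_CLICK
describe LOCAL_BAYES_LOW_CLICK Low Bayes score but sender uses a .click address
score    LOCAL_BAYES_LOW_CLICK 2.5
```

Running "spamassassin --lint" after adding a rule will catch syntax
mistakes before they affect live mail.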

--Sean
