Re: How should this tricky spam be filtered?

Adam Katz Mon, 01 Feb 2010 09:09:57 -0800

Martin Gregorie wrote:
> Apparently putting the spam's payload in the "personal name" part
> of the From: header is as old a trick as putting it in the Subject:
> header though I hadn't seen it used until recently.
> 
> There was a recent suggestion that 'personal name' text from the
> From: header should be included in the text examined by 'body'
> rules, which already includes the Subject: text. This sounds like a
> good thing to do.


My tests on this have been mildly successful, with FROM_WWW already
promoted out of testing:
http://ruleqa.spamassassin.org/?rule=/FROM_W&srcpath=khop

This indicates that we don't actually need to parse any further,
because there is no sizable mass of legitimate mail that does this
(and hopefully, with this rule out the door, people considering it
will decide against it).
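
For anyone who wants to experiment outside SpamAssassin, here is a rough
Python sketch of the general idea: look only at the "personal name" part
of From: and flag it when it looks like a web address. This is not the
actual FROM_WWW definition; the pattern, the TLD list, and the function
name are my own assumptions, chosen only for illustration.

```python
import re
from email.utils import parseaddr

# Hypothetical pattern (an assumption, not the real rule): either a
# "www." host name, or a bare host name ending in a few common TLDs.
WEBSITE_IN_NAME = re.compile(
    r"\bwww\.[a-z0-9-]+\.[a-z]{2,}|\b[a-z0-9-]+\.(?:com|net|org|info|biz)\b",
    re.IGNORECASE,
)


def from_name_has_website(from_header: str) -> bool:
    """Return True if the display name of a From: value looks like a web address."""
    # parseaddr splits '"Some Name" <user@host>' into (name, address);
    # only the name part is inspected, never the address itself.
    display_name, _address = parseaddr(from_header)
    return bool(WEBSITE_IN_NAME.search(display_name))
```

Checking only the display name keeps ordinary addresses such as
user@example.org from matching on their domain.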

Developers' note:  I'm probably going to merge FROM_WWW and
FROM_WEBSITE.  Although FROM_WEBSITE sometimes flips and has a
sub-.500 S/O, its ham% even in those instances is always negligible.

This rule is particularly exciting because most of its hits are
low-scoring; 21.37% of the spam it hits scores 5 or under, and 68.39%
scores 8 or under.  This reflects a feature that (afaik) the genetic
algorithm doesn't specifically breed for and that is somewhat rare.

> Is it already in the developer's to-do list or should somebody
> (me?) raise a bug requesting it?

It might be nice to have the URI rule check From, Reply-To, and
Subject.  We'd have to be careful not to include /all/ headers: many
mailing lists use various headers for subscription management, PGP
systems often use headers for pubkey locations, and I'm sure there's
other stuff out there too.
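
To make the allowlist idea concrete, here is a minimal Python sketch,
not SpamAssassin code. It pulls URI-like strings out of From, Reply-To,
and Subject only, so that List-* and PGP-related headers are never
scanned. The header tuple, the regex, and the function name are
assumptions for illustration.

```python
import re
from email import message_from_string

# Assumption: only these headers are scanned. List-Unsubscribe, OpenPGP
# and similar headers are left out on purpose, since legitimate mail
# routinely carries URLs there.
SCANNED_HEADERS = ("From", "Reply-To", "Subject")

# Deliberately simple URI matcher (an assumption, not a full URI grammar).
URI_PATTERN = re.compile(r"\b(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def header_uris(raw_message: str) -> list[str]:
    """Collect URI-like strings from the allowlisted headers only."""
    msg = message_from_string(raw_message)
    found = []
    for name in SCANNED_HEADERS:
        # get_all returns every occurrence of the header, or the default.
        for value in msg.get_all(name, []):
            found.extend(URI_PATTERN.findall(str(value)))
    return found
```

An allowlist rather than a blocklist means a header nobody thought of
is simply ignored instead of producing surprise hits.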
