Re: SpamAssassins bayes mechanism and message headers

Matt Kettler Tue, 17 Mar 2009 18:30:52 -0700

fl...@pbartels.info wrote:
> Hello,
>
> instead of disabling a lot of possibly-set message headers using
> "bayes_ignore_header" and ending up with strange configs like:
>
> bayes_ignore_header Return-Path
> bayes_ignore_header Received
> bayes_ignore_header X-Spam-Flag
> bayes_ignore_header X-Spam-Status
> bayes_ignore_header X-Spam-Flag
> bayes_ignore_header X-Spam-Level
> bayes_ignore_header X-purgate
> bayes_ignore_header X-purgate-ID
> bayes_ignore_header X-purgate-Ad
> bayes_ignore_header X-GMX-Antispam
> bayes_ignore_header X-Resent-For
> bayes_ignore_header X-Resent-By
> bayes_ignore_header X-Resent-To
> bayes_ignore_header Resent-To
> bayes_ignore_header Sender
> bayes_ignore_header Precedence
> bayes_ignore_header X-Antispam
> bayes_ignore_header X-Sieve
> bayes_ignore_header X-Spamcount
> bayes_ignore_header X-Spamsensitivity
> bayes_ignore_header To
> bayes_ignore_header X-Sieve
> bayes_ignore_header X-WEBDE-FORWARD
>
> bayes_ignore_header X-purgate
> bayes_ignore_header X-purgate-ID
> bayes_ignore_header X-purgate-Ad
> bayes_ignore_header X-GMX-Antispam
> bayes_ignore_header X-Antispam
> bayes_ignore_header X-Spamcount
> bayes_ignore_header X-Spamsensitivity
>
> (found on the net)
Where?
>
> shouldn't SpamAssassins bayes mechanism just ignore the complete
> message header and just look at the body?
> This seems useful in my opinion.
It seems like a very misguided idea to me.


Is there any reason to think headers make bad tokens?
Do you have any test data showing this improves your bayes accuracy?

I'd expect a significant reduction in accuracy from this, but if you've
got real data showing otherwise, I'd love to see it.  My own informal
testing shows header tokens are *VERY* useful, particularly Received:
header tokens.

SpamAssassin contains quite a bit of code to break the headers down and
tokenize them in a useful way. It doesn't just extract a bunch of words
from the headers and throw them in the database; it actually encodes
things like which header a word appears in as part of the token itself.
i.e., "Drug" in the From: header is a different token than "Drug" in the
To: header, which is different from "Drug" in the body.
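
As a rough illustration of that idea, here is a minimal Python sketch. It is
not SA's actual Perl tokenizer: the tokenize() helper and the "location*word"
token format are made up for this example, and the real code does far more
per-header processing.

```python
import re

def tokenize(text, location):
    """Split text into words and tag each with where it was found.

    The "location*word" format is a hypothetical stand-in, not the
    token format SpamAssassin really stores in its bayes database.
    """
    words = re.findall(r"\w+", text.lower())
    return {f"{location}*{word}" for word in words}

# The same word seen in three places yields three distinct tokens.
from_tokens = tokenize("Drug Store <sales@example.com>", "From")
to_tokens = tokenize("Drug Buyer <you@example.org>", "To")
body_tokens = tokenize("Cheap drug offers inside", "body")

all_tokens = from_tokens | to_tokens | body_tokens
drug_tokens = sorted(t for t in all_tokens if t.endswith("*drug"))
print(drug_tokens)
```

Because the location is part of the token, the classifier can learn separate
spam/ham probabilities for "drug" in From:, in To:, and in the body.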


> What do you mean?
> (Are static tests not good enough for the message headers?)
No. Static rules are not any better for headers than they are for body
text. Bayes allows SA to adapt to rapid mutations in spam. These
mutations exist in both the headers and the body.
> It also seems more useful to me to activate just specific header
> fields and ignore all others. I understand that, for example, From, To or
> Subject may contain useful tokenizable information, but most
> fields seem uninteresting, and it is hard to find them all or to be sure
> you got them all.
>
> Is there a config option to tell SpamAssassins bayes mechanism not to
> look at the message header or does SpamAssassin still not look at the
> header by default?
No, the entire design of the SA bayes mechanism intentionally tries to
tokenize headers.  A lot of work went into making it do this very well.
Why would you want to disable it?

If you don't like bayes, by all means disable it, but why cut off its
legs? If you're going to use the CPU and IO time to run bayes, let it
run well.
> Perhaps there are regular expressions ?
>
> If it parses the message header, it seems you have to read the RFC's
> and look at some tools to find out what kind of message headers are set.

SA extensively parses the headers. It parses *all* headers, even
nonstandard ones that I could randomly configure a server to add like
"X-Matts-funky-header: Hi!".

There is no complete list of headers in the RFCs, because you can add an
X- header with any name you can think of.
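
To show why no fixed list is needed, here is a small sketch using Python's
standard email module (an assumption for illustration only; SA itself is
written in Perl and uses its own parser). A generic parser simply returns
whatever header names appear in the message, standard or not:

```python
from email import message_from_string

# A made-up message containing one nonstandard header.
raw = (
    "From: someone@example.com\n"
    "X-Matts-funky-header: Hi!\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)

msg = message_from_string(raw)

# items() yields every (name, value) pair in order, with no
# whitelist of "known" header names involved.
headers = list(msg.items())
for name, value in headers:
    print(f"{name}: {value}")
```

The nonstandard header comes back alongside From: and Subject:, so a
tokenizer built on top of this sees it like any other header.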
