From: Matt Kettler <mkettler...@verizon.net>
Date: Tue, 17 Mar 2009 21:30:02 -0400

fl...@pbartels.info wrote:
> Hello,
>
> instead of disabling a lot possibly set message headers using
> "bayes_ignore_header" and ending up in strange configs like:
>
> bayes_ignore_header Return-Path ...
> (found on the net)

Where?

>
> shouldn't SpamAssassins bayes mechanism just ignore the complete
> message header and just look at the body?
> This seems useful in my opinion.

It seems like a very misguided idea to me. Is there any reason to think headers make bad tokens?

Do you have any test data showing this improves your bayes accuracy?

Yes - I think some headers make extremely bad tokens for bayes, for example the X-Mailer/User-Agent headers. 40% of the spam I get claims to have Microsoft Outlook as an X-Mailer, so bayes rapidly determines that *UAMicrosoft (etc) is an extremely strong token. These *UA tokens were enough to push a short ham message to BAYES_99. When I added a bayes_ignore_header the score dropped to ~BAYES_40.

Obfuscated words like 'st0ck' are 100% indications of spam (or of messages that discuss spam), so these words work great for bayes. An 'X-Mailer: Microsoft Office Outlook' header doesn't really tell you anything about the message, at least not to the degree that the weight bayes gives these tokens suggests.

The Message-ID tokens are also low quality. Most of them are hapaxes that never appear in any other message, so they just fill up the bayes database. If the Message-ID tokens were processed further they might be more useful for bayes - eg - replace 1234.56789 with a format like %4d.%5d, or throw out all of the timestamp numbers and keep just the stuff after the @.

-jeff
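
To make the Message-ID idea above concrete, here is a rough Python sketch of the two suggested reductions: collapsing each run of digits into a width placeholder, and keeping the part after the @ as a separate token. This is not how SpamAssassin tokenizes headers; the function name and the MIDSHAPE:/MIDDOMAIN: prefixes are made up for illustration, and it only handles decimal digit runs (hex timestamps would need more work).

```python
import re
from typing import List


def normalize_message_id(message_id: str) -> List[str]:
    """Reduce a Message-ID to coarse tokens (hypothetical scheme, not SpamAssassin's)."""
    # Drop surrounding whitespace and the angle brackets around the ID.
    value = message_id.strip().strip("<>")

    # Split on the last '@'; if there is none, treat everything as the local part.
    local, sep, domain = value.rpartition("@")
    if not sep:
        local, domain = value, ""

    # Replace every digit run with a printf-style width, e.g. 1234 -> %4d.
    shape = re.sub(r"\d+", lambda m: f"%{len(m.group())}d", local)

    tokens = [f"MIDSHAPE:{shape}"]  # hypothetical token prefix
    if domain:
        tokens.append(f"MIDDOMAIN:{domain.lower()}")  # hypothetical token prefix
    return tokens


if __name__ == "__main__":
    print(normalize_message_id("<1234.56789@mail.example.com>"))
```

With something like this, two messages from the same mailer would share the shape and domain tokens instead of each contributing a unique hapax. Whether that actually improves accuracy would still need testing against a real corpus.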