Mail::SpamAssassin::Conf:
   body SYMBOLIC_TEST_NAME /pattern/modifiers ...
       The 'body' in this case is the textual parts of the message body;
       any non-text MIME parts are stripped, and the message decoded from
       Quoted-Printable or Base-64-encoded format if necessary.  The
       message Subject header is considered part of the body and becomes
       the first paragraph when running the rules.  All HTML tags and line
       breaks will be removed before matching.

It sure would be nice if spamassassin had a flag that would cause it to
spit out this postprocessed body, so we could know exactly what it is
that we are trying to match against!!

Same for rawbody!

Regarding "All HTML tags and line breaks will be removed before matching,"
painstaking trial and error showed me that at least the line breaks are
replaced by a space (%20), not just "removed". Perhaps someone should do
something about that wording.

Also, runs of blanks are compressed into just one. So line breaks and other
whitespace all collapse into a single blank, one finds. OK.
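
To make that observation concrete, here is a small Python sketch of the
whitespace handling as I saw it in my tests. It is only a model of the
behavior, not SpamAssassin's code, and the function name
normalize_body_text is made up for illustration. HTML stripping and MIME
decoding are left out.

```python
import re

def normalize_body_text(text):
    """Model of the observed behavior (an assumption, not SpamAssassin's
    implementation): every run of whitespace, line breaks included,
    collapses into a single space."""
    return re.sub(r"\s+", " ", text)

sample = "Hello, \nas promised your photos  http://..."
print(normalize_body_text(sample))
# prints: Hello, as promised your photos http://...
```

So a "body" pattern should expect exactly one space wherever the original
message had a line break or a run of blanks.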

   rawbody SYMBOLIC_TEST_NAME /pattern/modifiers ...
       The 'raw body' of a message is the raw data inside all textual
       parts. The text will be decoded from base64 or quoted-printable
       encoding, but HTML tags and line breaks will still be present.
       Multiline expressions will need to be used to match strings that
       are broken by line breaks.

Here he forgets to mention whether the Subject is also considered part of
the body, perhaps assuming that the reader has just read "body" above it...
My tests show that indeed the Subject is part of rawbody. (Yes, I could
look at the source, but let's hope he/they/you will improve the man page.)

Anyway, for "body" at least, even a /SUBJECT.*MESSAGE/s does not help me match
across whatever it is that supposedly joins the postprocessed "first paragraph",

       The message Subject header is considered part of the body and
       becomes the first paragraph when running the rules. All HTML tags
       and line breaks will be removed before matching.

with the rest, despite perlre's

       s   Treat string as single line.  That is, change "." to match any
           character whatsoever, even a newline, which normally it would not
           match.

Hence I move that spamassassin should have a flag to spit out this
postprocessed body, so we can see whether it can be matched against in its
entirety in any way at all!

By the way, further tests I did show that the Subject is indeed totally
disposed of. Even /.MESSAGE/ won't match.

OK, I think I know what is going on. For the spam message:
Subject: Re: Your Photos

Hello, 
as promised your photos  http://...

These match,
body J_PHO /^Hello, as promised your photos http/
body J_PHO /^Re: Your Photos$/
but the user must remember that _these are still run line by line_ so
there is no way to match across that first "paragraph" boundary
mentioned!

Same for rawbody.
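
Here is a Python sketch of what I believe is going on, using the example
message above. It is a model under my own assumptions, not SpamAssassin's
code: the list called units and the helper rule_hits are hypothetical, and
whether the real splitting is per line or per paragraph is not something
this settles. The point is only that a pattern is tried against one unit
at a time, so no modifier can make it span two of them.

```python
import re

# Assumed model: the Subject is one unit, the whitespace-collapsed
# body text is another. The rule is tried on each unit separately.
units = [
    "Re: Your Photos",
    "Hello, as promised your photos http://...",
]

def rule_hits(pattern, units, flags=0):
    """Return True if the pattern matches inside any single unit."""
    regex = re.compile(pattern, flags)
    return any(regex.search(unit) for unit in units)

print(rule_hits(r"^Hello, as promised your photos http", units))  # True
print(rule_hits(r"^Re: Your Photos$", units))                     # True
# The "s" modifier (re.S here) cannot help: the two pieces are never
# in the same string, so there is nothing for "." to cross.
print(rule_hits(r"Photos.*Hello", units, re.S))                   # False
```

If this model is right, the practical consequence is to write one rule
for the Subject and a separate one for the text, rather than a single
pattern meant to cover both.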

Anyway, still need a flag to spit them out.
