On Fri, 2011-11-25 at 11:49 -0500, Kevin A. McGrail wrote:
> > On 11/25/2011 12:23 AM, Alex wrote:
> > > Some time ago we created the following rule on this list to identify
> > > mail with less than 200 characters in the body:

> > > rawbody     __KB_RAWBODY_200    /^.{0,200}$/s

> > > I'm finding that it's hitting on mail that is much larger than 200
> > > characters and I don't understand why. Is it only the text/plain
> > > component of the body? Here's an example:
> > >
> > > http://pastebin.com/raw.php?i=XNHjxfTz

Damn! So it does...


> It was a brilliantly simple idea but this concept won't work if I am 
> looking at things correctly.  The loop for the pattern test appears to 
> test line by line.  So if a single line is less than 200 chars, you are 
> hitting the rule.

Well, that "line" is actually a full, textual MIME part, not a line of
the source message.

See get_decoded_body_text_array() in Message.pm. That's the function
that fills @$bodytext, which do_rawbody_tests() in Check.pm then uses.
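A rough Python model of that loop, assuming (as a simplification, not SpamAssassin's actual Perl code) that each element of @$bodytext is one decoded textual MIME part and that a rawbody rule hits as soon as any single element matches. Python's re.DOTALL stands in for Perl's /s; without re.MULTILINE, ^ and $ anchor at the start and end of the element, and $ may also match just before a trailing newline, as in Perl. The helper name rawbody_hits is hypothetical.

```python
import re

# Alex's original rule: /^.{0,200}$/s
OLD_RULE = re.compile(r"^.{0,200}$", re.DOTALL)


def rawbody_hits(rule: re.Pattern, bodytext: list[str]) -> bool:
    """Hypothetical model: the rule hits if ANY element of bodytext matches."""
    for element in bodytext:
        if rule.search(element):
            return True
    return False


if __name__ == "__main__":
    long_part = "x" * 500 + "\n"   # one decoded part, well over 200 chars
    short_part = "hello\n"          # one decoded part, under 200 chars

    print(rawbody_hits(OLD_RULE, [long_part]))   # False
    print(rawbody_hits(OLD_RULE, [short_part]))  # True
```

Under this model a message with a single textual part behaves exactly as the rule intends.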

*sigh*  And that appears to be what breaks my rule in a very subtle way:
it apparently pushes a "line" containing only a \n *between* the
decoded MIME parts, aka "lines". Why? No effin' clue.

That means my rule above works perfectly with *single*-part MIME
messages, but always triggers if there are multiple textual MIME parts.
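A simplified Python sketch of that failure mode (an illustration, not SpamAssassin's code; re.DOTALL stands in for Perl's /s). The hypothetical helper build_bodytext only mimics the layout observed here, with a lone "\n" element between textual parts. The old pattern matches neither 500-character part, but it does match the separator, so any message with two or more textual parts hits.

```python
import re

OLD_RULE = re.compile(r"^.{0,200}$", re.DOTALL)  # /^.{0,200}$/s


def build_bodytext(parts: list[str]) -> list[str]:
    """Hypothetical: mimic the observed layout, one newline element between parts."""
    bodytext: list[str] = []
    for i, part in enumerate(parts):
        if i > 0:
            bodytext.append("\n")  # the almost-empty injected "line"
        bodytext.append(part)
    return bodytext


def matching_elements(rule: re.Pattern, bodytext: list[str]) -> list[str]:
    """Return every element of bodytext the rule matches."""
    return [element for element in bodytext if rule.search(element)]


if __name__ == "__main__":
    single = build_bodytext(["a" * 500 + "\n"])
    multi = build_bodytext(["a" * 500 + "\n", "b" * 500 + "\n"])

    print(matching_elements(OLD_RULE, single))  # []
    print(matching_elements(OLD_RULE, multi))   # ['\n']
```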


A quick fix, with the rule appropriately renamed:

  rawbody __KB_MIMEPART_200  /^.{2,200}$/s

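The only change to the RE is a minimum length of 2 rather than 0. This
ensures the rule does NOT trigger on the almost-empty "lines" SA injects
between multiple textual MIME parts. A minimum of 1 would not be enough:
with /s the dot matches the newline itself, so a lone \n still matches.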
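A quick check of three candidate minimum lengths, sketched in Python with re.DOTALL standing in for Perl's /s (for these simple patterns the two engines agree, but this is not SpamAssassin itself). Each pattern is tested against a lone "\n" element and against a genuinely short part; the dictionary keys are just illustrative labels.

```python
import re

PATTERNS = {
    "min0": re.compile(r"^.{0,200}$", re.DOTALL),  # original rule
    "min1": re.compile(r"^.{1,200}$", re.DOTALL),  # dot consumes the "\n" itself
    "min2": re.compile(r"^.{2,200}$", re.DOTALL),  # the quick fix
}

SEPARATOR = "\n"            # the injected almost-empty "line"
SHORT_PART = "Hi there\n"   # a genuinely short textual part

if __name__ == "__main__":
    for name, rule in PATTERNS.items():
        print(name,
              bool(rule.search(SEPARATOR)),
              bool(rule.search(SHORT_PART)))
    # min0 True True
    # min1 True True
    # min2 False True
```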

Noteworthy, again: this rule does not work on the entire body of all
textual parts, but on a per-MIME-part basis. Any single short textual
MIME part triggers the rule.
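To illustrate the per-part behaviour of the fixed pattern, here is a simplified Python model (an assumption, not SA's Perl code; the helper fixed_rule_hits is hypothetical, and the lone "\n" element between parts follows the layout observed above). A body made of one 500-character part plus one short part still hits, even though the combined text is far longer than 200 characters.

```python
import re

FIXED_RULE = re.compile(r"^.{2,200}$", re.DOTALL)  # /^.{2,200}$/s


def fixed_rule_hits(bodytext: list[str]) -> bool:
    """Hypothetical model: hit if any single element (MIME part) matches."""
    return any(FIXED_RULE.search(element) for element in bodytext)


if __name__ == "__main__":
    long_part = "a" * 500 + "\n"
    short_part = "Thanks!\n"

    # Layout as observed: a lone "\n" element between textual parts.
    two_long = [long_part, "\n", long_part]
    long_and_short = [long_part, "\n", short_part]

    print(fixed_rule_hits(two_long))                 # False
    print(fixed_rule_hits(long_and_short))           # True
    print(sum(len(e) for e in long_and_short))       # 510
```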


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
