Re: Inconsistent spam scores between spam headers and rewritten subject line.

Karsten Bräckelmann Mon, 15 Aug 2011 15:06:39 -0700

On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:


> > >    * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > >    $HOME/Maildir/.Spam//
> > > 
> > > I'm attempting to filter on the modified subject line (which for some
> > > reason isn't working - that rule never seems to match and spam never
> > > gets moved into the Spam folder, even though I've tested the regex
> > > manually). I thought of filtering on the X-Spam-Status header instead,
> > > but when I had a look at a message that was marked as Spam (according to
> > > the subject line) I found something rather strange...

Yes, filtering on SA's X-Spam-Status or X-Spam-Level headers is the way to
go. Once you have found and fixed where SA gets called a second time
(actually the first time), these headers will no longer be overwritten
and will be reliable for filtering.
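
A minimal sketch of such a recipe, untested, assuming SA's default header
names and the Maildir folder from your recipe quoted above:

```
# File anything SA flagged as spam into the Spam folder.
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.Spam/

# Alternative: match on the star count instead, here 5 or more.
# :0
# * ^X-Spam-Level: \*\*\*\*\*
# $HOME/Maildir/.Spam/
```

The trailing slash makes procmail deliver in Maildir format, so no lock
file is needed.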

Anyway, the reason the above procmail recipe doesn't work is simply that
procmail uses its own, rather limited flavor of REs. It's not PCRE.

In particular, procmail does not understand {x,y} range quantifiers, but
treats that part as a plain string to match, which never occurs in the
subject.
(Caveat: From memory, not actually looked it up again for verification.)
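
If you do want to keep matching on the subject, here is a sketch of the
same condition without the range quantifier. Again untested, and it
assumes procmail's + behaves as it does in egrep:

```
:0
* ^Subject:.*SPAM\([0-9]+\.[0-9]\)
$HOME/Maildir/.Spam/
```

This is looser than {1,3}, since it accepts any number of digits, which
should not matter for a score tag. The trailing .* is not needed, because
procmail conditions are not anchored at the end.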


> > >     3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> > >     3.0 IMPOTENCE              BODY: Impotence cure
> > >    -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> > >                                [score: 0.1050]
> > >     2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> > >     1.2 RDNS_NONE              Delivered to internal network by a host with no
> > >                                rDNS

Oh, yeah, these do ring quite some bells... ;)

Once you have fixed your mail processing chain so that SA no longer chews
twice on the spam, you should manually train Bayes, feeding it a lot of
hand-classified spam, and possibly ham. Check your 'sa-learn --dump magic'
numbers. A Bayes score of 0.1 for spam like this is way out of line.

Note, though, that a site-wide SA filter earlier in the chain might run as
a site-wide user, not the one owning the procmail recipe. Thus Bayes scores
might suddenly change once SA is run per user. Check the numbers and
performance for the user you'll use after fixing the chain issue.


> > You need to fix whatever is causing the message to be scanned twice.
> 
> OK - that makes sense. Now I'm wondering if there is a global mail config 
> somewhere that is routing the message through SA, and then my local 
> .procmailrc is doing it again. Time to go digging...

Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
even something like Amavis in the chain?

> That then leaves the question as to why my procmail recipe isn't triggering on
> the rewritten subject, but that is probably not for this list.

It's sufficiently related. ;)  See above.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
