On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> > > * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > > $HOME/Maildir/.Spam//
> > >
> > > I'm attempting to filter on the modified subject line (which for some
> > > reason isn't working - that rule never seems to match and spam never
> > > gets moved into the Spam folder, even though I've tested the regex
> > > manually). I thought of filtering on the X-Spam-Status header instead,
> > > but when I had a look at a message that was marked as Spam (according to
> > > the subject line) I found something rather strange...

Yes, filtering on the SA X-Spam-Status or X-Spam-Level headers is the way
to go. Once you have found and fixed where SA gets called a second time
(actually the first time), these headers will no longer be overwritten,
and will be useful for filtering.

Anyway, the reason the above procmail recipe doesn't work is simply that
procmail uses its own flavor of REs, a rather limited subset. It's not
PCRE. In particular, procmail does not understand {x,y} range
quantifiers, but treats that part as a plain string to match. Which it
doesn't. (Caveat: From memory, I have not actually looked it up again
for verification.)

> > >  3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> > >  3.0 IMPOTENCE              BODY: Impotence cure
> > > -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> > >                             [score: 0.1050]
> > >  2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> > >  1.2 RDNS_NONE              Delivered to internal network by a host with no
> > >                             rDNS

Oh, yeah, these do ring quite some bells... ;)

Once you have fixed your mail processing chain so that SA no longer chews
twice on the spam, you should manually train Bayes, feeding it a lot of
hand-classified spam, and possibly ham. Check your 'sa-learn --dump
magic' numbers. The Bayes score of 0.1 is way out of line.

Note, though, that a previous site-wide SA filter might use a site-wide
user, not the one owning the procmail recipe. Thus Bayes scores might
suddenly change once SA is run per user.
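
To make "check the numbers" a bit more concrete, here is a minimal
sketch. The helper name bayes_counts and the Maildir paths are my
assumptions for illustration, not something taken from your setup. It
only relies on the usual 'sa-learn --dump magic' layout, where the nspam
and nham counters show up in the third column.

```shell
# Hypothetical helper: reduce 'sa-learn --dump magic' output (read from stdin)
# to the two counters that matter here, the number of learned spam and ham.
bayes_counts() {
    awk '$NF == "nspam" || $NF == "nham" { print $NF, $3 }'
}

# Typical use, run as the user whose Bayes database the procmail recipe uses:
#   sa-learn --spam "$HOME/Maildir/.Spam/cur"   # hand-classified spam (path is an assumption)
#   sa-learn --ham  "$HOME/Maildir/cur"         # hand-classified ham (path is an assumption)
#   sa-learn --dump magic | bayes_counts
```

If I remember the defaults right, SA only uses Bayes once it has learned
at least 200 spam and 200 ham, so both counters should be comfortably
above that.
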
Check the numbers and performance for the user you'll use after fixing
the chain issue.

> > You need to fix whatever is causing the message to be scanned twice.
>
> OK - that makes sense. Now I'm wondering if there is a global mail config
> somewhere that is routing the message through SA, and then my local
> .procmailrc is doing it again. Time to go digging...

Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
even something like Amavis in the chain?

> That then leaves the question as to why my procmail recipe isn't triggering on
> the rewritten subject, but that is probably not for this list.

It's sufficiently related. ;)  See above.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}