On Tue, 16 Aug 2011 07:36:05 Karsten Bräckelmann wrote:
> On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> > On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> > > > * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > > > $HOME/Maildir/.Spam//
> > > >
> > > > I'm attempting to filter on the modified subject line (which for some
> > > > reason isn't working - that rule never seems to match and spam never
> > > > gets moved into the Spam folder, even though I've tested the regex
> > > > manually). I thought of filtering on the X-Spam-Status header
> > > > instead, but when I had a look at a message that was marked as Spam
> > > > (according to the subject line) I found something rather strange...
>
> Yes, filtering on the SA X-Spam Status or Level headers is the way to
> go. After you found and fixed where SA gets called a second time
> (actually the first time), these won't be harmed and overwritten -- and
> useful for filtering.
>
> Anyway, the secret why the above procmail recipe doesn't work is simply,
> because procmail uses a rather limited sub-set of REs and its own
> flavor. It's not PCRE.
>
> In particular procmail does not understand {x,y} range quantifiers, but
> treats that part as a plain string to match. Which doesn't.
> (Caveat: From memory, not actually looked it up again for verification.)
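
For reference, a minimal sketch of recipes that avoid the range quantifier. It assumes SpamAssassin's default X-Spam-Status and X-Spam-Level headers and reuses the Maildir path from the quoted recipe. The five-star threshold is only an illustrative choice, and none of this has been tested against this particular setup:

```
# Option 1: file anything SpamAssassin flagged as spam.
# Assumes the default "X-Spam-Status: Yes, ..." header is present.
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.Spam/

# Option 2: file on score via X-Spam-Level, one "*" per point.
# Five stars (score >= 5) is an illustrative threshold.
:0
* ^X-Spam-Level: \*\*\*\*\*
$HOME/Maildir/.Spam/

# Option 3: keep matching the rewritten subject, but use "+"
# in place of the {1,3} range quantifier.
:0
* ^Subject:.*SPAM\([0-9]+\.[0-9]\)
$HOME/Maildir/.Spam/
```

Only one of the three would normally be used. The trailing slash on the destination asks procmail for Maildir-style delivery, so no lockfile flag is set on the recipes.
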
Ah, thank you. Despite googling for lots of stuff on procmail, I've not been
able to find a definitive reference for what can and can't be used in a
procmail recipe. Maybe I just haven't used the right search terms (or maybe I
just haven't understood what I've read). Anyway, thanks for the clarification.

> > > >  3.8 KB_DATE_CONTAINS_TAB  KB_DATE_CONTAINS_TAB
> > > >  3.0 IMPOTENCE             BODY: Impotence cure
> > > > -0.0 BAYES_20              BODY: Bayes spam probability is 5 to 20%
> > > >                            [score: 0.1050]
> > > >  2.0 KB_FAKED_THE_BAT      KB_FAKED_THE_BAT
> > > >  1.2 RDNS_NONE             Delivered to internal network by a
> > > >                            host with no rDNS
>
> Oh, yeah, these do ring quite some bells... ;)
>
> After you fixed your mail processing chain to not have SA chew twice on
> the spam -- you should manually train Bayes, feeding it a lot of hand
> classified spam, and possibly ham. Check your 'sa-learn --dump magic'
> numbers. The Bayes score of 0.1 is way out of line.

Agreed. I do run sa-learn --spam (it is now scheduled to run weekly on a
folder into which I drop all the spam messages that weren't classified as
spam) and sa-learn --ham (on a folder with messages that were false
positives).

> Note though, that a previous site-wide SA filter might use a site-wide
> user, not the one owning the procmail recipe. Thus Bayes scores might
> suddenly change once it's run per user. Check the numbers and
> performance for the user you'll use after fixing the chain issue.
>
> > > You need to fix whatever is causing the message to be scanned twice.
> >
> > OK - that makes sense. Now I'm wondering if there is a global mail config
> > somewhere that is routing the message through SA, and then my local
> > .procmailrc is doing it again. Time to go digging...
>
> Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
> even something like Amavis in the chain?

There is no /etc/procmailrc and no milter that I'm aware of; the machine is
running fetchmail/sendmail/dovecot.
This machine doubles as my home mail server/file server and desktop machine.
The only reason I'm running IMAP is so that I can access the same mail from my
laptop or netbook when I need to (and I used to run squirrelmail to allow
access remotely via https webmail, but not any more).

> > That then leaves the question as to why my procmail recipe isn't
> > triggering on the rewritten subject, but that is probably not for this
> > list.
>
> It's sufficiently related. ;) See above.

Thanks again. :-)

-- 
======================================================
Rodney Baker
rod...@jeremiah31-10.net
web: www.jeremiah31-10.net
======================================================