Re: Inconsistent spam scores between spam headers and rewritten subject line.

Rodney Baker Tue, 16 Aug 2011 05:56:21 -0700

On Tue, 16 Aug 2011 07:36:05 Karsten Bräckelmann wrote:
> On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> > On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> > > >    * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > > >    $HOME/Maildir/.Spam//
> > > > 
> > > > I'm attempting to filter on the modified subject line (which for some
> > > > reason isn't working - that rule never seems to match and spam never
> > > > gets moved into the Spam folder, even though I've tested the regex
> > > > manually). I thought of filtering on the X-Spam-Status header
> > > > instead, but when I had a look at a message that was marked as Spam
> > > > (according to the subject line) I found something rather strange...
> 
> Yes, filtering on the SA X-Spam Status or Level headers is the way to
> go. After you found and fixed where SA gets called a second time
> (actually the first time), these won't be harmed and overwritten -- and
> useful for filtering.
> 
> Anyway, the secret why the above procmail recipe doesn't work is simply,
> because procmail uses a rather limited sub-set of REs and its own
> flavor. It's not PCRE.
> 
> In particular procmail does not understand {x,y} range quantifiers, but
> treats that part as a plain string to match. Which doesn't.
> (Caveat: From memory, not actually looked it up again for verification.)


Ah, thank you. Despite googling for lots of stuff on procmail I've not been 
able to find a definitive reference for what can and can't be used in a 
procmail recipe. Maybe I just haven't used the right search terms (or maybe I 
just haven't understood what I've read). Anyway, thanks for the clarification.
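
For the archives, here is a minimal sketch of what the recipe could look like 
given the point above. It is untested and rests on two assumptions: that 
procmail accepts the egrep-style + quantifier (so [0-9]+ can stand in for 
[0-9]{1,3}), and that SA adds its default "X-Spam-Status: Yes" header to 
messages it flags. The Spam folder path is the one from my original recipe.

```
# Alternative 1: match the rewritten subject without {x,y} quantifiers.
# [0-9]+ is looser than [0-9]{1,3}: it accepts any run of one or more digits.
:0
* ^Subject:.*SPAM\([0-9]+\.[0-9]\)
$HOME/Maildir/.Spam/

# Alternative 2: filter on the SA status header instead of the subject.
# Assumes SA is configured to add the default X-Spam-Status header.
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.Spam/
```

Only one of the two recipes is needed. The header-based one is probably the 
more robust choice, since it does not depend on how the subject gets 
rewritten.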

> 
> > > >     3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> > > >     3.0 IMPOTENCE              BODY: Impotence cure
> > > >    -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> > > >                                [score: 0.1050]
> > > >     2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> > > >     1.2 RDNS_NONE              Delivered to internal network by a host with no rDNS
> 
> Oh, yeah, these do ring quite some bells... ;)
> 
> After you fixed your mail processing chain to not have SA chew twice on
> the spam -- you should manually train Bayes, feeding it a lot of hand
> classified spam, and possibly ham. Check your 'sa-learn --dump magic'
> numbers. The Bayes score of 0.1 is way out of line.

Agreed. I do run sa-learn --spam (actually now have it scheduled to run weekly 
on a folder into which I drop all the non-classified spam messages) and --ham 
(on a folder with messages that were false-positives).
 
> 
> Note though, that a previous site-wide SA filter might use a site-wide
> user, not the one owning the procmail recipe. Thus Bayes scores might
> suddenly change once it's run per user. Check the numbers and
> performance for the user you'll use after fixing the chain issue.
> 
> > > You need to fix whatever is causing the message to be scanned twice.
> > 
> > OK - that makes sense. Now I'm wondering if there is a global mail config
> > somewhere that is routing the message through SA, and then my local
> > .procmailrc is doing it again. Time to go digging...
> 
> Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
> even something like Amavis in the chain?

There is no /etc/procmailrc and no milter that I'm aware of. I'm running 
fetchmail/sendmail/dovecot. This machine doubles as my home mail server/file 
server and desktop machine. The only reason I'm running IMAP is so that I can 
access the same mail from my laptop or netbook when I need to (and I used to 
run squirrelmail to allow access remotely via https webmail, but not any 
more).
 
> 
> > That then leaves the question as to why my procmail recipe isn't
> > triggering on the rewritten subject, but that is probably not for this
> > list.
> 
> It's sufficiently related. ;)  See above.

Thanks again. :-)

-- 
======================================================
Rodney Baker
rod...@jeremiah31-10.net
web: www.jeremiah31-10.net
======================================================
