Re: ADDRESS_IN_SUBJECT et al

Ian Turner Thu, 25 Jul 2013 05:08:54 -0700

On Thursday, July 25, 2013 05:15:19 AM Karsten Bräckelmann wrote:
> On Wed, 2013-07-24 at 21:53 -0400, Ian Turner wrote:
> > They are moderately low-scoring, sadly (I wouldn't have noticed
> > otherwise!),
> > mainly due to bayes poison. A typical message looks like this:
> Do you manually train them as spam?


Yes.

> >  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
> >                              [score: 0.0000]
> 
> Ouch. A probability score of < 0.00005 -- which pretty much equals no
> token learned as spammy. Seriously? How often do you see "Funds" (mind
> the uppercase!) or "funds" in ham? How many of them do have that word in
> the Subject (which in addition gets treated specially by SA)?

I work in finance. We talk about funds. :-) I have quite a bit of ham with 
"Funds" or "funds" in the subject (but zip with the To: address in the 
subject).

> See where I am heading? Any chance your Bayes DB is completely borked?
>   sa-learn --dump magic

Not sure what to do with this, but here you go:

0.000          0          3          0  non-token data: bayes db version
0.000          0      29074          0  non-token data: nspam
0.000          0      46274          0  non-token data: nham
0.000          0     158157          0  non-token data: ntokens
0.000          0 1369590693          0  non-token data: oldest atime
0.000          0 1374752584          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1374712421          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
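
For anyone reading along who wants to make sense of those counters, here is a rough Python sketch that pulls out the message counts and turns the atime values into dates. It assumes the five-column layout shown in the dump, with the value sitting in the third column; the names parse_magic and as_utc are made up for illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Lines copied from the dump in this message (unwrapped).
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      29074          0  non-token data: nspam
0.000          0      46274          0  non-token data: nham
0.000          0     158157          0  non-token data: ntokens
0.000          0 1369590693          0  non-token data: oldest atime
0.000          0 1374752584          0  non-token data: newest atime
"""

PREFIX = "non-token data: "


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)  # four numeric columns, then the label
        if len(fields) != 5 or not fields[4].startswith(PREFIX):
            continue
        values[fields[4][len(PREFIX):].strip()] = int(fields[2])
    return values


def as_utc(epoch_seconds):
    """Interpret an atime as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


magic = parse_magic(DUMP)
spam_share = magic["nspam"] / (magic["nspam"] + magic["nham"])

print(f"learned: {magic['nspam']} spam, {magic['nham']} ham ({spam_share:.1%} spam)")
print("oldest token atime:", as_utc(magic["oldest atime"]).date())
print("newest token atime:", as_utc(magic["newest atime"]).date())
```

Read this way, the two atimes fall on 2013-05-26 and 2013-07-25 (UTC), which at least suggests the database is populated and still being updated.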

> Might be worth putting a sample or three up a pastebin of your choice,
> to see more of the text.

http://pastebin.com/8ATfK7EJ
http://pastebin.com/VMX0rEkn
http://pastebin.com/eQYUf2st

> And for further digging, which are the top hammy / spammy tokens? See
> M::SA::Conf [1], section Template Tags.

They are in the pastes in the X-Spam-JPW-Report: header.

> > Looking at the code for check_for_to_in_subject, it looks like the regular
> > expression used for LOCALPART_IN_SUBJECT is rather different (much more
> > specific) than the one used for ADDRESS_IN_SUBJECT. Presumably that's why
> > this rule doesn't match.
> > 
> > An example subject from this spam (address changed to protect the
> > innocent): <some...@example.com>_Need Approval for Fast Funds? July 24th
> > 2013_
> Do the Subjects strictly follow that pattern? Including the angle
> brackets AND the underscore? Dead easy target for a local rule to squat
> them.

They do, and I did. These spams are pretty easy to catch; they also have some 
boilerplate at the bottom that is the same in every message.
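
As an illustration of the pattern, here is a small Python sketch of the kind of expression that catches these Subjects. It is not the actual local rule; it assumes the Subject always starts with an address in angle brackets followed by an underscore and ends with another underscore, as in the example quoted above. The name looks_like_campaign and the ham subject are hypothetical.

```python
import re

# Assumed shape: "<address>_" at the start and a trailing "_" at the end.
SUBJECT_PATTERN = re.compile(r"^<[^<>\s@]+@[^<>\s@]+>_.+_$")


def looks_like_campaign(subject):
    """Return True if the Subject follows the bracketed-address pattern."""
    return SUBJECT_PATTERN.match(subject.strip()) is not None


# Subject as quoted in this thread (address already obscured by the archive).
spam_subject = "<some...@example.com>_Need Approval for Fast Funds? July 24th 2013_"
# Hypothetical finance ham that mentions funds but carries no address.
ham_subject = "Funds transfer approved for Q3"

print(looks_like_campaign(spam_subject))
print(looks_like_campaign(ham_subject))
```

Anchoring on the bracketed address and both underscores, rather than on the word "Funds", is what keeps ordinary finance ham out of the match.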

Cheers,

--Ian
