On Thursday, July 25, 2013 05:15:19 AM Karsten Bräckelmann wrote:
> On Wed, 2013-07-24 at 21:53 -0400, Ian Turner wrote:
> > They are moderately low-scoring, sadly (I wouldn't have noticed
> > otherwise!), mainly due to bayes poison. A typical message looks
> > like this:
>
> Do you manually train them as spam?

Yes.

> > -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
> >                             [score: 0.0000]
>
> Ouch. A probability score of < 0.00005 -- which pretty much equals no
> token learned as spammy. Seriously? How often do you see "Funds" (mind
> the uppercase!) or "funds" in ham? How many of them do have that word in
> the Subject (which in addition gets treated specially by SA)?

I work in finance. We talk about funds. :-) I have quite a bit of ham with
"Funds" or "funds" in the subject (but zip with the To: address in the
subject).

> See where I am heading? Any chance your Bayes DB is completely borked?
>
>   sa-learn --dump magic

Not sure what to do with this, but here you go:

0.000          0          3          0  non-token data: bayes db version
0.000          0      29074          0  non-token data: nspam
0.000          0      46274          0  non-token data: nham
0.000          0     158157          0  non-token data: ntokens
0.000          0 1369590693          0  non-token data: oldest atime
0.000          0 1374752584          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1374712421          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

> Might be worth putting a sample or three up a pastebin of your choice,
> to see more of the text.

http://pastebin.com/8ATfK7EJ
http://pastebin.com/VMX0rEkn
http://pastebin.com/eQYUf2st

> And for further digging, which are the top hammy / spammy tokens? See
> M::SA::Conf [1], section Template Tags.

They are in the pastes in the X-Spam-JPW-Report: header.

> > Looking at the code for check_for_to_in_subject, it looks like the regular
> > expression used for LOCALPART_IN_SUBJECT is rather different (much more
> > specific) than the one used for ADDRESS_IN_SUBJECT. Presumably that's why
> > this rule doesn't match.
> >
> > An example subject from this spam (address changed to protect the
> > innocent): <some...@example.com>_Need Approval for Fast Funds? July 24th
> > 2013_
>
> Do the Subjects strictly follow that pattern? Including the angle
> brackets AND the underscore? Dead easy target for a local rule to squat
> them.

They do, and I did. These spams are pretty easy to catch, they also have some
boilerplate at the bottom of each one that is the same every time.

Cheers,

--Ian
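
For anyone following along, here is a rough illustration of the Subject pattern discussed above (an address in angle brackets, an underscore, some text, and a closing underscore). It is a Python sketch, not the actual local rule, which was not posted. The regular expression and the function name `matches_subject_pattern` are assumptions made up for this example. A SpamAssassin `header ... Subject =~ /.../` rule would presumably carry a similar expression.

```python
import re

# Hypothetical pattern, inferred only from the single example Subject in
# this thread: "<address>_" at the start, any text, then a trailing "_".
SUBJECT_PATTERN = re.compile(r"^<[^<>\s@]+@[^<>\s@]+>_.+_\s*$")


def matches_subject_pattern(subject: str) -> bool:
    """Return True if the Subject has the <address>_text_ shape."""
    return SUBJECT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    # The sample Subject exactly as quoted in the thread (address obscured).
    sample = "<some...@example.com>_Need Approval for Fast Funds? July 24th 2013_"
    print(matches_subject_pattern(sample))
    # An ordinary finance Subject without the address/underscore wrapper.
    print(matches_subject_pattern("Funds transfer confirmation"))
```

Requiring the angle brackets and both underscores is what keeps ordinary ham that merely mentions "Funds" in the Subject from matching.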