Re: newbie questions: sought, sa-learn, rule weights

Reindl Harald Sun, 18 Oct 2015 02:30:07 -0700


On 18.10.2015 at 06:35, frede...@ofb.net wrote:

I'm concerned that the BAYES_* rules aren't showing up in my spam
headers

you are most likely training the wrong Bayes database instead of the one belonging to the user SA is running as

and would like to know if there's a good way to look at the
tokens in the database


there is no way at all, the database only stores stripped hashes of the tokens

When I do "sa-learn --dump data", I see a file
with lines like this:

0.987          1          0 1436496897  0315e1da7f
0.016          0          1 1410284743  0320ba06ef
0.987          1          0 1393199297  0329ec4e6e
0.003          0          5 1268403253  03541effbc
0.008          0          2 1398222936  038d6e997d
0.016          0          1 1429567309  041cabf4ef
0.016          0          1 1431638107  041d441c1b

Is that normal?

yes
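
For reference, each line of that dump holds five columns: token probability, spam count, ham count, last access time (Unix epoch) and the hashed token. A minimal Python sketch of reading them, using a few of the lines quoted above; the names TokenRow and parse_dump_line are made up for illustration and are not part of SpamAssassin:

```python
from typing import NamedTuple


class TokenRow(NamedTuple):
    prob: float        # probability that a message containing the token is spam
    spam_count: int    # number of learned spam messages containing the token
    ham_count: int     # number of learned ham messages containing the token
    atime: int         # last access time, seconds since the Unix epoch
    token_hash: str    # hashed token; the original text is not stored


def parse_dump_line(line: str) -> TokenRow:
    """Split one whitespace-separated line of `sa-learn --dump data` output."""
    prob, spam, ham, atime, token = line.split()
    return TokenRow(float(prob), int(spam), int(ham), int(atime), token)


# Three of the lines from the question, used as sample input.
sample = """\
0.987          1          0 1436496897  0315e1da7f
0.016          0          1 1410284743  0320ba06ef
0.003          0          5 1268403253  03541effbc
"""

rows = [parse_dump_line(line) for line in sample.splitlines() if line.strip()]
# Hashes of tokens that lean towards spam (probability above 0.5).
spammy = [row.token_hash for row in rows if row.prob > 0.5]
```

This only shows which hashes lean towards spam or ham; it cannot turn a hash back into a word.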

How do I get at the actual tokens?


you don't

How do I see how it scores a test message, just the Bayesian part?


you see BAYES_00 - BAYES_999 in the mail headers

I find that I get a lot
of spam with exactly the same lines in the body of the message, and
the Bayesian classifier doesn't seem to register it.


as said above: you are training the wrong Bayes database

Here's the output of sa-learn --dump magic:

0.000          0          3          0  non-token data: bayes db version
0.000          0      15466          0  non-token data: nspam
0.000          0      30317          0  non-token data: nham
0.000          0    1733267          0  non-token data: ntokens
0.000          0 1098575745          0  non-token data: oldest atime
0.000          0 1441160002          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1441160455          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count


FROM WHAT USER?
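
One way to check is to run the same dump as the account the scanner uses and compare the nspam/nham numbers with your own. A rough Python sketch, assuming sudo is available and the default per-user Bayes location under ~/.spamassassin is in use; the helper names are made up, and the account name passed in is whatever your setup actually uses:

```python
import subprocess
from typing import List


def dump_magic_cmd(user: str) -> List[str]:
    """Build the command line that runs `sa-learn --dump magic` as `user`.

    -H asks sudo to set HOME to the target user's home directory, so
    sa-learn looks at that user's database rather than the caller's.
    """
    return ["sudo", "-H", "-u", user, "sa-learn", "--dump", "magic"]


def dump_magic(user: str) -> str:
    """Run the command and return its output (needs sudo rights)."""
    result = subprocess.run(
        dump_magic_cmd(user), capture_output=True, text=True, check=True
    )
    return result.stdout
```

If the counts printed for the scanning account differ from the ones you get under your own login, you are training a different database than the one used for scoring.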

I couldn't find a sample output on your Wiki, with which to compare
this; I'm worried about the 0.000 lines and other zeroes.


they are normal
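
The "magic" lines reuse the layout of the token lines, with the actual value in the third column; the other columns are just placeholders, hence the zeroes. A small Python sketch of pulling the values out, using some of the lines quoted above; parse_magic is a made-up helper name, and the threshold of 200 is the default for bayes_min_spam_num and bayes_min_ham_num, so adjust it if your configuration changes those:

```python
from datetime import datetime, timezone
from typing import Dict


def parse_magic(text: str) -> Dict[str, int]:
    """Map each 'non-token data' label to the value in the third column."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:")
        values[name.strip()] = int(fields.split()[2])
    return values


# A subset of the lines from the question, used as sample input.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      15466          0  non-token data: nspam
0.000          0      30317          0  non-token data: nham
0.000          0 1441160002          0  non-token data: newest atime
"""

magic = parse_magic(sample)
# Bayes rules only fire once enough spam and ham have been learned.
enough = magic["nspam"] >= 200 and magic["nham"] >= 200
# The atime values are Unix timestamps; convert one to a readable date.
newest = datetime.fromtimestamp(magic["newest atime"], tz=timezone.utc)
```

With the numbers above, the database has far more than the minimum of learned messages, and the newest atime falls in early September 2015.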

I'm also thinking that I should employ some kind of sender address
whitelisting using e.g. TxRep. Most of my spam is stuff that I'm
receiving for the first time from a particular sender, and there are a
lot of strings that I can say for sure I'd never find in a Subject
line of a message from a friend who is emailing me for the first time:
"ATTN", "stock tip"... All of the mail I send is Bcc'ed to myself, is
there a way to get Spamassassin to notice when this comes in and
automatically whitelist the recipients for me?

no need to do so, and for sure you don't want it automatically, you only *think* you want it - blind whitelisting is easy to trick with forged senders, whereas whitelist_auth is based on DKIM/SPF presence

but typically you don't need much whitelisting unless you are a hosting provider and care about your load (combining whitelist_auth and shortcircuit)
