I did some more research and I think I should report my new findings so that this thread can be useful to other readers.
First:

0.000 0 5232 0 non-token data: nspam
0.000 0 70408 0 non-token data: nham
0.000 0 388070 0 non-token data: ntokens

nspam and nham are definitely the numbers of messages learned.

Second: I saw that nham increased every few seconds, and discovered that bayes_auto_learn was enabled!

My situation yesterday:

0.000 0 1042011 0 non-token data: nspam
0.000 0 66472 0 non-token data: nham
0.000 0 663479 0 non-token data: ntokens

My situation now:

0.000 0 1042049 0 non-token data: nspam
0.000 0 71228 0 non-token data: nham
0.000 0 1040661 0 non-token data: ntokens

So at least I now know that the system is feeding the Bayes engine with new data, and that this is how the results can change.

Third: in 72_active.cf there are a lot of bayes_ignore_header directives, but they don't include the headers added by my commercial antivirus. Should I create a patch?

Fourth: I added a dbg statement to Bayes.pm, sub tokenize, to print the tokens it extracts from the message. I agree with some of them, but not with others. Is there any documentation explaining why tokens are extracted this way? (Some notes are in the source code.) I also discovered that some words should probably be added to the stopwords list, but there is no way to do that from a configuration file; I would have to modify the SpamAssassin code directly...

To end: I think the only way to proceed now is to nuke the Bayes db and start from scratch:
- set up the Bayes configuration correctly
- double-check that the corpus is correctly classified
- run sa-learn

For the "set up the Bayes configuration correctly" step I'd welcome your contributions :-) I have already excluded all the headers added by my antivirus and configured the internal/external/trusted networks.

Thanks
Francesco
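For the "set up the Bayes configuration correctly" step, this is roughly what I plan to put in local.cf. The X-MyAV-* header names are only placeholders for whatever your antivirus actually adds; check a raw delivered message to get the real names:

```
# Stop unattended feeding of the Bayes db while rebuilding the corpus
bayes_auto_learn 0

# Keep locally-added antivirus headers out of the token stream
# (X-MyAV-Status / X-MyAV-Report are placeholder names)
bayes_ignore_header X-MyAV-Status
bayes_ignore_header X-MyAV-Report
```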
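And the "nuke and start from scratch" plan as a command sketch, not meant to be run blindly: the corpus paths are placeholders, and --clear really does wipe the existing db:

```shell
sa-learn --clear                       # wipe the current Bayes db
sa-learn --spam /path/to/spam-corpus   # learn the verified spam corpus
sa-learn --ham /path/to/ham-corpus     # learn the verified ham corpus
sa-learn --dump magic                  # nspam/nham should now match the corpus sizes
```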
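P.S. for other readers: the counter lines above come from `sa-learn --dump magic`. A small sketch of how I pull out just the three numbers; the heredoc stands in for the live output (with the counts from my current db), in real use you would pipe `sa-learn --dump magic` into the same awk:

```shell
# Sketch: extract nspam/nham/ntokens from `sa-learn --dump magic` output.
# Real use:  sa-learn --dump magic | awk '/nspam|nham|ntokens/ { print $7 ": " $3 }'
# Field 3 is the counter value, field 7 is the counter name.
awk '/nspam|nham|ntokens/ { print $7 ": " $3 }' <<'EOF'
0.000 0 1042049 0 non-token data: nspam
0.000 0 71228 0 non-token data: nham
0.000 0 1040661 0 non-token data: ntokens
EOF
```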