On Tue, 28 May 2019 15:34:06 +0200
hg user wrote:

> Fourth:
> I added a dbg statement to bayes.pm, sub tokenize, to print the
> tokens it extracts from the message.
> I agree with some, I don't with others. I'd like to know if there is
> some doc that lists why tokens are extracted this way (some notes are
> in the source code)
> I discovered that probably some words should be added to the
> stopwords list but there is no way to do it in a configuration file,
> I should modify spamassassin code directly...


The stoplist is just there to drop tokens that are deemed to be not
worth using because they are likely to be neutral. Neutral tokens
don't affect the result. 
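
To illustrate why a neutral token drops out, here is a minimal sketch in
Python. It uses a simple naive-Bayes combining rule, not SpamAssassin's
own combining code, and the function name and token probabilities are
made up for the example.

```python
from math import prod


def combine(probs):
    """Combine per-token spam probabilities with a simple naive-Bayes rule.

    Illustrative only: this is NOT SpamAssassin's combining code.
    """
    spam = prod(probs)                  # evidence for spam
    ham = prod(1.0 - p for p in probs)  # evidence for ham
    return spam / (spam + ham)


# Hypothetical token probabilities, chosen for illustration.
informative = [0.9, 0.8, 0.2]

# The same tokens plus two neutral ones (p = 0.5), as a stopword would be.
with_neutral = informative + [0.5, 0.5]

base = combine(informative)
padded = combine(with_neutral)

# A neutral token scales the spam and ham evidence by the same factor,
# so the factor cancels and the combined result is unchanged.
print(base, padded)
```

So under this rule a stoplist saves some work and database space, but
leaving a neutral word out of it should not change the classification.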


For testing purposes I'd suggest stripping any purely internal headers,
except headers that contain envelope information, as Zimbra may be
supplying this by other means.

If you can turn off auto-training and clear the database, I suggest
you do that.
