On Tue, 28 May 2019 15:34:06 +0200
hg user wrote:

> Fourth:
> I added a dbg statement to bayes.pm, sub tokenize, to print the
> tokens it extracts from the message.
> I agree with some, I don't with others. I'd like to know if there is
> some doc that lists why tokens are extracted this way (some notes are
> in the source code)
> I discovered that probably some words should be added to the
> stopwords list but there is no way to do it in a configuration file,
> I should modify spamassassin code directly...
The stoplist is just there to drop tokens that are deemed not worth using because they are likely to be neutral. Neutral tokens don't affect the result anyway, so leaving a few extra words off the stoplist costs little.

For testing purposes I'd suggest stripping any purely internal headers, except headers that contain envelope information, as Zimbra may be supplying this by other means. If you can turn off auto-training and clear the database, I suggest you do that.
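To illustrate why dropping neutral tokens is harmless, here is a minimal Python sketch. It assumes a simple naive-Bayes log-odds combination, which is NOT the chi-square combining that SpamAssassin's Bayes.pm actually uses, and the stoplist and token probabilities are hypothetical. Under that assumption a token with a spam probability of 0.5 contributes log(0.5 / 0.5) = 0, so removing it leaves the score unchanged.

```python
import math

# Hypothetical stoplist for illustration only; it is not SpamAssassin's real list.
STOPWORDS = {"the", "and", "to", "of", "a"}


def filter_tokens(tokens):
    """Drop stoplisted tokens (case-insensitively) before scoring."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def log_odds(token_probs, tokens):
    """Sum log(p / (1 - p)) over tokens with a known spam probability p.

    This is a plain naive-Bayes combination used only as an assumption
    for the example; SpamAssassin combines token probabilities differently.
    """
    total = 0.0
    for t in tokens:
        p = token_probs.get(t)
        if p is None:
            continue  # unseen tokens carry no evidence in this sketch
        total += math.log(p / (1.0 - p))
    return total


# Hypothetical per-token spam probabilities; "the" is perfectly neutral.
probs = {"viagra": 0.99, "meeting": 0.1, "the": 0.5}
message = ["The", "meeting", "viagra", "the"]

print(filter_tokens(message))
print(log_odds(probs, message) == log_odds(probs, filter_tokens(message)))
```

Here the neutral token "the" adds exactly zero, so the score is the same with or without it; the stoplist only saves the work of looking such tokens up and storing them.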