On 28.01.2015 at 16:52, Axb wrote:
> On 01/28/2015 04:38 PM, Reindl Harald wrote:
>> [bayes_seen] is AFAIK relevant in the context of sa-learn, to avoid re-training the same messages again and again - and it has its own bugs, because for a few messages it contains random parts of the message itself; running sa-learn on the whole corpus would add those messages to "bayes_toks" every time (see the two example snippets below), hence it is that large here:
>>
>> -rw------- 1 sa-milt sa-milt 5,4K 2015-01-28 16:34 bayes_journal
>> -rw------- 1 sa-milt sa-milt 1,3M 2015-01-28 16:12 bayes_seen
>> -rw------- 1 sa-milt sa-milt  40M 2015-01-28 16:33 bayes_toks
>> -rw------- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
>
> Something here does NOT make sense: 1.3 MB of "seen" against 40 MB of tokens. Someone please correct me if I'm wrong: AFAIK this probably means you've deleted bayes_seen, so Bayes has lost its record of what it has processed and will relearn stuff you already fed it.
No, I explained what happens in the part you stripped from the quote: bayes_seen contains random, complete message fragments, no matter how often I delete *any file* in the user home and rebuild from scratch.
If I delete "bayes_seen", it happens again after a complete reset with sa-learn.sh, which uses sa-learn to *rebuild from scratch* from the permanently stored raw mails in the "ham" and "spam" folders.
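For reference, the rebuild-from-scratch step described above could look roughly like this. This is a minimal sketch, not the actual sa-learn.sh from the thread: the paths and the use of --dbpath are assumptions, while --clear, --ham, --spam and --sync are standard sa-learn options.

```shell
#!/bin/sh
# Sketch: rebuild the Bayes database from archived ham/spam corpora.
# Assumed layout (hypothetical paths): raw mails are kept permanently
# in $CORPUS/ham and $CORPUS/spam.
CORPUS=/var/lib/sa-milt/corpus
DBPATH=/var/lib/sa-milt/.spamassassin

sa-learn --dbpath "$DBPATH" --clear               # wipe the existing Bayes DB
sa-learn --dbpath "$DBPATH" --ham  "$CORPUS/ham"  # relearn all stored ham
sa-learn --dbpath "$DBPATH" --spam "$CORPUS/spam" # relearn all stored spam
sa-learn --dbpath "$DBPATH" --sync                # flush bayes_journal into bayes_toks
```

If bayes_seen really does contain raw message fragments, a rebuild like this would re-tokenize those fragments on every run, which would explain a bayes_toks file that keeps growing relative to bayes_seen.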