On 25.09.2014 at 17:24, Amir Caspi wrote:
> On Sep 25, 2014, at 8:51 AM, John Hardin <jhar...@impsec.org> wrote:
>>
>> You *did* keep your initial Bayes training corpora, right?
>
> Does it matter if you keep the initial corpora, or just that you train on
> known corpora, even if they are "fluid?"

Yes, because you can remove questionable messages, reset the Bayes database and start again.

My training data is two folders of eml messages. If it turns out that Bayes no longer works well, a possible reason is that you have too many neutralized tokens (tokens seen in both ham and spam). Since all eml files are named "date-number.eml", I could try moving the oldest year out of the folder, then reset and rebuild within seconds.

You can also do a full-text search if you have a clue which messages should not have been trained, remove them, and rebuild the same way. Recently I noticed that brand-new messages I had trained as HAM, to stop them being marked as spam for a specific user, caused 3 clear spam messages to pass through to me within the next 10 minutes. Files deleted, rebuild, fine.

What I would never want to do is reset Bayes and start from zero, since I watched the filter quality improve dramatically as the corpus grew from 200 to 1000 and currently 1500 spam/ham examples.
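The prune-and-rebuild routine described above can be sketched roughly as follows. This is only an illustration, not the setup actually used here: it assumes file names start with a four-digit year (the exact "date-number.eml" format is not given), the helper names and folder layout are hypothetical, and it relies on the standard `sa-learn` options `--clear`, `--spam` and `--ham`.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumption: file names begin with a four-digit year, e.g. "2014-09-25-0001.eml".
YEAR_RE = re.compile(r"^(\d{4})")


def year_of(name):
    """Return the leading four-digit year of a file name, or None."""
    match = YEAR_RE.match(name)
    return int(match.group(1)) if match else None


def oldest_year_files(names):
    """Return the names that belong to the oldest year found in `names`."""
    years = [y for y in map(year_of, names) if y is not None]
    if not years:
        return []
    oldest = min(years)
    return sorted(n for n in names if year_of(n) == oldest)


def archive_oldest_year(corpus_dir, archive_dir):
    """Move the oldest year's .eml files from corpus_dir to archive_dir."""
    corpus, archive = Path(corpus_dir), Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in corpus.glob("*.eml")]
    moved = oldest_year_files(names)
    for name in moved:
        shutil.move(str(corpus / name), str(archive / name))
    return moved


def rebuild_commands(spam_dir, ham_dir):
    """Commands that wipe the Bayes database and retrain it from both folders."""
    return [
        ["sa-learn", "--clear"],
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def rebuild(spam_dir, ham_dir):
    """Run the rebuild; needs sa-learn on PATH and must run as the user owning the database."""
    for cmd in rebuild_commands(spam_dir, ham_dir):
        subprocess.run(cmd, check=True)
```

Calling `archive_oldest_year("spam", "spam-archive")`, the same for the ham folder, and then `rebuild("spam", "ham")` would mirror the "move the oldest year out, reset, rebuild" step. Moving files to an archive folder rather than deleting them keeps the option of putting them back later.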