On 25.09.2014 at 17:24, Amir Caspi wrote:
> On Sep 25, 2014, at 8:51 AM, John Hardin <jhar...@impsec.org> wrote:
>>
>> You *did* keep your initial Bayes training corpora, right?
> 
> Does it matter if you keep the initial corpora, or just that you train on 
> known corpora, even if they are "fluid?"

yes, because you can remove questionable messages, reset bayes and start
again

my training data is two folders of eml messages. if it turns
out that bayes no longer works well, a possible reason is
that you have too many neutralized tokens

since all eml files are named "date-number.eml", i could
move the oldest year out of the folder, then reset and rebuild
within seconds
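
a rough sketch of that "move the oldest year out" step, in python. it assumes
the "date" part of each file name starts with a four-digit year (for example
"2014-09-25-0001.eml"), which the naming scheme above does not actually spell
out. the function names and folder paths are made up for illustration. the
rebuild itself is only returned as a list of sa-learn command lines
(--clear, --spam, --ham), so nothing touches a real bayes database unless you
run them yourself.

```python
import re
import shutil
from pathlib import Path

# Assumption: file names begin with a four-digit year, e.g. "2014-09-25-0001.eml".
YEAR_RE = re.compile(r"^(\d{4})")


def archive_oldest_year(corpus_dir, archive_dir):
    """Move all .eml files of the oldest year from corpus_dir to archive_dir.

    Returns (year, moved_file_names). If fewer than two distinct years are
    present, nothing is moved and (None, []) is returned, so the corpus is
    never emptied completely.
    """
    by_year = {}
    for eml in Path(corpus_dir).glob("*.eml"):
        match = YEAR_RE.match(eml.name)
        if match:
            by_year.setdefault(int(match.group(1)), []).append(eml)

    if len(by_year) < 2:
        return None, []

    oldest = min(by_year)
    target = Path(archive_dir)
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for eml in sorted(by_year[oldest]):
        shutil.move(str(eml), str(target / eml.name))
        moved.append(eml.name)
    return oldest, moved


def rebuild_commands(spam_dir, ham_dir):
    """Return the sa-learn invocations for a reset and full retrain."""
    return [
        ["sa-learn", "--clear"],               # wipe the existing bayes database
        ["sa-learn", "--spam", str(spam_dir)],  # retrain from the spam folder
        ["sa-learn", "--ham", str(ham_dir)],    # retrain from the ham folder
    ]
```

each returned command could be passed to subprocess.run(cmd, check=True) as
the user that owns the bayes database. moving the files to an archive folder
instead of deleting them keeps the option of putting them back later.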

you can also do a fulltext search if you have a clue which
messages should not have been trained, and rebuild the same way
after removing them. recently i noticed that brand-new messages,
trained as HAM so they would not get marked as spam for a specific
user, caused 3 clear spam messages to pass through to me within
the next 10 minutes. files deleted, rebuilt, fine
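
the fulltext search over a training folder could look roughly like this in
python. it is only a sketch: find_messages is a made-up helper, and it does a
case-insensitive search over the raw bytes of each file, so a phrase hidden
inside a base64 or quoted-printable encoded body would not be found. it only
lists candidates; removing them and rebuilding stays a manual decision.

```python
from pathlib import Path


def find_messages(corpus_dir, phrase):
    """Return the .eml files in corpus_dir whose raw content contains phrase.

    The comparison is case-insensitive for ASCII letters and works on the
    undecoded file bytes (no MIME decoding is attempted).
    """
    needle = phrase.lower().encode("utf-8")
    hits = []
    for eml in sorted(Path(corpus_dir).glob("*.eml")):
        if needle in eml.read_bytes().lower():
            hits.append(eml)
    return hits
```

for example, find_messages("ham", "unsubscribe here") would return the paths
of all files in a folder named "ham" that contain that phrase, which can then
be reviewed before any of them is deleted.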

what i would never want to do is reset bayes and
start from zero, since i watched the filter quality
improve dramatically going from 200 to 1000 and
currently 1500 spam/ham examples
