-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/10/11 18:45, [email protected] wrote:
>
>>> 2. When I train over a message, I keep training in a loop until
>>> the message probability goes under 20% (ham) or over 90%
>>> (spam). As the database ages, training spam needs more
>>> "looping", that is, the probability goes up slowly. The ham
>>> training, nevertheless, is fast and the loop counting is low.
>
> Jesus> Uhm, the wiki says: "never train the same message
> Jesus> twice". Reason?. I am breaking this badly.
>
> Jesus,
>
> I use train to exhaustion as referenced in your other email
> (contrib/tte.py in the SpamBayes distribution). I currently have
> 21 hams and 17 spams in my current training database. I suggest
> you just toss out everything but the most recent 10-15 hams and
> spams then start with that.
>
> I cheat as well, since both my pobox.com mail forwarding service
> and Gmail (where it forwards to) apply their own spam filters
> before SpamBayes gets a crack at my mail. The downside of that is
> that I need to scan their held spams periodically.
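To be concrete about the loop in point 2 quoted above, it can be sketched roughly as follows. This is only an illustration: `ToyClassifier` and `train_until` are hypothetical names, and the scoring is a plain naive-Bayes log-odds sum with add-one smoothing, not the SpamBayes classifier and not contrib/tte.py. The 20%/90% cutoffs are the ones mentioned above; `max_rounds` is a safety cap I am assuming so the loop always terminates.

```python
import math
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a token-based filter (NOT the SpamBayes API)."""

    def __init__(self):
        self.spam_counts = Counter()  # messages containing each token, spam side
        self.ham_counts = Counter()   # messages containing each token, ham side
        self.nspam = 0
        self.nham = 0

    def learn(self, tokens, is_spam):
        # Count each distinct token once per trained message.
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spamprob(self, tokens):
        # Sum per-token log-odds (add-one smoothing), then squash to [0, 1].
        log_odds = 0.0
        for tok in set(tokens):
            p_spam = (self.spam_counts[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_until(clf, tokens, is_spam, ham_cutoff=0.20, spam_cutoff=0.90,
                max_rounds=50):
    """Re-train on one message until its score crosses the cutoff.

    Returns the number of learn() calls that were made.
    """
    rounds = 0
    while rounds < max_rounds:
        p = clf.spamprob(tokens)
        if (is_spam and p > spam_cutoff) or (not is_spam and p < ham_cutoff):
            break
        clf.learn(tokens, is_spam)
        rounds += 1
    return rounds


if __name__ == "__main__":
    clf = ToyClassifier()
    for _ in range(2):
        clf.learn(["meeting", "agenda", "lunch"], is_spam=False)
        clf.learn(["cheap", "offer", "winner"], is_spam=True)
    msg = ["meeting", "cheap", "pills"]
    print("rounds:", train_until(clf, msg, is_spam=True))
    print("final spam probability:", clf.spamprob(msg))
```

Every pass through the loop feeds the same message to `learn()` again, which is exactly the "never train the same message twice" rule being broken, so the per-token counts stop reflecting independent samples.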

Thanks for your reply, Skip, but you don't address any of my concerns :-):

1. Do not train with the same message twice.

2. Keep spam/ham balanced.

3. Is it normal that "training" can slowly degrade the quality? And if
so, what do people do about it (besides deleting the DB and retraining
with recent samples)?

I think that 1 & 2 are related to the Bayes assumption of independent
samples. But the code is abusing Bayes so badly that breaking this
condition is actually irrelevant in our context :-).

BTW, what are the changes between 1.1a4 (my version) and 1.1a6? I can't
find an updated CHANGELOG...

- -- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
[email protected] - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:[email protected]         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"Love is placing your happiness in the happiness of another" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBTpw1Kplgi5GaxT1NAQLlbQP/RxagFrvQcmWpz54cku6GR2KLkZByS54E
1ArPp92RlarYEaB0fUhn1D8JBbIOgwPHT65sE1p94mh18D7NxIVsJdUW4Ay9ZnR7
62CttlHFBMynv7xJGSzZ8d4OECwIqSobNqUYZgRLEwdKOvT/uak1t3DXW2o8xpRD
swfOemBzEtI=
=98ok
-----END PGP SIGNATURE-----
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
