On Wed, April 16, 2008 19:34, David wrote:
>
> Am getting loads of spam with cyrillic characters and would like to know
> if Spambayes can automatically delete anything with these characters in
> their headers. Below is score info for typical one. If you need it, could
> send you the config file if you can tell me where to find it.
>
> Kindest regards
> David Kanareck
>
> Combined Score: 57% (0.567348)
>
> Internal ham score (*H*): 0.285187
> Internal spam score (*S*): 0.419882
>
> # ham trained on: 39
> # spam trained on: 76

That is not much training. In my experience, Spambayes gets *extremely*
accurate after about 100 hams and 100 spams. Your mileage may vary.

With the Outlook plugin, I add a column that shows the spam score (see
FAQ/wiki for details). I sort on spam score. I look at the bottom and find
the one spam with the lowest score. Train as spam. Rescore inbox. Now I
look at the top and find the one ham with the highest score. Train as ham,
rescore. Back to the lowest spam, rescore. Highest ham, rescore. Lather,
rinse, repeat.

Very quickly you will see that all spam scores above 99% and all ham scores
below 1%.

This method of training is so kewl that I have actually considered
installing Outlook on Linux, just so that I could train Spambayes this way.

> 'message.' 0.310872 15 13
> 'date:' 0.325631 14 13
> 'checked' 0.341867 13 13
> 'database:' 0.341867 13 13
> 'incoming' 0.341867 13 13
> 'version:' 0.341867 13 13
> 'virus' 0.35698 14 15
> 'release' 0.358294 13 14
> 'avg.' 0.359817 12 13
> 'skip:2 10' 0.359817 12 13
> 'found' 0.385564 14 17

These are generic tokens added by your virus scanner. After more training
they will score around 0.5, which means they will neither increase nor
decrease the overall spam score of a message.

> 'to:no real name:2**0' 0.750084 10 59
> 'header:Received:1' 0.893006 1 18

Interesting tokens...

> 'from:charset:koi8-r' 0.908163 0 2
> 'subjectcharset:koi8-r' 0.908163 0 2

And those last two are *really* interesting tokens! koi8-r is a Cyrillic
character set, so these are exactly the clues that should catch your
Cyrillic spam once they have seen a bit more training.

Keep on training, I can already see that your Spambayes is improving.

-- 
Amedee Van Gasse
[EMAIL PROTECTED]

Disclaimer: By sending an email to ANY of my addresses you are agreeing that:
1. I am by definition, "the intended recipient"
2. All information in the email is mine to do with as I see fit and make
   such financial profit, political mileage, or good joke as it lends
   itself to. In particular, I may quote it on usenet.
3. I may take the contents as representing the views of your company.
4. This overrides any disclaimer or statement of confidentiality that may
   be included on your message.
_______________________________________________
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
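
P.S. For anyone who wants the "lowest spam, highest ham" training routine
spelled out, here is a minimal Python sketch. It is only an illustration
under loud assumptions: ToyClassifier, train_on_extremes and the
four-message inbox are made up for this example, and the word-averaging
score is a crude stand-in, not the scoring Spambayes really uses. The
human's judgement is represented by the is_spam flag on each message.

```python
from collections import Counter


class ToyClassifier:
    """Tiny word-count scorer. A hypothetical stand-in, NOT Spambayes."""

    def __init__(self):
        self.ham = Counter()
        self.spam = Counter()

    def train(self, text, is_spam):
        # Count each distinct word once per trained message.
        counts = self.spam if is_spam else self.ham
        counts.update(set(text.lower().split()))

    def score(self, text):
        """Return a spam score in [0, 1]; 0.5 means no evidence either way."""
        probs = []
        for word in set(text.lower().split()):
            h, s = self.ham[word], self.spam[word]
            if h + s:
                # Smoothed per-word spam probability.
                probs.append((s + 0.5) / (h + s + 1.0))
        return sum(probs) / len(probs) if probs else 0.5


def train_on_extremes(classifier, inbox, max_rounds=20):
    """Train on the worst-scored spam, then the worst-scored ham, and repeat.

    inbox is a list of (text, is_spam) pairs and must contain at least one
    spam and one ham. Scores are recomputed on every call, which plays the
    role of "rescore inbox". Returns the number of rounds used.
    """
    spams = [text for text, is_spam in inbox if is_spam]
    hams = [text for text, is_spam in inbox if not is_spam]
    for round_no in range(1, max_rounds + 1):
        # The spam with the LOWEST score is the one the filter likes most.
        classifier.train(min(spams, key=classifier.score), True)
        # The ham with the HIGHEST score is the one it distrusts most.
        classifier.train(max(hams, key=classifier.score), False)
        # Stop once every spam scores above every ham.
        if min(map(classifier.score, spams)) > max(map(classifier.score, hams)):
            return round_no
    return max_rounds


# Made-up inbox; "checked" plays the part of a virus scanner footer
# that appears in ham and spam alike.
inbox = [
    ("cheap pills online checked", True),
    ("win money now checked", True),
    ("meeting notes attached checked", False),
    ("lunch at noon checked", False),
]

if __name__ == "__main__":
    clf = ToyClassifier()
    print("rounds:", train_on_extremes(clf, inbox))
    for text, is_spam in inbox:
        print("spam" if is_spam else "ham ", round(clf.score(text), 4), text)
```

In this toy run the shared word "checked" ends up at exactly 0.5 once it
has been seen equally often in ham and in spam, which is the neutral
behaviour to expect from the virus scanner tokens after more training.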
