From: Troy Settle <[EMAIL PROTECTED]>
   Date: Mon, 10 Nov 2008 11:30:27 -0500
   
   I received a piece of junkmail this morning:  
   http://home.psknet.com/troy/1.txt
   
   In the spam report, I see this:  BAYES_00=-2.599
   
   So, I run it through sa-learn with --spam:
   
   Learned tokens from 1 message(s) (1 message(s) examined)
   
   Then, I re-scan it using spamc, and still I get:
   
   BAYES_00=-2.599
   
   What gives?  I don't expect the total score to come up much, but the 
   bayes should at least go from a negative number to a positive number... 
   shouldn't it?

The answer depends on how many tokens Bayes is looking at and how
spammy those tokens are.  You can see what Bayes thinks about each
token in the --debug output.  I get BAYES_40 on your message.

   % wget http://home.psknet.com/troy/1.txt
   % spamassassin -D --test-mode --debug all,bayes < 1.txt  2>&1 | grep bayes:
   ...
   [14389] dbg: bayes: corpus size: nspam = 426975, nham = 53737
   [14389] dbg: bayes: token 'Dodge' => 0.999612090680101
   [14389] dbg: bayes: token 'sincerely' => 0.999492864983535
   [14389] dbg: bayes: token 'decode' => 0.0344385308520192
   [14389] dbg: bayes: token 'I'll' => 0.0365668821340277
   [14389] dbg: bayes: token 'Perspective' => 0.0404549158471554
   ...
   [14389] dbg: bayes: score = 0.310353325094371
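
If the full debug output is too noisy, a small filter along these
lines can pull out just the token lines and sort them by probability,
spammiest first.  This is only a sketch: bayes_tokens is a made-up
helper name, not something SpamAssassin ships, and it assumes the
"bayes: token '...' => N" line format shown above and a Bourne-style
shell (sh, bash, zsh).

```shell
# Hypothetical helper (not part of SpamAssassin): read debug output on
# stdin, print "probability token" pairs, highest probability first.
# Assumes the "dbg: bayes: token ... => N" line format shown above.
bayes_tokens() {
    awk '/dbg: bayes: token / {
        prob = $NF                       # probability is the last field
        sub(/^.*bayes: token /, "")      # drop everything before the token
        sub(/ => [^ ]*$/, "")            # drop the trailing " => N"
        print prob, $0
    }' | sort -rn
}
```

It would be used by piping the same debug command into it:

   % spamassassin -D --test-mode --debug all,bayes < 1.txt 2>&1 | bayes_tokens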

After you learn a message as spam, the per-token probabilities and
the raw score should increase somewhat, depending on how many times
each token has been seen.  I get BAYES_60 on the message after
learning.

   % sa-learn --spam < 1.txt
   % sa-learn --sync
   % spamassassin -D --test-mode --debug all,bayes < 1.txt 2>&1 | grep bayes:
   [14618] dbg: bayes: corpus size: nspam = 426990, nham = 53737
   ...
   [14618] dbg: bayes: token 'Dodge' => 0.999615320566195
   [14618] dbg: bayes: token 'sincerely' => 0.999498371335505
   [14618] dbg: bayes: token 'decode' => 0.0348456512323892
   [14618] dbg: bayes: token 'I'll' => 0.0366062570517363
   [14618] dbg: bayes: token 'Perspective' => 0.0670493467695761
   ...
   [14618] dbg: bayes: token 'omaha' => 0.958
   [14618] dbg: bayes: token 'elsasser' => 0.958
   [14618] dbg: bayes: token 'riders' => 0.958
   ...
   [14618] dbg: bayes: score = 0.659988861825694
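
The BAYES_xx rule that fires is just the range the raw score falls
into.  As a rough illustration, the mapping might be sketched like
this.  bayes_bucket is a hypothetical helper, and the range boundaries
are my assumption of what the stock Bayes rules use, so check them
against your own rule files before relying on them.

```shell
# Hypothetical helper: map a raw Bayes score (0..1) to a BAYES_xx name.
# The range boundaries are assumed from the stock rules and may differ
# in a locally modified configuration.
bayes_bucket() {
    awk -v s="$1" 'BEGIN {
        if      (s <= 0.01) print "BAYES_00"
        else if (s <= 0.05) print "BAYES_05"
        else if (s <= 0.20) print "BAYES_20"
        else if (s <= 0.40) print "BAYES_40"
        else if (s <= 0.60) print "BAYES_50"
        else if (s <= 0.80) print "BAYES_60"
        else if (s <= 0.95) print "BAYES_80"
        else if (s <= 0.99) print "BAYES_95"
        else                print "BAYES_99"
    }'
}

bayes_bucket 0.310353325094371   # score before learning
bayes_bucket 0.659988861825694   # score after learning
```

Under those assumed ranges, a message that stays at BAYES_00 after
learning still has a raw score of 0.01 or less, so the hit only
changes once enough of its tokens have moved to push the score past
that boundary.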


-jeff
