Keith> Had noticed in a recent thread on training problems that there
Keith> appears to be a utility in the Outlook version of SpamBayes that
Keith> allows you to examine the clues that spambayes used on a given
Keith> message to decide how to classify it. I was wondering if there's
Keith> a Unix command-line equivalent that I can use to check to see how
Keith> a message is treated...
Just run the message through sb_filter.py with the include_evidence option
set to True:
sb_filter.py -o Headers:include_evidence:True < some-mail-message
The output will be the message along with an X-SpamBayes-Evidence header
similar to this:
X-Spambayes-Evidence: '*H*': 0.69; '*S*': 0.00; 'changed': 0.05; 'owner': 0.07;
    'response.': 0.09; 'dire': 0.16; 'matter.': 0.16; 'sure': 0.16;
    'funds': 0.20; 'might': 0.23; 'friend,': 0.25;
    'received:206': 0.25; 'source': 0.25; 'south': 0.25;
    'those': 0.27; 'sending': 0.31; 'well': 0.31; 'these': 0.32;
    'choose': 0.32; 'content': 0.32; 'husband': 0.32; 'mail.': 0.32;
    'president': 0.32; 'help': 0.33; 'hear': 0.33; 'which': 0.35;
    'would': 0.36; 'under': 0.36; 'ask': 0.37; 'him': 0.37;
    'longer': 0.37; 'soon': 0.37; 'mail': 0.38; 'further': 0.39;
    'content-type:text/html': 0.61; 'your': 0.62; 'becoming': 0.62;
    'his': 0.62; 'personal': 0.62; 'our': 0.63; 'come': 0.65;
    'friends': 0.65; 'reach': 0.65; 'best': 0.68; 'disclose': 0.84;
    'earnings': 0.84; 'finances': 0.84; 'detail': 0.93
Keith> ... I have noticed some very short spam messages of late that
Keith> seem to be resistant to training-- probably simply *because* they
Keith> are so short there's not enough to go on,
Precisely. You'll see they generate very few tokens.
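If you'd rather not eyeball the header, here's a minimal Python sketch (my own, not part of SpamBayes) that counts the clues in the evidence header and lists the strongest ones, i.e. those farthest from the neutral 0.5. It assumes the header format in the sample above ('token': score pairs separated by semicolons), and it treats '*H*' and '*S*' as the overall ham/spam scores rather than per-token clues. parse_evidence and report are hypothetical helper names.

```python
import re
from email import message_from_string

# Assumed format, based on the sample header above: entries look like
# 'token': 0.NN separated by semicolons. Tokens containing "': " or
# quoted with double quotes would not be picked up by this simple pattern.
CLUE_RE = re.compile(r"'(.*?)':\s*(\d+(?:\.\d+)?)")

# Assumption: '*H*' and '*S*' carry the overall ham/spam scores,
# not individual token clues, so they are excluded from the count.
SUMMARY_KEYS = ("*H*", "*S*")


def parse_evidence(value):
    """Return (token, probability) pairs from an evidence header value."""
    # Unfold the header: drop the line breaks, keep the following whitespace.
    unfolded = value.replace("\r\n", "").replace("\n", "")
    return [(tok, float(p)) for tok, p in CLUE_RE.findall(unfolded)]


def report(message_text, top=5):
    """Count the clues in a message and list the `top` strongest ones."""
    msg = message_from_string(message_text)
    header = msg.get("X-Spambayes-Evidence")  # lookup is case-insensitive
    if header is None:
        return ["no X-Spambayes-Evidence header found"]
    clues = [(t, p) for t, p in parse_evidence(str(header))
             if t not in SUMMARY_KEYS]
    # Strongest clues are those farthest from the neutral value 0.5.
    strongest = sorted(clues, key=lambda c: abs(c[1] - 0.5), reverse=True)
    lines = ["%d clues" % len(clues)]
    lines += ["  %.2f  %s" % (p, t) for t, p in strongest[:top]]
    return lines
```

To use it, pipe the output of the sb_filter.py command above into a script that ends with something like print("\n".join(report(sys.stdin.read()))) (after importing sys). On one of those short spams you'd expect the first line to report only a handful of clues.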
Skip
_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html