Some time around 08/14/2004 12:27:41, I think I heard Andre Wichartz say:
> Assume a word occurs equally often in spam and non-spam mails. If you
> set the value to 1, the word will get a spam probability of 0.5. If you
> set it to a higher value, the word will get something lower than 0.5.
> Words in non-spam mails just count more, and you can set just how much
> more.

> At least that's my take on it.
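To make Andre's description concrete, here is a minimal sketch. It assumes the filter weights non-spam counts the way Paul Graham's "A Plan for Spam" does (Graham doubled the non-spam count). Nothing here confirms that The Bat!'s filter uses this formula. The function name and the `ham_weight` parameter are hypothetical stand-ins for the "regarding threshold" setting.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham, ham_weight=1.0):
    """Graham-style spam probability for one token.

    ham_weight is a hypothetical stand-in for the filter's setting: it
    multiplies the token's non-spam count before the probability is formed.
    """
    # Scale non-spam occurrences by the weight (Graham used a fixed 2).
    weighted_ham = ham_weight * ham_count
    # Per-corpus frequencies, capped at 1 as in Graham's formula.
    spam_freq = min(1.0, spam_count / n_spam)
    ham_freq = min(1.0, weighted_ham / n_ham)
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat it as neutral
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp away from 0 and 1 so no single token is ever decisive.
    return max(0.01, min(0.99, p))


# A token seen 10 times in 100 spam and 10 times in 100 non-spam mails.
print(round(token_spam_probability(10, 10, 100, 100, ham_weight=1), 3))
print(round(token_spam_probability(10, 10, 100, 100, ham_weight=2), 3))
```

Under this assumption the weight is not multiplied into the final probability. It scales the non-spam count, so it pulls down every token that has appeared in non-spam mail at least once, not only tokens that are evenly split. A token seen only in spam is unaffected.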

That makes sense.  But do you know how the weight is calculated? I assume it is
the product of the token's initial probability and the "regarding threshold" value;
is that right?  And does it apply only to tokens that occur equally often in spam and
non-spam messages, or does this threshold skew the weight of all tokens, giving them
an extra "non-spammy" oomph to avoid false positives?

        Thanx
        dZ.

-- 
Powered by The Bat! v.2.12.00,
  Hindered by MS Windows 2000 v.5.0 build 2195 Service Pack 4


________________________________________________
Current version is 2.12.00 | 'Using TBUDL' information:
http://www.silverstones.com/thebat/TBUDLInfo.html
