Some time around 08/14/2004 12:27:41, I think I heard Andre Wichartz say:

> Assume a word occurs equally often in spam and non-spam mails. If you
> set the value to 1 the word will get a spam probability of 0.5. If you
> set it to a higher value the word will get something lower than 0.5.
> Words in non-spam mails just count more and you can set just how much
> more.
> At least that's my take on it.

That makes sense. But do you know how the weight is calculated? I assume it is the product of the token's initial probability and the "regarding threshold" value; is that true? And does it apply only to tokens that occur equally often in spam and non-spam messages, or does this threshold skew the weight of all tokens, giving them an extra "non-spammy" oomph in order to avoid false positives?

Thanx
dZ.

-- 
Powered by The Bat! v.2.12.00,
Hindered by MS Windows 2000 v.5.0 build 2195 Service Pack 4

________________________________________________
Current version is 2.12.00 | 'Using TBUDL' information:
http://www.silverstones.com/thebat/TBUDLInfo.html
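
For illustration only, here is a rough sketch of one way such a weight could behave. It is not confirmed to be what the filter discussed here actually does. It assumes a per-token formula loosely modeled on Paul Graham's "A Plan for Spam", where the non-spam count is multiplied by a bias factor before the ratio is taken (Graham uses a fixed factor of 2 and also clamps the result to 0.01..0.99, which is left out here). The names `spam_probability` and `ham_weight` are made up for this example.

```python
def spam_probability(spam_hits, ham_hits, n_spam, n_ham, ham_weight=1.0):
    """Per-token spam probability with the non-spam count scaled by ham_weight.

    Hypothetical helper: ham_weight stands in for the configurable value
    discussed in the thread; the real filter may compute this differently.
    """
    bad = min(1.0, spam_hits / n_spam)                # frequency in spam mail
    good = min(1.0, ham_weight * ham_hits / n_ham)    # weighted frequency in non-spam mail
    if bad + good == 0:
        return 0.5  # token never seen: treat as neutral (assumption)
    return bad / (good + bad)


# A word seen equally often in both corpora (10 hits per 100 messages each).
print(round(spam_probability(10, 10, 100, 100, ham_weight=1), 3))
print(round(spam_probability(10, 10, 100, 100, ham_weight=2), 3))

# A word seen three times as often in spam as in non-spam.
print(round(spam_probability(30, 10, 100, 100, ham_weight=1), 3))
print(round(spam_probability(30, 10, 100, 100, ham_weight=2), 3))
```

Under these assumptions the weight is not a simple product with the initial probability: a word with equal frequencies ends up at 1 / (1 + ham_weight), so 0.5 for a weight of 1 and about 0.333 for a weight of 2. It also shifts every token that has appeared in non-spam mail, not just the evenly balanced ones: the second word above drops from 0.75 to 0.6.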