On 08/12/03 09:07 AM, Bart Schaefer sat at the `puter and typed:
> On Tue, 12 Aug 2003, Louis LeBlanc wrote:
>
> > On 08/12/03 01:12 PM, [EMAIL PROTECTED] sat at the `puter and typed:
> > >
> > > Would it be wise to sa-learn that message as ham?
> >
> > Nope.
>
> Eh?  Of course it would be wise to learn the message as ham.  The more
> data the classifier has, the more accurate it becomes, assuming that the
> input is correct -- that is, that you're not telling it spam is ham and
> vice-versa.  Ideally you'd train it on every message you receive.

Yeah, and the next time a *real* spammer sends him a carefully worded ad
for Vigorex, his bayes db will have learned it as ham.  That particular
message will almost certainly never pass through his system again, so why
use the content to train bayes?

> There's a strong tendency to make value judgements ("oh, this is ham, but
> it looks so spammy I'd better not feed it to sa-learn, it'll just confuse
> the poor thing").  It doesn't work like that.

Although I could be wrong, I respectfully disagree.  Unless I'm mistaken,
each token in the message is used on its own to raise or lower that
token's tendency to indicate spam.  Bayes will not learn from this message
that it's ok to get erectile dysfunction in a message so long as it comes
from this sender AND is accompanied by text referring to lower interest
rates.  So you really do want to play the numbers game sometimes.

Personally, I train bayes every night on two of my inboxes and my spam
folder.  If this message got hit as a false positive, I'd probably put it
into the inbox to be trained as ham too, but I'd do it with full knowledge
that I may see an increase in penis pill ads slipping through.

On the other hand, I might just read it and delete it, then whitelist the
sender.  After all, this is exactly the scenario the whitelist feature was
added for.  Whitelisting the sender ensures that whatever this newsletter
contains, it will not be tagged as spam in the future.  The score change
due to the whitelist hit does not induce an autolearn (and if I'm not
mistaken, it will actually prevent autolearning - at least it should); it
just indicates that the message is not spam.

And you HAVE to make value judgements.  Keep in mind that the bayes
classifier is a PROGRAM, and it has no real ability to make foolproof
judgements.  It makes a best guess based on the info it is fed, and no
matter how good the program gets, until we get true AI checking our email
for spam, garbage in == garbage out.
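
As a rough illustration of the numbers game described above, here is a
minimal Python sketch of per-token learning.  It is a toy, not
SpamAssassin's actual Bayes code: the ToyBayes class, the whitespace
tokenizer, and the simple ratio formula are all assumptions made for
illustration.  It only shows that training a spammy-looking newsletter as
ham lowers the spam tendency of each token in it, regardless of sender or
surrounding text.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token classifier (hypothetical; NOT SpamAssassin's algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()  # number of spam messages containing each token
        self.ham_tokens = Counter()   # number of ham messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Each distinct token is counted once per message, independently of
        # the sender and of the other tokens in the same message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_spamminess(self, token):
        # Ratio of the token's spam frequency to its total frequency.
        spam_rate = self.spam_tokens[token] / max(self.n_spam, 1)
        ham_rate = self.ham_tokens[token] / max(self.n_ham, 1)
        if spam_rate + ham_rate == 0:
            return 0.5  # token never seen: no evidence either way
        return spam_rate / (spam_rate + ham_rate)


clf = ToyBayes()
clf.learn("buy vigorex now", is_spam=True)
clf.learn("cheap vigorex pills", is_spam=True)
clf.learn("meeting notes attached", is_spam=False)
clf.learn("lunch tomorrow", is_spam=False)

before = clf.token_spamminess("vigorex")

# Train the spammy-looking newsletter as ham.
clf.learn("newsletter vigorex and lower interest rates", is_spam=False)

after = clf.token_spamminess("vigorex")

print(f"before: {before:.2f}  after: {after:.2f}")  # before: 1.00  after: 0.75
```

In this toy model nothing records that "vigorex" arrived alongside
"interest rates" or from a particular sender; only the token's own counts
move, which is why a later real ad containing that token scores lower.
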

Never give your program data that will decrease its accuracy; just make
allowances for exceptions, like the SA developers did when they added a
whitelist feature in the first place.

Lou
-- 
Louis LeBlanc               [EMAIL PROTECTED]
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org

Research is what I'm doing when I don't know what I'm doing.
    -- Wernher von Braun

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk