jdow <[EMAIL PROTECTED]> wrote:
> After watching the Bayes filter "learn" to auto white list
> spam when first installed I disabled the auto white list
> feature and explicitly generated lists of ham and spam.

AWL works well for me, but that may have been due to a combination of add-on rules and luck. I've left it enabled, but scoring of spam has swung to such extremes (a good thing) thanks to bayes and other rules that it really hasn't impacted things much one way or the other lately. It does seem most of the auto-whitelist options are now missing from the manpage (Mail::SpamAssassin::Conf), so perhaps they've been deprecated as of late? (Must search archives.)

> When the Bayes filter kicked in after it had accumulated a couple
> hundred ham and spam messages the results were dramatic.

I learned my lesson and have begun storing a collection of 'borderline' spam for training purposes. Thankfully, I had bayes trained before some of the more clever spams began to hit, so none have gotten through lately, despite all their attempts.

> Before then it was somewhat discouraging. I do believe I shall
> leave automatic learning and white listing turned off because
> it seems to false entirely too often for my tastes.

Now that I've read the latest manpage, I'm not really sure WHAT AWL is doing in my case. I do see AWL score adjustments, but they tend to be slight... at least in comparison to the massive scores most spam gets. Unless I'm mistaken, AWL should NOT result in false positives unless spammers have forged the addresses of real people I get good messages from.

> (The concept also seems a little strange. If it already knows it's
> spam then train it that the message is spam. I'd rather teach
> it with the new spam that is not found than simply rack up
> higher scores by training it that material it knows is spam is
> indeed spam. What am I missing here?)

I think there's a difference between auto-whitelist (AWL) -- based on sender -- and bayes_auto, which trains on content. AWL makes good sense... especially for messages from my good friend who occasionally forwards spammy stuff of interest.
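
To make the sender-based idea concrete, here is a rough sketch of how I understand the AWL adjustment: each message's score gets pulled part of the way toward that sender's historical average. The function name, its arguments, and the 0.5 factor are my own assumptions for illustration (as far as I know the real knob is auto_whitelist_factor, but check the current docs); this is not SpamAssassin's actual code.

```python
def awl_adjust(current_score, sender_total, sender_count, factor=0.5):
    """Pull current_score part of the way toward the sender's mean score.

    sender_total / sender_count is the (hypothetical) stored history for
    this sender; factor=0.5 is an assumed default, not a verified one.
    """
    if sender_count == 0:
        # No history for this sender: nothing to average against.
        return current_score
    mean = sender_total / sender_count
    return current_score + (mean - current_score) * factor


# A friend with a clean history (mean 1.0) forwards something spammy (8.0):
friend = awl_adjust(8.0, sender_total=10.0, sender_count=10)

# A first-time sender gets no adjustment at all:
stranger = awl_adjust(8.0, sender_total=0.0, sender_count=0)
```

With those numbers the friend's message lands at 4.5 while the stranger's stays at 8.0, which is the behaviour I'd want for the occasional spammy forward.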
I've left the defaults for bayes_auto (to autolearn high-scoring spam), but I do augment it with training from my corpus of about 1,000 low-scoring spams that I verified by hand, and the (infrequent) false negative.

I think the reason for bayes auto-learning being useful is that the words in spam that DIDN'T trip the score get added as well. If those same words appear commonly in non-spam, they cancel out. But as was pointed out recently, if spammers use random dictionary words that DON'T appear in non-spam, that itself is a hint that it might be spammy. It adds to the "smell" of spam, which is why I think bayes has been so effective at catching the random-word spams that bypass so many rudimentary filters.

Then again, this may simply be an indicator that I subscribe to low-brow lists. :)

- Bob
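
For what it's worth, the "cancel out" idea can be sketched in a few lines of Python. This is a deliberately simplified per-token estimate, not the combining method SpamAssassin's Bayes code actually uses; the function name and the made-up counts are assumptions for illustration only.

```python
def token_spamminess(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how spammy a token is from how often it was learned.

    Returns a value in [0, 1]: 0.5 is neutral, near 1.0 is spammy.
    Assumes n_spam and n_ham (messages learned) are both non-zero.
    """
    spam_rate = spam_counts.get(token, 0) / n_spam
    ham_rate = ham_counts.get(token, 0) / n_ham
    if spam_rate + ham_rate == 0:
        # Token never seen in training: no evidence either way.
        return 0.5
    return spam_rate / (spam_rate + ham_rate)


# Made-up training counts: messages containing each word.
spam_counts = {"meeting": 5, "zygote": 3}
ham_counts = {"meeting": 5}

common = token_spamminess("meeting", spam_counts, ham_counts, 100, 100)
random_word = token_spamminess("zygote", spam_counts, ham_counts, 100, 100)
```

A word seen equally often in ham and spam ("meeting") comes out neutral, while a random dictionary word that only ever showed up in spam ("zygote") comes out strongly spammy, even though no rule ever scored it.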
