Re: [SAtalk] how to change the bayes auto_learn threshold to zero or above?

Brett Dikeman Thu, 29 Jan 2004 09:20:34 -0800

Martin Radford wrote:

It might be because you get the occasional false positive that you
want to avoid (but all the rest come under your threshold).  You
probably would want these autolearned as ham.

Actually, at the moment the Bayes engine thinks 99% of the messages going through it are spam, simply because it's auto-learning spam messages but never auto-learning ham, and that's because messages never get negative scores.
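To make the imbalance concrete, here is a minimal sketch of a threshold-based auto-learner. It is a simplified model, not SpamAssassin's actual code: the function names, the threshold values and the sample scores are all made up for illustration, and the comparison rules (strictly below for ham, at or above for spam) are an assumption.

```python
# Illustrative only: thresholds and scores are invented, not SpamAssassin defaults.
HAM_THRESHOLD = -2.0   # hypothetical: learn as ham when score is below this
SPAM_THRESHOLD = 15.0  # hypothetical: learn as spam when score is at or above this


def auto_learn_decision(score, ham_threshold=HAM_THRESHOLD,
                        spam_threshold=SPAM_THRESHOLD):
    """Return 'ham', 'spam', or None (not learned) for one message score."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


def tally(scores, **thresholds):
    """Count how many messages end up learned as ham, as spam, or not at all."""
    counts = {"ham": 0, "spam": 0, None: 0}
    for score in scores:
        counts[auto_learn_decision(score, **thresholds)] += 1
    return counts


# Legitimate mail here scores low but never negative; spam scores high.
scores = [0.0, 0.3, 1.1, 0.8, 2.4, 17.2, 21.0, 0.5, 30.1, 0.0]

print(tally(scores))                     # ham threshold below zero
print(tally(scores, ham_threshold=0.5))  # ham threshold above zero
```

With the negative ham threshold, nothing in this sample is ever learned as ham while three messages are learned as spam. Moving the ham threshold just above zero lets three of the low-scoring messages through as ham, which is the change the subject line is asking about.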

Or it might be because the messages are from a mailing list like this
one, where the messages may well contain extracts from spam.  In this
case you positively *don't* want to autolearn them as ham, because
it'll adversely affect the Bayes database's training.

I read this in the archives while researching the problem before asking the list (gasp, yes! A user did research before posting!), and I think it's such an obscure problem that it doesn't affect us. In fact, in our specific use this particular circumstance will never, ever, ever happen: nobody forwards spam to this particular mail server (though it does get deluges of spam on its own). As for the people on SA-user getting copies of spams... believe it or not, SpamAssassin users vastly, and I do mean vastly, outnumber spamassassin-* list members. I also don't know many people who forward each other spams (at least not people who want to keep their friends).

Further, as I recall, the last time someone said "what if a spam gets quoted on -list", several people on the SA list pointed out that even on the SA list, such occurrences are rare.

So yes, I think your argument is rather obscure and moot for 99% of your users.

Did you consider that the occasional spam auto-learned as ham really isn't that bad if you're auto-learning many more legitimate messages? SA tends to grossly tip the scales towards auto-learning spam versus ham, all to guard against a case that is rather theoretical for most users. Left to its own devices, the Bayes engine will eventually mark more and more messages as spam until it becomes completely useless, which is much worse than a slight inaccuracy from the occasional spam that gets auto-learned as ham.

Developers are always well-meaning when they institute rules (that cannot be overridden) to address specific circumstances. However, these little rules often end up causing a lot of people a lot of grief while solving a problem that really wasn't that big in the first place. It's like not giving your dinner guests steak knives because there's a slim chance they might poke themselves in the ear with them. Yeah, your dinner guests will be safe, but they're going to have a hell of a time enjoying the steak with that butter knife you gave them instead. Another example would be the infamous crash involving that Airbus plane that overrode the pilot's command for more throttle. The computer (and its programmers) had good intentions, but failed to realize that in the end, there has to be a magic red button somewhere that puts someone with situational knowledge back in control of things. SA has many such restrictions and few Magic Red Buttons.

In several cases, SpamAssassin assumes it knows better than I do and overrides my config directives (and, further, doesn't warn me it's doing so). If you want to warn me in the install/config/whatever docs that "turning on auto-learning of messages above X score" or "turning on auto-learning of whitelisted messages" is dangerous, fine. So be it. Some people might not instantly realize the implication. But give us the OPTION of doing it.
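The difference between the two behaviours can be sketched roughly as follows. This is not SpamAssassin code: the function names and the floor value are hypothetical, chosen only to contrast "silently override" with "warn, then do what the admin asked".

```python
import warnings

# Hypothetical recommended floor for the spam auto-learn threshold (illustrative value).
RECOMMENDED_MIN_SPAM_THRESHOLD = 6.0


def clamp_silently(requested):
    """The behaviour being criticised: quietly replace the admin's value."""
    return max(requested, RECOMMENDED_MIN_SPAM_THRESHOLD)


def warn_but_honour(requested):
    """The behaviour being requested: keep the admin's value, but flag the risk."""
    if requested < RECOMMENDED_MIN_SPAM_THRESHOLD:
        warnings.warn(
            f"spam auto-learn threshold {requested} is below the recommended "
            f"minimum {RECOMMENDED_MIN_SPAM_THRESHOLD}; using it anyway"
        )
    return requested


print(clamp_silently(2.0))   # the configured 2.0 is replaced by the floor
print(warn_but_honour(2.0))  # the configured 2.0 is kept, with a warning
```

The second function is the "magic red button": the software still states its opinion, but the person with situational knowledge gets the final say.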

So here's my suggestion, and it's two-part:

a) Strip the min+max limit controls from the two auto-learn params. If I want to be a moron and set my auto-learn-spam to 2 (i.e., below the magic number "6"), that's my bloody business, not yours ;-)

b) Add an auto_learn_whitelist option with a few settings: off (nothin'), auto (i.e., let Bayes auto-learn messages that were auto-whitelisted), manual (i.e., config-file whitelist rules) and all (both auto and manual, mwuaha). OK, so they're not intelligently named, but that combo will make just about anybody happy.

Make the default 'off' if you REALLY, really think the whole subversive-spam thing is a problem for the MAJORITY OF YOUR USERS. Chances are "manual" is the next-safest option, since generally users have to be smarter than the average bear to set up their own rules (or their admins had good reasons for adding global rules, as I did on our system, whitelisting our biggest customers). "Auto" and "all" would be the least safe.
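For what it's worth, the four settings in (b) boil down to a very small decision table. A rough sketch, with the caveat that auto_learn_whitelist is only the option proposed in this message and does not exist in SpamAssassin; the class and function names are likewise made up for illustration.

```python
from enum import Enum


class AutoLearnWhitelist(Enum):
    """Settings for the proposed (hypothetical) auto_learn_whitelist option."""
    OFF = "off"        # never auto-learn whitelisted mail (suggested default)
    AUTO = "auto"      # learn mail that was auto-whitelisted
    MANUAL = "manual"  # learn mail matching config-file whitelist rules
    ALL = "all"        # learn either kind


def learn_whitelisted_as_ham(mode, auto_whitelisted, manually_whitelisted):
    """Decide whether a whitelisted message should be auto-learned as ham."""
    if mode is AutoLearnWhitelist.OFF:
        return False
    if mode is AutoLearnWhitelist.AUTO:
        return auto_whitelisted
    if mode is AutoLearnWhitelist.MANUAL:
        return manually_whitelisted
    return auto_whitelisted or manually_whitelisted


# A config value such as "manual" maps straight onto the enum.
mode = AutoLearnWhitelist("manual")
print(learn_whitelisted_as_ham(mode, auto_whitelisted=True, manually_whitelisted=False))
print(learn_whitelisted_as_ham(mode, auto_whitelisted=False, manually_whitelisted=True))
```

Under "manual", only the message matched by an admin-written whitelist rule is learned, which is why it sits between "off" and the two broader settings in terms of risk.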

You could work around the problem by creating your own rules to
identify these messages, and give them a negative score.

The messages in question have no common element. They come from virtually anyone; in most cases, they're initiated by the user out of the blue, so we can't even inject headers or taglines to look for later.

Brett

