On Wed, 2 Jul 2014, Steve Bergman wrote:
> Well... I just turned on autolearn for a moment, deleted the bayes_* files on
> the test account I use, and sent myself a message from my usual outside
> account. And new bayes_* files were created. So I was wrong, and I win. More
> options.
> So now I can proceed to the "what does this mean?" phase.
> If I leave things as they are, then training is perfect if the users are
> diligent. But if they are not, then... what? I see plenty of spams getting
> through with a 0.0 score. IIRC, the autolearn spam threshold is 7? Pretty
> much everything there is spam.
> But I'm not sure I quite buy having the static rules of SA training Bayes.
> Isn't Bayes just learning to emulate the static rules, with all their
> imperfections?
Unless you've explicitly disabled them, the network-based rules (Razor,
Pyzor, DCC, DNS-based rules, RBLs, URIBLs, etc.) constitute an external
'reputation' system to pass judgment on messages.
It's not uncommon to take a low-scoring spam and find that it gets a
higher score on retest because it has since been added to various
bad-boy lists.
This is also one way that gray-listing helps. If you stiff-arm the first
pass of a spam run, a later check may hit it more accurately because it
has been added to block-lists in the meantime.
> If it starts going wrong, doesn't that mean the errors are going to spiral
> out of control?
That is a possible risk of relying solely on auto-learning.
The autolearn system has been carefully crafted and tuned over the years
to try to prevent a feed-back loop from throwing it into a tail-spin.
For example, the internal score used to decide whether a message counts
as spam or ham for auto-learning purposes explicitly excludes the Bayes
score (and certain other kinds of scores, such as white/black lists) to
try to prevent tail-eating.
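To make the exclusion concrete, here is a minimal Python sketch of that kind of decision. It is not SpamAssassin's actual code: the RuleHit class, the excluded_from_autolearn flag, and the example scores are all made up for illustration. The two thresholds are the documented defaults as far as I recall (bayes_auto_learn_threshold_nonspam 0.1, bayes_auto_learn_threshold_spam 12.0), so check your own configuration. The real autolearn logic applies further conditions that this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed defaults, quoted from memory; verify against your own config.
AUTOLEARN_HAM_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
AUTOLEARN_SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam


@dataclass(frozen=True)
class RuleHit:
    """One rule that fired on a message (hypothetical structure)."""
    name: str
    score: float
    # Hypothetical flag: True for Bayes rules and white/black list rules,
    # i.e. hits that must not feed back into the learning decision.
    excluded_from_autolearn: bool = False


def autolearn_score(hits: Iterable[RuleHit]) -> float:
    """Sum only the hits that are allowed to influence auto-learning."""
    return sum(h.score for h in hits if not h.excluded_from_autolearn)


def autolearn_decision(
    hits: Iterable[RuleHit],
    ham_threshold: float = AUTOLEARN_HAM_THRESHOLD,
    spam_threshold: float = AUTOLEARN_SPAM_THRESHOLD,
) -> str:
    """Return 'spam', 'ham', or 'no' (do not learn from this message)."""
    score = autolearn_score(hits)
    if score > spam_threshold:
        return "spam"
    if score < ham_threshold:
        return "ham"
    return "no"


if __name__ == "__main__":
    # Illustrative scores only, not the real rule scores.
    hits = [
        RuleHit("BAYES_99", 3.5, excluded_from_autolearn=True),
        RuleHit("URIBL_BLACK", 1.5),
        RuleHit("RAZOR2_CHECK", 1.0),
    ]
    print(autolearn_score(hits), autolearn_decision(hits))
```

The point of the design is that the Bayes hit contributes nothing to the learning decision, so Bayes cannot vote itself into learning more of whatever it already believes. The verdict has to come from the other rules.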
Occasional judicious manual learning can help to 'tweak' things when Bayes
looks like it's not in top shape (i.e. manual learning of FPs & FNs).
I've used site-wide Bayes with auto-learning at a site with ~3000 users
and have had to flush & restart our Bayes database twice in 10 years.
Dave
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{