From: "Linda Walsh" <[EMAIL PROTECTED]>


Loren Wilton wrote:

If you are only correctly classifying 50% of the spam (you said 100 caught
to 100 missed, I think), then you have SERIOUS problems of some sort.
----
   Yeah, well, I try not to be too reactionary about computer
things like this, especially when it could just be a
matter of flipping a config switch somewhere and things get
instantly better.  While the number of spams getting through
is significantly higher, probably 75-80% of them are duplicate
emails sent to multiple email addresses, including some
blacklisted To-addresses.  Apparently the spammer isn't
kind enough to send the spam to the blacklisted To-addresses first.
With the new spamc client, sendmail notices the lower load
average and likely allows more parallel incoming instances to
process incoming email before a given spam gets "locked out".
I suppose this could be a "downside" of this efficiency, but
before this I never saw multiple instances of these
simple spams get through **undetected**.  This makes me think
it isn't just the increased efficiency causing problems, as
I would have expected at least one or two duplicate spams
that wouldn't have been caught by "other means" (than being
sent to a blacklisted To-address).

Linda, looking at your score for BAYES_99, I think you can safely
raise it if you are running a very small mail service with well-known
customers. I run with it at a full 5 here. I get very few
escaped spams, perhaps 0.1%, within a factor of two either way.
I do have some slightly negative scoring rules for cases where I can
determine, by rules specific to my site, that a message is likely
legitimate. So recently only about 1 ham message in 16000 was marked
as BAYES_99 and got through as spam. I very seldom have ham coming
through as spam, except from the kernel mailing list and the FC4 list
when someone posts oddly formatted messages, usually patches or debug
logs. I might get perhaps two of those a week out of 5000 ham emails.
So this works for this site, which has two users.
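As a rough illustration of why a full 5 lets BAYES_99 mark spam on its own while a small negative rule can still rescue known-good mail, here is a minimal Python sketch. It is not SpamAssassin's code; it assumes the default required_score of 5.0 and that a message counts as spam when its total score is greater than or equal to that threshold. MY_TRUSTED_SENDER is a made-up rule name standing in for site-specific negative rules. In an actual SpamAssassin configuration the raised score would be a line such as "score BAYES_99 5.0" in local.cf.

```python
# Minimal sketch of SpamAssassin-style scoring, NOT SpamAssassin's own code.
# ASSUMPTIONS: required_score is the default 5.0, a message counts as spam
# when its total score is >= required_score, and MY_TRUSTED_SENDER is a
# hypothetical site-specific negative rule invented for this example.

REQUIRED_SCORE = 5.0

# Hypothetical local scores: BAYES_99 raised to a full 5, plus one slightly
# negative rule that fires on mail known to be legitimate at this site.
SCORES = {
    "BAYES_99": 5.0,
    "MY_TRUSTED_SENDER": -0.5,
}


def total_score(rules_hit):
    """Sum the scores of the rules a message hit (unknown rules count as 0)."""
    return sum(SCORES.get(rule, 0.0) for rule in rules_hit)


def is_spam(rules_hit, required=REQUIRED_SCORE):
    """Return True when the total score reaches the spam threshold."""
    return total_score(rules_hit) >= required


if __name__ == "__main__":
    # BAYES_99 alone is enough to mark a message as spam.
    print(is_spam(["BAYES_99"]))                       # True
    # The small negative rule pulls a known-good message back under 5.0.
    print(is_spam(["BAYES_99", "MY_TRUSTED_SENDER"]))  # False
```

The negative rule only needs to be large enough to drop a legitimate message below the threshold; keeping it small limits how much spam it can rescue by accident.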

A quick trick would be to go to the SARE site and get their version
of sa-stats.pl. Rename it to something distinguishable from the mostly
useless sa-stats.pl that comes with SpamAssassin itself. Then run it
with something like this:

    /etc/mail/spamassassin/mysa-stats.pl -f maillog*

Once you've run it, look at the BAYES_99 rule. It SHOULD sit right at
the top of your top spam rules ranking:
TOP SPAM RULES FIRED
------------------------------------------------------------
RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
------------------------------------------------------------
   1  BAYES_99     6630      4.41    27.24    85.98    0.01
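The SARE script is Perl and its internals aren't described here, but the basic idea of ranking rules from mail logs can be sketched in Python. This sketch assumes spamd writes one "result:" line per message in the form shown in the code comment ("Y" for spam, "." for ham); check that against your own maillog before trusting it. It only produces a COUNT and a %OFSPAM-style column, not the other columns of the SARE output, and the script name rule_rank.py in the usage comment is hypothetical.

```python
import gzip
import re
import sys
from collections import Counter

# ASSUMPTION: spamd logs one "result:" line per message, roughly like
#   ... spamd[123]: spamd: result: Y 15 - BAYES_99,HTML_MESSAGE scantime=1.2,...
# where "Y" marks spam and "." marks ham. Check this against your own
# maillog and adjust the pattern if your spamd version differs.
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) (?P<score>-?\d+(?:\.\d+)?) - (?P<rules>\S*)"
)


def tally_spam_rules(lines):
    """Count rule hits on messages spamd flagged as spam.

    Returns (Counter of rule name -> hits, number of spam messages)."""
    counts = Counter()
    spam_messages = 0
    for line in lines:
        m = RESULT_RE.search(line)
        if m is None or m.group("flag") != "Y":
            continue  # not a spamd result line, or the message was ham
        spam_messages += 1
        # Rule names are comma-separated; skip empty entries and anything
        # that looks like a key=value field (e.g. when no rules fired).
        counts.update(
            r for r in m.group("rules").split(",") if r and "=" not in r
        )
    return counts, spam_messages


def open_log(path):
    """Open a plain or gzip-compressed (rotated) log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def main(paths, top=10):
    counts = Counter()
    spam_total = 0
    for path in paths:
        with open_log(path) as fh:
            file_counts, file_spam = tally_spam_rules(fh)
        counts.update(file_counts)
        spam_total += file_spam
    print("RANK  RULE NAME               COUNT  %OFSPAM")
    for rank, (rule, hits) in enumerate(counts.most_common(top), start=1):
        pct = 100.0 * hits / spam_total if spam_total else 0.0
        print(f"{rank:4d}  {rule:<22s} {hits:6d}  {pct:7.2f}")


if __name__ == "__main__":
    main(sys.argv[1:])  # e.g. python rule_rank.py /var/log/maillog*
```

If BAYES_99 does not come out near the top of this kind of ranking either, that points at Bayes training rather than at the stats script.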

If it does not look something like this, with roughly 20000 or more
incoming emails over the span of all your saved logs, then Bayes may
need more careful training. Note how much ham was marked with BAYES_99
(the %OFHAM column). I use that to "justify" setting the score for
BAYES_99 up to a full 5, after carefully adding some small negative
scoring rules that can offset it slightly in special cases. In any case,
you might find you can justify scoring a nearly perfect rule (high spam
catch and very low or no ham catch; this was a bad month for ham caught
by BAYES_99) high enough to mark a message as spam all by itself, or
nearly high enough.
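The judgement above comes down to simple arithmetic on the spam and ham columns. Here is a minimal Python sketch of that arithmetic. It assumes %OFSPAM is the share of spam messages the rule hit and %OFHAM the share of ham messages it hit; that is a reading of the column names, not something checked against the SARE script. The counts in the example are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: %OFSPAM is the share of spam messages a rule hit and %OFHAM
# the share of ham messages it hit; this is a reading of the column names,
# not the SARE script's actual code.


@dataclass
class RuleStats:
    """Message counts for one rule over the logged period."""
    spam_hits: int   # spam messages the rule fired on
    ham_hits: int    # ham messages the rule fired on
    total_spam: int  # all spam messages seen
    total_ham: int   # all ham messages seen

    def pct_of_spam(self) -> float:
        """Percentage of spam caught by the rule."""
        return 100.0 * self.spam_hits / self.total_spam if self.total_spam else 0.0

    def pct_of_ham(self) -> float:
        """Percentage of ham wrongly hit by the rule."""
        return 100.0 * self.ham_hits / self.total_ham if self.total_ham else 0.0

    def pct_of_mail(self) -> float:
        """Percentage of all mail (spam + ham) the rule fired on."""
        total = self.total_spam + self.total_ham
        return 100.0 * (self.spam_hits + self.ham_hits) / total if total else 0.0

    def ham_one_in(self) -> float:
        """Express ham hits as "1 in N" ham messages (inf if none)."""
        return self.total_ham / self.ham_hits if self.ham_hits else float("inf")


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    stats = RuleStats(spam_hits=860, ham_hits=1, total_spam=1000, total_ham=16000)
    print(f"%OFSPAM {stats.pct_of_spam():.2f}")  # -> %OFSPAM 86.00
    print(f"%OFHAM {stats.pct_of_ham():.2f}")    # -> %OFHAM 0.01
    print(f"1 ham in {stats.ham_one_in():.0f}")  # -> 1 ham in 16000
```

For instance, a rule hitting 1 of 16000 ham messages has a %OFHAM of about 0.006, which rounds to the 0.01 shown in the table above.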

One other serious hint: do NOT run this list through SpamAssassin. That
may help protect your Bayes scores from subtle shifts, such as might
creep in even if you merely have the list whitelisted.

{^_^}
