Re: How to train SpamAssassin to catch this kind of spam

Sander Holthaus Mon, 05 Jun 2006 05:46:08 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
Christoph Reichenberger wrote:
> Hi Sander,
>
> thanks a lot for your analysis. Since you sent your original message
> with CC to my address, I even got the detailed report. Thanks a lot!
> But what can I learn from that? I see that you have Razor enabled as
> well as a couple of RBL servers, which I do not. But even without
> those points, it would reach a high score in your configuration.
>
> So the question is: What is wrong with my configuration?
> Where should I look?
>
> Another question: I had SpamCop-based blocking enabled in my MTA
> (CommuniGate Pro) before I installed SpamAssassin, but disabled it
> temporarily to make sure it would not conflict. Should I enable it
> again in CommuniGate Pro, or as part of SpamAssassin? If your answer
> is the second choice, how do I do that?
>
> BTW, I have already disabled auto-learning for now.
> Should I delete the whole database and start from scratch?
>
> Thanks a lot for your help.
>
> christoph
>
For me, Razor recognizes the most spam, followed by DCC and Pyzor
(I don't have up-to-date statistics, but recognition used to be about
80%, 50% and 30%, respectively). IMHO, SURBLs are an important part of
SpamAssassin, as they have a low FP rate and high recognition (again, no
up-to-date statistics, but it used to be about 85%).


Perhaps you could post your configuration somewhere? Getting the
SpamAssassin configuration right takes some time and depends on the
traffic you receive. In my case, for example, I have a few accounts
that receive legitimate mail from Africa and Asia, which at times has
poor English spelling that trips a lot of rules. Hence I run with a
threshold of 10 (I will probably increase it to 11 in the near future,
depending on the results 3.1.2 gives me).
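To illustrate why a higher threshold helps in that situation: SpamAssassin adds up the scores of every rule a message triggers and flags the message as spam once the total reaches the configured required_score (5.0 by default). The Python sketch below models only that summing step; the rule names and scores are hypothetical and not real SpamAssassin rules.

```python
from dataclasses import dataclass


@dataclass
class RuleHit:
    """One rule that fired on a message (names and scores are hypothetical)."""
    name: str
    score: float


def classify(hits: list[RuleHit], required_score: float = 5.0) -> tuple[float, bool]:
    """Sum the rule scores and compare the total against the threshold.

    Simplified model: a message counts as spam when total >= required_score.
    """
    total = sum(hit.score for hit in hits)
    return total, total >= required_score


# Hypothetical hits for a legitimate message with poor spelling.
hits = [
    RuleHit("HYPOTHETICAL_MISSPELLING", 2.5),
    RuleHit("HYPOTHETICAL_ODD_PHRASING", 1.75),
    RuleHit("HYPOTHETICAL_FOREIGN_CHARSET", 1.25),
]

for threshold in (5.0, 10.0):
    total, is_spam = classify(hits, threshold)
    print(f"threshold={threshold}: score={total}, spam={is_spam}")
```

With these made-up numbers the message scores 5.5: flagged at the default of 5.0, but let through at a threshold of 10.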

I personally use several RBLs on my MTA (virbl.dnsbl.bit.nl and
sbl-xbl.spamhaus.org) to make sure SpamAssassin and ClamAV (virus
scanner) don't get overloaded with mail from IPs that are almost
certainly sending out junk (mind you, this may still cause FPs), but
again, your mileage may vary. Because I already use those two, I don't
use SpamCop as an RBL in my MTA. In your case, you can go either way
(perhaps someone on this list has some stats about SpamCop's
performance?).
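MTA-level RBL checks like these follow the usual DNSBL convention: reverse the octets of the connecting IPv4 address, append the list's zone, and look up an A record. An answer (typically in 127.0.0.0/8) means the address is listed; NXDOMAIN means it is not. Below is a minimal Python sketch of that lookup, assuming plain IPv4 and the system resolver. The function names are hypothetical, and the sketch treats any answer as a listing without inspecting the return codes that lists use to encode listing reasons.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone has an A record for the reversed IP.

    Simplification: any answer counts as a listing; the specific
    127.0.0.x return code is not inspected.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN or another resolver failure: treat as not listed.
        return False
```

For example, is_listed("127.0.0.2", "sbl-xbl.spamhaus.org") queries 2.0.0.127.sbl-xbl.spamhaus.org; 127.0.0.2 is the conventional test entry most DNSBLs keep listed. Treating lookup failures as "not listed" fails open, so a resolver outage lets mail through to SpamAssassin rather than rejecting it.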

Lastly, yes, I would remove the database and start over. A poisoned
Bayes database will get you into trouble. Before enabling auto-learn,
make sure your SpamAssassin configuration performs well without Bayes.
I would also suggest keeping an eye on what is auto-learned for the first
few days or weeks (depending on the traffic you receive), though that may
or may not be possible due to your customers' privacy.
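The reason to get the non-Bayes configuration right first is that auto-learning decides what to feed into Bayes based on the message's score. Below is a deliberately simplified Python sketch, assuming the default thresholds of 0.1 (ham) and 12.0 (spam) from SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam options. The real AutoLearnThreshold plugin uses a score that excludes the Bayes rules and adds further conditions (such as minimum header and body points for spam) that are not modeled here. The review queue and message ids are hypothetical, a way to keep an eye on what would be learned.

```python
from typing import Optional

# Assumed defaults; check the documentation for your SpamAssassin version.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(
    score_without_bayes: float,
    nonspam_threshold: float = NONSPAM_THRESHOLD,
    spam_threshold: float = SPAM_THRESHOLD,
) -> Optional[str]:
    """Return "ham", "spam", or None (not learned). Simplified model."""
    if score_without_bayes < nonspam_threshold:
        return "ham"
    if score_without_bayes >= spam_threshold:
        return "spam"
    return None


def review_queue(messages: list[tuple[str, float]]) -> list[tuple[str, float, str]]:
    """List (message id, score, decision) for every message that would be learned."""
    queue = []
    for msg_id, score in messages:
        decision = autolearn_decision(score)
        if decision is not None:
            queue.append((msg_id, score, decision))
    return queue


# Hypothetical message ids and non-Bayes scores.
sample = [("msg-1", -0.5), ("msg-2", 6.3), ("msg-3", 14.2)]
for entry in review_queue(sample):
    print(entry)
```

Here msg-2 is learned neither way. If badly tuned rules push legitimate mail above the spam threshold, it gets learned as spam, which is how a Bayes database ends up poisoned.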

Kind Regards,
Sander Holthaus


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
 
iD8DBQFEhCdpVf373DysOTURAk0+AJ47VWc4+AlitvZdFEaItVplFsHnkgCgpKrP
krotmFjsnRbAsNGg7Aihyiw=
=Qb2H
-----END PGP SIGNATURE-----
