Re: How to train SpamAssassin to catch this kind of spam

Anthony Peacock Mon, 05 Jun 2006 07:33:30 -0700

Hi Christoph,

Christoph Reichenberger wrote:

Hi Anthony,
In order not to swamp the list with my beginner questions too much
(thanks again BTW, that you take the time for your answers), I am
writing directly to you.

Please don't do that. I occasionally reply to list emails, but I don't provide one-to-one support. You will get much better answers on the list, as more people will be able to offer advice, and if I get an answer wrong, someone is likely to correct my reply. On top of that, all list traffic is archived, so any discussions we have on the list may help someone else in the future.


I have replied to the list for this message.

It seems that I have a pretty standard installation.
When looking at your results file, I see that you are using many
rules that my configuration is probably missing.
How do I enable mangled and other rules you are running (URIBL_*)
and spamcop?
Would it be better to have spamcop as a SA-rule or to enable the
check in the MTA ?


What OS are you running on?

I don't know CommuniGate Pro, so the following comments assume a standard installation on a Linux flavour.

To enable the mangled.cf rules, go to the RulesEmporium web site, download the file, and place it in your local SpamAssassin directory (on my system this is /etc/mail/spamassassin). Then run spamassassin --lint from the command line to make sure there are no errors. Depending on how you call SpamAssassin, you may need to restart CommuniGate Pro or spamd.


Mangled.cf =>  http://www.rulesemporium.com/rules/mangled.cf

RulesEmporium => http://www.rulesemporium.com/
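
As a rough sketch of the install step described above: the helper name install_rules is made up for illustration, and the default directory is simply the one from my system, so adjust both to suit. It copies a rule file you have already downloaded into the SpamAssassin directory and runs the lint check if spamassassin is on the PATH.

```shell
#!/bin/sh
# install_rules SRC [DEST_DIR]   (hypothetical helper, not part of SpamAssassin)
# Copy an already-downloaded rule file into the SpamAssassin site config
# directory, then lint the configuration if spamassassin is on the PATH.
install_rules() {
    src=$1
    dest=${2:-/etc/mail/spamassassin}   # directory named in this message; yours may differ
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    cp "$src" "$dest/" || return 1
    if command -v spamassassin >/dev/null 2>&1; then
        # --lint parses the whole configuration and reports syntax errors
        spamassassin --lint || return 1
    fi
}

# Typical use (run as a user allowed to write to the directory):
#   wget http://www.rulesemporium.com/rules/mangled.cf
#   install_rules mangled.cf
#   ...then restart spamd or the mail server if your setup requires it.
```

If the lint step reports errors, remove the file you just added and check that it downloaded completely before trying again.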

Have a look at the other rules on the Rules Emporium web site. The ones I use are:


70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_obfu0.cf
70_sare_random.cf
70_sare_specific.cf
70_sare_stocks.cf
99_sare_fraud_post25x.cf
chickenpox.cf
mangled.cf
weeds_2.cf

Read the descriptions on the web site and make your own decision about their usefulness to you.

To enable the URL blacklists, look in the init.pre or v310.pre files in the same directory and make sure that the following line is uncommented:


loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
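
One way to check this from the command line is sketched below. The function name plugin_enabled is hypothetical, and the file paths assume the standard /etc/mail/spamassassin layout mentioned earlier. It only looks for a loadplugin line that is not commented out with "#".

```shell
#!/bin/sh
# plugin_enabled FILE PLUGIN   (hypothetical helper)
# Succeeds if FILE contains an active (uncommented) loadplugin line for PLUGIN.
plugin_enabled() {
    # The line must start, after optional whitespace, with "loadplugin",
    # so lines commented out with "#" do not match.
    grep -Eq "^[[:space:]]*loadplugin[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Check the two files named above; paths assume a standard install.
for f in /etc/mail/spamassassin/init.pre /etc/mail/spamassassin/v310.pre; do
    if [ -f "$f" ] && plugin_enabled "$f" Mail::SpamAssassin::Plugin::URIDNSBL; then
        echo "URIDNSBL is enabled in $f"
    fi
done
```

If nothing is printed, the line is either missing or still commented out in both files.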

I don't use SpamCop so can't comment about that. But I don't generally switch on RBLs at the MTA level. I am not happy trusting an RBL completely to block at the MTA level. I like the fact that SA allows me to build a fuller picture with a number of indicators, and that I can check the Junk Folder to weed out any false positives. Having said that, my mail server is fairly low volume (~3,000 msgs per day); on a busier server, blocking at the MTA would reduce the amount of resources scanning takes up.

The SA documentation said that special rules would not be necessary,
so I tried to just start with the standard configuration.

SA out of the box does a very good job. But everybody's mail feed and concept of ham/spam is different, so some tweaking will be necessary to get the absolute best results for any particular local configuration.

When I first set up SA I used the basic rule set and manually trained the Bayes database. This worked really well, hitting about 85% of spam correctly. With some tweaking and the addition of SARE rules I now run at 99%+.

I also reset the bayes files and turned autolearn off for now.

Manually train your Bayes with at least 200 sample ham and 200 sample spam messages. Your Bayes wasn't even working for that last email. My Bayes works really well at the moment. Once you have the Bayes system working well manually, you can then turn on auto-learning, but change the auto-learning thresholds to something like:


bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0

You can add these lines to a file called local.cf in the /etc/mail/spamassassin directory.
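
For the manual training step, something along these lines should work. This is only a sketch: the two directory paths are placeholders, the helper names are made up, and it assumes one message per file (for example a Maildir folder). It refuses to train until each directory holds at least 200 messages, matching the numbers above.

```shell
#!/bin/sh
# count_messages DIR: number of regular files directly inside DIR
# (assumes one message per file).
count_messages() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# enough_samples DIR [MIN]: succeed if DIR holds at least MIN messages (default 200).
enough_samples() {
    [ "$(count_messages "$1")" -ge "${2:-200}" ]
}

# Placeholder locations of hand-sorted mail; replace with your own.
HAM_DIR=${HAM_DIR:-/path/to/ham}
SPAM_DIR=${SPAM_DIR:-/path/to/spam}

if command -v sa-learn >/dev/null 2>&1 && [ -d "$HAM_DIR" ] && [ -d "$SPAM_DIR" ]; then
    if enough_samples "$HAM_DIR" && enough_samples "$SPAM_DIR"; then
        sa-learn --ham "$HAM_DIR"
        sa-learn --spam "$SPAM_DIR"
        sa-learn --dump magic   # summary that includes the nspam and nham counts
    else
        echo "Need at least 200 ham and 200 spam messages before training." >&2
    fi
fi
```

Run it as the same user that SpamAssassin runs as, otherwise the training may end up in a different Bayes database from the one used for scanning.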


Also, you could do a lot worse than have a look at the wiki.

Remember, all of the advice about files and directories assumes a standard SpamAssassin install. Your installation may be slightly different.

How would you recommend to continue now?

Thanks a lot

christoph

On 05.06.2006, at 14:08, Anthony Peacock wrote:

Hi,

Christoph Reichenberger wrote:

Hi,
thank you so much for your prompt reply and for your offer to look into this and help me.
I have saved the full message in a text file and put it at:
  http://www.ergonis.com/downloads/public/TheSpamMessage.txt
Also, I even saw in the header that it Autolearned it as HAM - so this may be even worse, isn't it?
Any help is highly appreciated.


OK!  I ran that through my SpamAssassin system and got the following
results:

http://www.chime.ucl.ac.uk/~rmhiajp/SA-results-20060605a.txt


This shows me that my Bayes system was 99-100% sure the message was
spam, and the message was hit by a load of network tests, and a rule
from the mangled.cf file.


I don't use CommuniGate so I can't really advise on how to configure
your specific set up, but it looks to me as if you should switch on some
network tests and get Bayes working.

The mangled.cf file can be found here:

http://www.rulesemporium.com/rules/mangled.cf


<SNIP>

--Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new
vocabulary."  -- James D. Nicoll


--Christoph Reichenberger - ergonis software gmbh



--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
