Hi,

Sander Holthaus wrote:
Christoph Reichenberger wrote:
Hi,

I am pretty new to SpamAssassin, so I apologize if this has been
posted before. I have SA integrated in Communigate Pro with
CGPSA, and it has already started to filter out a lot of spam
messages right out of the box. However, I am still a bit unsure
about how to train it.

I get a lot of spam messages like the following, which SA does not
recognize as spam (this one, e.g., got a score of 0.0):
================================================

I wanted to include the original mail here, but when I did, the
list server rejected my posting with "552 spam score (19.7)
exceeded threshold". On the one hand this is good news, since I now
know that SA can be trained to catch it; on the other hand, how can
I show you what this spam message looks like? I'll try to describe
it: the first part contains 8 lines holding the names of the pills
that help male human beings ... ;-) - you know? Then follows a
line with " all 50 % off" and a URL for ordering, and this is
followed by a paragraph of fairly "normal" text. I'll try to keep
that text here, since I assume it should not trigger the score
anyway.

From the original mail:

"If ever you are passing my way, said Bilbo, dont wait to knock!
Tea is at four; but any of you are welcome at any time! Then he
turned away. The elf-host was on the march;. and if it was sadly
lessened, yet many were glad, for now the northern world would be
merrier for many a long day. The dragon was dead, and the goblins
overthrown, and their"

=================================================

I am now wondering whether SA even has a chance of catching this if
I train it on this message, or whether training on messages like
this is a bad idea altogether, because the "normal" text at the end
would confuse the Bayesian corpus more than it would help. As I
mentioned above, it seems that it can be trained, but how? I have
already trained a lot of messages like this with sa-train, but the
score is still 0.0.

Thanks for any hints on how best to deal with this kind of
message.

thanks

Christoph Reichenberger
--
Christoph Reichenberger - ergonis software gmbh
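One thing worth checking when trained messages still score 0.0 is whether the Bayes classifier is being used at all. As far as I know, SpamAssassin only applies Bayes once the database has learned a minimum number of spam and ham messages (the `bayes_min_spam_num` and `bayes_min_ham_num` settings, 200 each by default), and `sa-learn --dump magic` reports the current `nspam` and `nham` counts. The Python sketch below runs that command and compares the counts with those thresholds. The helper names are hypothetical, and the parser assumes the usual layout of `--dump magic` output (four numeric columns followed by a label), so adjust it if your version prints something different. It also has to run as the same user, against the same Bayes database, that CGPSA uses when scanning.

```python
import shutil
import subprocess

# Assumed defaults of SpamAssassin's bayes_min_spam_num / bayes_min_ham_num;
# check your local.cf in case they have been changed.
MIN_SPAM = 200
MIN_HAM = 200


def parse_magic(text):
    """Extract nspam/nham from `sa-learn --dump magic` output.

    Assumes lines of the form
        0.000          0       1056          0  non-token data: nspam
    where the third numeric column holds the value.
    """
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, _, label = line.partition("non-token data:")
        label = label.strip()
        if label in ("nspam", "nham"):
            counts[label] = int(fields.split()[2])
    return counts


def bayes_ready(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True if enough spam AND ham have been learned for Bayes to be used."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


def check_bayes_db():
    """Run `sa-learn --dump magic` (if installed) and report the counts."""
    if shutil.which("sa-learn") is None:
        print("sa-learn not found on PATH")
        return None
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        capture_output=True, text=True, check=True,
    )
    counts = parse_magic(result.stdout)
    print("nspam=%d nham=%d, enough for Bayes: %s" % (
        counts.get("nspam", 0), counts.get("nham", 0), bayes_ready(counts)))
    return counts
```

Calling `check_bayes_db()` on the mail server prints the two counts. If only spam has been trained, Bayes may stay inactive no matter how many spam messages are fed in, because ham is needed as well.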




Training Bayes to catch these kinds of messages is difficult. Your best
bets are to use some rulesets from SARE (www.rulesemporium.com) and
make sure you use several network tests (RBLs, SURBLs, DCC, Razor,
Pyzor).

Actually, training Bayes for these can be very easy.

To work out the reason that your system is not catching these we would need to see the full email message including the original headers.

If you can save the full message as a text file and put it somewhere on a web site, people here will be able to tell you exactly which rules should catch the spam.
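On the worry that the harmless paragraph would "confuse" Bayes: filters of this family (SpamAssassin's Bayes included, as far as I know) combine only tokens whose spam probability is clearly away from neutral, so words that occur about equally in ham and spam contribute essentially nothing. The Python sketch below is a simplified, hypothetical illustration of Robinson/Fisher chi-square combining with a made-up neutrality cut-off (`MIN_STRENGTH`). It is not SpamAssassin's actual code, and the per-token probabilities in the example are invented.

```python
import math

# Hypothetical cut-off: tokens whose probability lies within this distance
# of 0.5 are treated as neutral and ignored (real filters use their own value).
MIN_STRENGTH = 0.1


def chi2q(x2, v):
    """Upper tail of the chi-square distribution for even degrees of freedom v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(token_probs):
    """Chi-square (Fisher/Robinson) combining of per-token spam probabilities.

    Probabilities must lie strictly between 0 and 1. Returns a score in
    [0, 1], where 0.5 means "no opinion".
    """
    clues = [p for p in token_probs if abs(p - 0.5) >= MIN_STRENGTH]
    if not clues:
        return 0.5
    n = len(clues)
    # Summing logs instead of multiplying probabilities avoids underflow.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    pill_tokens = [0.99] * 8 + [0.95, 0.97]  # pill names, "50 % off", URL
    story_tokens = [0.5, 0.52, 0.47, 0.55]   # ordinary words from the story
    print(combine(pill_tokens))
    print(combine(pill_tokens + story_tokens))
```

Both calls print the same score, because every story token falls inside the neutral band and is dropped before combining. In this model, training on such messages mainly teaches the spammy tokens and leaves ordinary words close to 0.5.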


--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new
vocabulary."  -- James D. Nicoll
