Hello maarten,

Friday, November 7, 2003, 1:25:05 PM, you wrote:

mvdB> ... Upon looking at those rules I see a LOT of inconsistencies.
mvdB> For instance, I found these rules that have score of zero(!) (and
mvdB> these are merely the top of a large iceberg)  

mvdB> score CASHCASHCASH 0
mvdB> score ADDRESSES_ON_CD 0
mvdB> score BLANK_LINES_90_100 0
mvdB> score EJACULATION 0
mvdB> score HERBAL_V+AG+A 0

mvdB> One could argue that yelling CASH CASH CASH is a valid sales pitch
mvdB> in a normal mail. But hey, are we being realistic here ?  How could
mvdB> anything but spam have this property ?

In my personal corpus, the CASHCASHCASH rule matches three legitimate
messages:
* A personal response from SBCIS concerning spam abuse
* An email on a non-profit organization's internal mailing list
  concerning the cost of its annual convention
* A promotional email I receive as a member of a hotel's loyalty program
and 200 spam.

Personally, I've changed the score for this rule to 0.75 (against my spam
threshold of 9.0).
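
To put rough numbers on those corpus counts, here is a minimal Python
sketch. It is only an illustration: spam_share is a made-up helper, not
part of SpamAssassin, and it computes the raw share of hits that were
spam, with no normalization for corpus size.

```python
def spam_share(spam_hits, ham_hits):
    """Fraction of a rule's hits that landed on spam (raw, unnormalized)."""
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule never matched")
    return spam_hits / total

# Counts from the corpus described above: 200 spam hits, 3 ham hits.
share = spam_share(200, 3)
print(f"{share:.3f}")  # prints 0.985

# The chosen score as a fraction of the 9.0 threshold.
fraction_of_threshold = 0.75 / 9.0
```

So the rule is a strong spam sign but not a clean one, which is why it
gets a small non-zero score rather than a decisive one.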

mvdB> least some low figure but NOT equal zero...
mvdB> And...  well I won't even go into the fifth rule... come on ;-)

This rule seems to have matched no ham in my corpus. I'm also curious,
though, where you got the 0 score from. On my system this rule scores

mvdB> Well, I'll grant you that much although I did study it a fair
mvdB> amount. But let's look at another aspect here too. There is not a
mvdB> single rule that scores higher than 4.999. That is plain wrong in
mvdB> my book; ...

Me, I do not want any single distributed rule to flag something as spam
by itself. Most of the rules I develop and add to my own system are
limited to 1/3 of my spam threshold. I strongly agree with the developers
that spam is not identified by a single rule, but instead by a
combination of characteristics, verified through a combination of rules.
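
The arithmetic behind that can be sketched in a few lines of Python. The
rule names and individual scores are hypothetical; only the 9.0 threshold
and the 1/3 cap come from this thread, and the "at or above the
threshold" test is an assumption about how the final decision is made.

```python
SPAM_THRESHOLD = 9.0                  # threshold quoted in this thread
MAX_LOCAL_SCORE = SPAM_THRESHOLD / 3  # the 1/3 cap described above

# Hypothetical rule names and scores, each kept at or below the cap.
LOCAL_SCORES = {
    "LOCAL_RULE_A": 3.0,
    "LOCAL_RULE_B": 2.5,
    "LOCAL_RULE_C": 2.0,
    "LOCAL_RULE_D": 3.0,
}

def total_score(hits, scores=LOCAL_SCORES):
    """Add up the scores of every rule that matched a message."""
    return sum(scores[name] for name in hits)

def is_spam(hits, scores=LOCAL_SCORES, threshold=SPAM_THRESHOLD):
    """Assumed decision: spam once the summed score reaches the threshold."""
    return total_score(hits, scores) >= threshold
```

With these numbers no single rule, and not even three of them together,
reaches 9.0; it takes all four hits (10.5) to flag the message.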

There are three exceptions:
* emails sent to a completely invalid email address are always spam.
  There is no such address here as [EMAIL PROTECTED], and so any email
  sent to that address gets a "spam without a doubt" score.
* emails sent from known spamming organizations. That's what the
  distributed blacklists are for (thanks again, William Stearns).
* emails which contain URI links to sites that do nothing but spam.

Blacklists are automatically scored 100. The other two I will personally
score anywhere from 4.5 (half my threshold) to 50 (roughly 5x my
threshold).
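
As a rough Python sketch of what those numbers imply (the function names
are made up for illustration, and the "at or above the threshold" test is
an assumption):

```python
SPAM_THRESHOLD = 9.0                          # threshold quoted above

BLACKLIST_SCORE = 100.0                       # blacklist hits
EXCEPTION_RANGE = (SPAM_THRESHOLD / 2, 50.0)  # 4.5 .. 50 for the other two

def clamp_exception_score(wanted):
    """Keep a hand-picked exception score inside the 4.5..50 range."""
    low, high = EXCEPTION_RANGE
    return max(low, min(high, wanted))

def decides_alone(score, threshold=SPAM_THRESHOLD):
    """True if a single hit of this rule reaches the threshold by itself."""
    return score >= threshold
```

A blacklist hit, or an exception scored at the top of the range, settles
the verdict alone. At the low end (4.5) the rule still needs help from
other rules.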

But the point is that *I* want to make this determination. I don't want
to trust anyone else's corpus to do this for me.

(I even keep a list of addresses from William Stearns' public blacklist
which I remove each and every time I update my download, since those are
not considered spam sources in my domain.)

mvdB> Not wanting to be a PITA ;-), I would almost start questioning the
mvdB> statistics file cause it seems not to reflect real-life situations.
mvdB> But hey, who am I ?

Not one of the mass check contributors yet, I can tell.  :-)

Stick around.  Learn how to use the masscheck capabilities (see the
masses directory within the SA distribution). Each time we move from one
major distribution to another (e.g., 2.6x to 2.70) there's a mass check
scoring round, and you can help by testing the new ruleset against YOUR

mvdB> Of course. I know. The reason I started writing this in the first
mvdB> place is just _because_ I see so many messages that are SO full of
mvdB> spam signs, yet invariably score 4.90...  And thus, they fall right
mvdB> through...  :-((

Head for the Rules Emporium, or for the Wiki, and you'll see how many of
us beef up our own rulesets. You'll have those 4.90's scoring 14.90
without much delay.

Bob Menschel

Spamassassin-talk mailing list
