RE: unwanted breakthrough

Herb Martin Sun, 31 Jul 2005 02:10:06 -0700

> for some reason the spam sample at
> http://wolfgang.remsnet.de/medspam.txt
> is only classified by html rules, and by various dns tests, 
> but the common drugs and human body part rules missed it. 
> Anyone would have an idea why this is so?
> 
> I am running 3.0.4 default rules, plus a few SARE ones


Caveat again: I am not a real expert (yet).

First, the mail is short, so there is less for SpamAssassin
to work with; Bayes, for instance, doesn't kick in for either
of us. You also don't seem to be running many network tests
if that is all you hit. My score is 29.2 but would be only
4.5 without the network tests.

Now, I probably overkill the net tests (RBLs, Pyzor, DCC,
Razor, and URIBLs). I will not block directly on any
blacklist, but I love using them as a way to drive the score
very high.

(Currently I am very pleased with an email server where I
am testing the use of blacklists to DRIVE greylisting in
front of SpamAssassin. Even if the mail is passed on, the
blacklist lookups will all be in the local DNS cache by
the time SA runs, so it doesn't cost much to do this. The
greylisting doesn't show here, but I am planning to try
using SpamAssassin to drive the greylisting as well: if
spammers have to resend, few will do so, and it is a LOT
safer than auto-deleting high-scoring spam.)
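
The blacklist-driven greylisting idea can be sketched roughly as
follows. This is only an illustration, not the actual setup on that
server: the function name decide(), the RETRY_DELAY value, the
in-memory table and the is_blacklisted callback are all hypothetical
stand-ins for a real policy daemon and real DNSBL lookups.

```python
import time

# Hypothetical in-memory greylist keyed by (client IP, sender, recipient).
# A real deployment would use a persistent store and real DNSBL lookups.
RETRY_DELAY = 300  # seconds a listed client must wait before a retry counts (assumed)

first_seen = {}

def decide(client_ip, sender, recipient, is_blacklisted, now=None):
    """Return 'accept' or 'defer' for one SMTP delivery attempt."""
    now = time.time() if now is None else now
    if not is_blacklisted(client_ip):
        return "accept"   # unlisted clients skip greylisting entirely
    key = (client_ip, sender, recipient)
    started = first_seen.setdefault(key, now)  # remember the first attempt
    if now - started >= RETRY_DELAY:
        return "accept"   # the client came back after the delay
    return "defer"        # temporary failure only, never a hard reject
```

Mail that gets "accept" would still go through SpamAssassin as usual;
the blacklist hit only decides whether the sender has to retry first.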

X-Spam-Status: Yes, score=29.2 required=6.0 tests=BODY_ENHANCEMENT2,
        DIGEST_MULTIPLE,FB_HARD_ERECTION,HELO_DYNAMIC_IPADDR2,HM_URIBL_SC2_XS,
        HM_URIBL_SC_DBL,HM_URIBL_SC_XS,HTML_30_40,HTML_MESSAGE,INFO_TLD,
        MIME_HTML_ONLY,PYZOR_CHECK,RAZOR2_CF_RANGE_51_100,
        RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,
        SARE_SUB_BREAKTHRU,URIBL_AB_SURBL,URIBL_BLACK,URIBL_BLOK_MPRHS,
        URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC2_SURBL,
        URIBL_SC_SURBL,URIBL_WS_SURBL,URIBL_XS_SURBL

         -- HM_URIBL_SC_DBL and HM_URIBL_SC_XS actually score -3.5 & -2.5 = -6 --

Rules with the HM_ prefix are my own; the rest are all either stock
or probably from SARE. (I have about everything available from
SARE, including the aggressive (ham-hitting) sets, but NOT including
those that "hit nothing but seem cool".)  Scores are down below.

As for HTML, I leave those rules at their default scores, which are
near zero.

As to overkill: I worry most about getting the same result, and the
same false positives, from multiple sources for basically the same
reason. So I have started writing some negative rules, e.g.,
where the scores are X=2, Y=2, and X && Y = -1 (total 3 instead of 4).
Multiple hits still increase the confidence, but the message does
not get the complete score for both rules.
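
The arithmetic of such a compensating rule can be checked with a
small sketch. The rule names X and Y and their scores come from the
example above; the SCORES and OVERLAPS tables and the total_score()
helper are made up for illustration. In SpamAssassin itself this is
normally written as a "meta" rule with a negative "score"; the Python
below only models the sum.

```python
# Hypothetical rule names and scores, taken from the X/Y example in the text.
SCORES = {"X": 2.0, "Y": 2.0}

# Compensating entries: each applies only when ALL of its rules hit.
OVERLAPS = [({"X", "Y"}, -1.0)]

def total_score(hits):
    """Sum the scores of the rules that hit, then apply overlap compensation."""
    hits = set(hits)
    total = sum(SCORES[name] for name in hits)
    for rules, adjustment in OVERLAPS:
        if rules <= hits:          # every rule in the group fired
            total += adjustment
    return total

print(total_score(["X"]))       # 2.0
print(total_score(["X", "Y"]))  # 3.0
```

A single hit keeps its full score, while a double hit from two
closely related sources counts for 3 rather than 4.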

But so far, I just don't get many false positives due to my
aggressive net scoring.  (I whitelist very little now --
mostly just lists like this where the conversation is 
inherently spammy, or things like the X10 newsletter which
I just happen to like viewing AND did request originally.)

Here are the scores:

 *  0.3 SARE_SUB_BREAKTHRU subject has likely spammer phrase or word
 *  0.8 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP
 *      addr 2)
 *  0.6 FB_HARD_ERECTION BODY: FB_HARD_ERECTION
 *  0.8 BODY_ENHANCEMENT2 BODY: Information on getting larger body parts
 *  0.5 INFO_TLD URI: Contains an URL in the INFO top-level domain
 *  0.1 HTML_30_40 BODY: Message is 30% to 40% HTML
 *  0.0 HTML_MESSAGE BODY: HTML included in message
 *  1.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 *  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
 *      above 50%
 *      [cf: 100]
 *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
 *  1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
 *      above 50%
 *      [cf:  60]
 *  2.0 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
 *      [cf: 100]
 *  2.0 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
 *  0.6 URIBL_SBL Contains an URL listed in the SBL blocklist
 *      [URIs: jjplanularch.info]
 *  2.5 URIBL_BLACK Contains an URL listed in the URIBL blacklist
 *      [URIs: jjplanularch.info]
 *  4.0 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  2.0 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  2.0 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  2.5 URIBL_BLOK_MPRHS Contains URL from MailPolice BLOCK Combined list
 *      [URIs: jjplanularch.info]
 *  1.0 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  3.2 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  3.0 URIBL_XS_SURBL Has URI in XS - Testing
 *      [URIs: jjplanularch.info]
 *  4.0 URIBL_SC2_SURBL Has URI in SC2 at http://www.surbl.org/lists.html
 *      [URIs: jjplanularch.info]
 *  1.0 DIGEST_MULTIPLE Message hits more than one network digest check
 * -3.5 HM_URIBL_SC_DBL Prevent SC-SC2 double score
 * -2.5 HM_URIBL_SC_XS Prevent SC-XS double score
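
To see what a group of related rules contributes, the report lines can
be parsed mechanically. The sketch below is an assumption-laden
helper, not part of SpamAssassin: parse_report() and the RULE_LINE
pattern are hypothetical and only handle the " * score NAME text"
layout shown above. The sample reuses the SC/SC2 lines from the list.

```python
import re

# Matches report lines such as " *  4.0 URIBL_SC_SURBL Contains an URL ...".
# Continuation lines (" *      [URIs: ...]") carry no score and do not match.
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)\b")

def parse_report(text):
    """Return a list of (rule_name, score) pairs found in a report."""
    pairs = []
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            pairs.append((match.group(2), float(match.group(1))))
    return pairs

SAMPLE = """\
 *  4.0 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
 *      [URIs: jjplanularch.info]
 *  4.0 URIBL_SC2_SURBL Has URI in SC2 at http://www.surbl.org/lists.html
 *      [URIs: jjplanularch.info]
 * -3.5 HM_URIBL_SC_DBL Prevent SC-SC2 double score
"""

rules = parse_report(SAMPLE)
print(rules)
print(sum(score for _, score in rules))  # 4.5
```

With the compensating rule, the SC and SC2 hits together add 4.5
instead of 8.0.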

--
Herb Martin
