Greg Troxel wrote:
> I have been seeing several occasions where two rules hit for the same
> underlying issue, and it seems that this isn't really desired.
> 
> Example 1: I got ham that had a line with
> 
>   dig [some.isp.name.].isphosts.junkemailfilter.com
> 
> in it.  It seems giving it 2.3 points for SPOOF_COM2COM is fair, but
> that turns out to be 4.3 because SPOOF_COM2OTH gets 2.0.  This ended
> up as a FP because I filter to spam folder at 1, preferring to
> misclassify some list mail to keep my inbox as clean as I can.

Yep.  With a threshold of 1, you're going to get plenty of false
positives...

> X-Spam-Status: Yes, score=1.7 required=1.0
>       tests=AWL,BAYES_00,HTML_MESSAGE,SPOOF_COM2COM,SPOOF_COM2OTH
>       autolearn=no version=3.2.4
> X-Spam-Report:
>       *  2.0 SPOOF_COM2OTH URI: URI contains ".com" in middle
>       *  2.3 SPOOF_COM2COM URI: URI contains ".com" in middle and end
>       * -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
>       *      [score: 0.0000]
>       *  0.0 HTML_MESSAGE BODY: HTML included in message
>       *  0.0 AWL AWL: From: address is in the auto white-list

Since both of these are base rules, and the scores were generated with
the overlapping rules in place, the scores probably already account for
the two rules hitting together.

Keep in mind that all of the scores are generated for a spam threshold
of 5, which is the default.  Summed scores mean much less at a
threshold of 1.
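
To make the arithmetic concrete, here is a rough Python sketch that adds
up the rule scores from the Example 1 report and checks the total against
both thresholds.  The dictionary and the is_spam() helper are only
illustrative names, not anything SpamAssassin provides.

```python
# Rule scores copied from the X-Spam-Report in Example 1.
hits = {
    "SPOOF_COM2OTH": 2.0,
    "SPOOF_COM2COM": 2.3,
    "BAYES_00": -2.6,
    "HTML_MESSAGE": 0.0,
    "AWL": 0.0,
}


def is_spam(score, required):
    """Illustrative helper: a message is flagged once its total reaches the threshold."""
    return score >= required


# Round to one decimal place, matching how the header reports the score.
total = round(sum(hits.values()), 1)

print("total score:", total)
print("spam at required=1.0:", is_spam(total, 1.0))
print("spam at required=5.0:", is_spam(total, 5.0))
```

The same message that is tagged at required=1.0 stays well under the
threshold the scores were tuned for.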

> Example 2: blacklists
> 
> Here, the mail is spam from a bad source, but with two lists more or
> less claiming this it doesn't seem quite right to add the scores.  In
> this case spamcop says the machine has sent spam, and spamhaus that
> it's in XBL for being a compromised box.
> 
> X-Spam-Status: Yes, score=3.6 required=1.0
>         tests=AWL,BAYES_50,HTML_MESSAGE,RCVD_IN_BL_SPAMCOP_NET,
>         RCVD_IN_XBL,RDNS_NONE autolearn=spam version=3.2.4
> X-Spam-Report:
>         *  0.0 HTML_MESSAGE BODY: HTML included in message
>         *  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
>         *      [score: 0.5676]
>         *  4.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in
>         *      bl.spamcop.net
>         *      [Blocked - see
>         *      <http://www.spamcop.net/bl.shtml?123.142.103.19>]
>         *  3.0 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
>         *      [123.142.103.19 listed in zen.spamhaus.org]
>         *  0.1 RDNS_NONE Delivered to trusted network by a host with
>         *      no rDNS
>         * -3.6 AWL AWL: From: address is in the auto white-list

This I see as desirable.  You have two different blacklists reporting
a spammy relay.  I don't have any problem with these scores.

In fact, I never would have seen this message since I block based on
zen at the MTA level.

> So, I realize this would be complicated, but I wonder about having a
> score combining function for tests that are making essentially the
> same claim.  Perhaps the 4 and 3 above should combine to 5, and the
> SPOOF_COM2* should just be 2.3.

I would leave the scores alone and maybe modify SPOOF_COM2OTH so that
it does not fire on URIs that also hit SPOOF_COM2COM, i.e. ones that
end in .com.  Maybe something like this (untested):

uri __SPOOF_COM2OTH m{^https?://(?:\w+\.)+?com\.(?:\w+\.){2}}i
meta SPOOF_COM2OTH __SPOOF_COM2OTH && !SPOOF_COM2COM

Of course, I didn't write these rules in the first place.  They may
overlap on purpose.
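
One way to sanity-check the idea outside SpamAssassin is a small Python
sketch like the one below.  The sub-rule pattern is the one suggested
above, carried over unchanged.  The SPOOF_COM2COM pattern is NOT the real
rule (its regex is not shown in this thread); it is a hypothetical
stand-in used only to exercise the "&& !" logic, and the sample URIs are
made up.

```python
import re

# Sub-rule pattern from the suggestion above, translated 1:1 from Perl.
SUB_COM2OTH = re.compile(r"^https?://(?:\w+\.)+?com\.(?:\w+\.){2}", re.I)

# HYPOTHETICAL stand-in for SPOOF_COM2COM: ".com" in the middle of the
# hostname and again at its end.  The real rule may differ.
STANDIN_COM2COM = re.compile(
    r"^https?://(?:\w+\.)+?com\.(?:\w+\.)*com(?:[:/]|$)", re.I
)


def meta_com2oth(uri):
    """Mimic: meta SPOOF_COM2OTH __SPOOF_COM2OTH && !SPOOF_COM2COM"""
    return bool(SUB_COM2OTH.search(uri)) and not STANDIN_COM2COM.search(uri)


# Made-up sample URIs, for illustration only.
samples = [
    "http://login.example.com.a.b.net/",  # .com in the middle, ends elsewhere
    "http://login.example.com.a.b.com/",  # .com in the middle and at the end
    "http://www.example.com/",            # ordinary .com host
]

for uri in samples:
    print(uri, meta_com2oth(uri))
```

With the stand-in pattern, only the first sample should trigger the meta
rule, so a URI that already hits the COM2COM case is not counted twice.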

-- 
Bowie
