Re: SARE_URI_EQUALS false positives

jdow Fri, 23 Dec 2005 04:07:21 -0800

From: "Chris Lear" <[EMAIL PROTECTED]>

* jdow wrote (23/12/05 11:26):

From: "Chris Lear" <[EMAIL PROTECTED]>

I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
therefore skewing the scoring of some mail quite badly.
The weird thing is that the uris that spamassassin is complaining about
aren't uris at all. The mail in question is auto-created reports of cvs
diffs, so it's slightly unusual.


[...]


I've had a bit of a look at the regexps that spamassassin uses to work
out what is a uri, and it seems that "updated.by=Updated" is treated as
a uri because .by is a valid tld and spamassassin looks for "schemeless"
uris, then prepends http:// for the tests.
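
To illustrate the behaviour described above, here is a minimal sketch in Python. It is not SpamAssassin's actual regexp or code: the pattern, the four-entry TLD list, and the function name find_schemeless_uris are assumptions made up for this example. It only shows how a "schemeless" match on a valid tld, followed by prepending http://, can turn ordinary text into something that looks like a uri.

```python
import re

# Hypothetical, tiny subset of TLDs; a real list would be much longer.
TLDS = ("com", "net", "org", "by")

# Illustrative pattern only, NOT SpamAssassin's real regexp:
# one or more dot-separated labels ending in a known TLD, followed by
# any run of non-space characters (path, query string, and so on).
SCHEMELESS = re.compile(
    r"\b((?:[a-z0-9-]+\.)+(?:" + "|".join(TLDS) + r"))\b(\S*)",
    re.IGNORECASE,
)

def find_schemeless_uris(text):
    """Return candidate uris with http:// prepended to each match."""
    return ["http://" + m.group(1) + m.group(2)
            for m in SCHEMELESS.finditer(text)]

# A fragment like the one in the cvs diff reports.
print(find_schemeless_uris("updated.by=Updated by"))
```

With a matcher along these lines, the property-style text "updated.by=Updated" comes back as a candidate uri containing an "=" sign, even though nobody wrote it as a link.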

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score
for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
guarantee that only real uris are parsed as such?


Before you drop the score precipitously, check whether the emails that
trigger falsely share some other characteristic that can be used to
apply a negative score. If they do, write a rule for it with an
appropriate negative score. If not, weigh how effective the rule is
for you. The version of "sa-stats.pl" that is on the SARE site helps
figure this out nicely.

That said, it's close to a "50/50" rule that hits on very few messages
here, so it should have a low score. (It hit on 6 messages out of 75000.)
Cutting it out completely here seems like it would be effective TODAY.
(That could change. At one time it was quite necessary. Spammer fads
change.)


I've reduced the score, and a quick check shows that the rule hits
almost nothing anyway, so it's not a big problem. The bayes rules were
keeping the false positives from doing much damage in any case.
But spamassassin uses uris for lots of things, and if it's commonly
parsing (reasonably) normal text as uris, I would expect that to be a
problem in more rules than just SARE_URI_EQUALS.


That is a standalone rule.

And I do note that many of the SARE rules have severe problems in very
specific cases. Some mailing lists that are not well filtered for spam
carry postings which trigger some of the "too effective to toss" SARE
rules. I've developed some massive meta rules to at least partially get
a handle on the problem. (A "number of times XXX hit" option would be
nice to have for this.)

{^_^}
