* jdow wrote (23/12/05 11:26):
> From: "Chris Lear" <[EMAIL PROTECTED]>
> 
>> I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
>> therefore skewing the scoring of some mail quite badly.
>> The weird thing is that the uris that spamassassin is complaining about
>> aren't uris at all. The mail in question is auto-created reports of cvs
>> diffs, so it's slightly unusual.

[...]
>> 
>> I've had a bit of a look at the regexps that spamassassin uses to work
>> out what is a uri, and it seems that "updated.by=Updated" is treated as
>> a uri because .by is a valid tld and spamassassin looks for "schemeless"
>> uris, then prepends http:// for the tests.
>> 
>> I'm running spamassassin 3.1.0 on perl 5.8.2.
>> 
>> Does anyone have any suggestions, apart from simply reducing the score
>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>> guarantee that only real uris are parsed as such?
> 
> Before you drop the score precipitously check if there is some other
> characteristic of the emails that trigger falsely which can be used to
> apply a negative score. If there is such a characteristic then generate
> the appropriate negative score. If not, weigh how effective the rule is
> for you. The version of "sa-stats.pl" that is on the SARE site helps
> figure this out nicely.
> 
> That said it's close to a "50/50" rule that hits on very few messages
> here so should have a low score. (It hit on 6 messages out of 75000.)
> Cutting it out completely here seems like it would be effective TODAY.
> That could change. At one time it was quite necessary. Spammer fads
> change.

I've reduced the score, and a quick check shows that the rule hits
almost nothing anyway, so it's not a big problem. The Bayes rules were
keeping the false positives from doing much damage in any case.
But spamassassin uses uris for lots of things, and if it's commonly
parsing (reasonably) normal text as uris, I would expect that to be a
problem in more rules than just SARE_URI_EQUALS.
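
To illustrate the kind of match described above, here is a rough Python
sketch. It is not spamassassin's actual code or regexps: the pattern, the
trimmed TLD list and the function name find_schemeless_uris are all
assumptions made up for the example. It only shows how a "schemeless"
matcher that trusts the TLD can turn a property-style token into a uri.

```python
import re

# Hypothetical, heavily trimmed TLD list (assumption); a real parser would
# carry a much longer one. ".by" is included because it is a valid ccTLD.
TLDS = ("com", "net", "org", "by", "uk")

# Simplified stand-in for a schemeless-URI pattern (assumption, not the
# pattern spamassassin uses): dotted labels ending in a known TLD,
# optionally followed by path/query-like text.
SCHEMELESS = re.compile(
    r"(?<![\w.@-])"               # do not start in the middle of a token
    r"((?:[a-z0-9-]+\.)+(?:%s))"  # host: one or more labels, then a TLD
    r"(?![a-z0-9.-])"             # the TLD must end the host part
    r"([/?#=]\S*)?"               # optional trailing text after the host
    % "|".join(TLDS),
    re.IGNORECASE,
)


def find_schemeless_uris(text):
    """Return candidate uris found in text, with http:// prepended."""
    return [
        "http://" + m.group(1) + (m.group(2) or "")
        for m in SCHEMELESS.finditer(text)
    ]


print(find_schemeless_uris("updated.by=Updated"))        # ['http://updated.by=Updated']
print(find_schemeless_uris("see file.txt for details"))  # []
```

With a matcher like this, any "word.tld" token in ordinary text (property
keys, file names whose extension happens to be a TLD) becomes a candidate
uri, which is why more than one uri rule could be affected.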

Chris
