SA has long gone to great lengths to extract URIs from things which are not strictly URIs, on the basis that mail clients do the same and SA needs to inspect such things for DNSBL lookups. I'm fine with this.

However, once in a while I come across a case where something is clearly being extracted and canonicalized a little too enthusiastically. These usually come to my attention as an FP due in large part to a hit on our local DNSBL. (That listing is in turn likely due to the same extraction and canonicalization on a batch of missed spam, combined with the minimal "is this an abused legit domain or a spammer domain" check I do before adding an entry to the DNSBL.)

The latest case is mail from the Cornell Lab of Ornithology, which contains some message element from which SA extracts "none", converts it to "none.com", and then looks up "none.com" in DNSBLs. At a guess, it's an image tag with a "background" attribute of "none".

"uridnsbl_skip_domain none" doesn't seem to suppress this lookup, either in 3.4.6 or a recent test install from SVN trunk.

I've worked around this specific case, and past ones, in one way or another, but I'd like to more precisely target the bad URI extraction. In particular, I'd like to suppress this at the "random crap that looks like a URI" stage rather than later on. I specifically do NOT want to suppress lookups of the canonicalized URI, since that may be justifiably listed on the local DNSBL.
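
To illustrate the distinction I'm after, here is a rough Python sketch. It is NOT SA's actual code; the function names, the "skip_raw" set, and the "append .com to a bare word" rule are all made up for illustration, based only on the behaviour described above. The point is that the filter applies to the raw extracted string, before canonicalization, so a genuine reference to "none.com" would still be looked up.

```python
# Illustrative only: not SpamAssassin's implementation. All names are hypothetical.

def canonicalize(raw):
    """Mimic the behaviour described above: a bare word gets ".com" appended."""
    host = raw.strip().lower()
    if "." not in host:
        return host + ".com"
    return host


def lookup_candidates(raw_candidates, skip_raw):
    """Drop unwanted *raw* strings before canonicalization.

    Returns the list of domains that would be sent to the DNSBL.
    """
    domains = []
    for raw in raw_candidates:
        if raw.strip().lower() in skip_raw:
            continue  # suppressed at the "looks like a URI" stage
        domains.append(canonicalize(raw))
    return domains


# The bare "none" (e.g. from a background attribute) is dropped, while an
# explicit none.com elsewhere in the message is still looked up.
print(lookup_candidates(["none", "none.com", "example.org"], skip_raw={"none"}))
# prints ['none.com', 'example.org']
```

Without the raw-stage filter (an empty "skip_raw"), the bare "none" would come out as "none.com" and be indistinguishable from a real reference to that domain, which is exactly the problem.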

Am I missing some configuration option that can do this, or am I left with doing one of:
 - just suppressing lookups of the canonicalized URI
 - removing the canonicalized URI from the DNSBL, even if the listing might be justified where the *NON*-canonical version absolutely isn't
 - applying the welcomelist_* sledgehammer

-kgd
