Fine-tuning SA URI extraction

Kris Deugau Wed, 26 Apr 2023 08:08:03 -0700

SA has long gone to great lengths to extract URIs from things which are not strictly URIs, on the basis that mail clients do the same and SA needs to inspect such things for DNSBL lookups. I'm fine with this.

However, once in a while I come across a case where something is clearly being extracted and canonicalized a little too enthusiastically, which usually comes to my attention in the context of an FP due in large part to a hit on our local DNSBL. (Which listing is in turn likely due to the same extraction and canonicalization on a batch of missed spam, and the minimal "is this an abused legit domain or a spammer domain" check I do before adding an entry to the DNSBL.)

The latest case is mail from the Cornell Lab of Ornithology, which has some message element that SA extracts "none" from, and converts it to "none.com" to try to look up "none.com" in DNSBLs. At a guess, it's an image tag with a "background" attribute of "none".

"uridnsbl_skip_domain none" doesn't seem to suppress this lookup, either in 3.4.6 or a recent test install from SVN trunk.

I've worked around this specific case, and past ones, in one way or another, but I'd like to more precisely target the bad URI extraction. In particular, I'd like to suppress this at the "random crap that looks like a URI" stage rather than later on. I specifically do NOT want to suppress lookups of the canonicalized URI, since that may be justifiably listed on the local DNSBL.
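As a rough illustration of the distinction (this is not SA's actual code; the function names and the skip_raw set are hypothetical, and the "append .com to a dotless string" rule is simply the behaviour described above), a minimal Python sketch:

```python
def canonicalize_guess(raw: str, default_tld: str = "com") -> str:
    """Hypothetical stand-in for the canonicalization step:
    a dotless candidate such as "none" becomes "none.com"."""
    host = raw.strip().lower()
    if host and "." not in host:
        return f"{host}.{default_tld}"
    return host


def domains_to_query(candidates, skip_raw=frozenset()):
    """Return the domains that would be sent to the DNSBL.

    skip_raw is matched against the raw extracted string, BEFORE
    canonicalization, so a literal "none.com" in a message is still
    looked up while the bare word "none" is dropped.
    """
    queries = []
    for candidate in candidates:
        raw = candidate.strip().lower()
        if raw in skip_raw:
            continue  # suppressed at the "looks like a URI" stage
        queries.append(canonicalize_guess(raw))
    return queries
```

With an empty skip_raw, "none" ends up queried as "none.com"; with skip_raw={"none"}, only a genuine "none.com" in the message would be queried.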

Am I missing some configuration option that can do this, or am I left with doing one of:

 - just suppressing lookups of the canonicalized URI

 - removing the canonicalized URI from the DNSBL, even if the listing might be justified where the *NON*-canonical version absolutely isn't

 - applying the welcomelist_* sledgehammer

-kgd
