On 10/14, Matus UHLAR - fantomas wrote:
> While I have no doubt there is plenty of wanted mail with URL and text
> mismatch, I would still like to have such a rule.

It exists; you're welcome to copy it out of the rules sandbox and use it,
false positives and all.  I already linked to it:
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/khopesh/20_khop_experimental.cf?view=markup

rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
# even with scrubbing, probably can't handle 'legit' tracking redirectors
meta     SPOOFED_URL    __SPOOFED_URL && !(__VIA_ML || __SENDER_BOT || __YAHOO_BULK || __UNSUB_LINK || __THREADED || URL_SHORTENER)
describe SPOOFED_URL    Has a link whose text is a different URL
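For anyone who wants to poke at the pattern outside SpamAssassin, here is a rough Python port of the __SPOOFED_URL regex plus the meta's exclusion logic. It's only a sketch: SPOOFED_URL_RE and spoofed_url are hypothetical names, the exception rules (__VIA_ML, __SENDER_BOT, etc.) are simply passed in as booleans, and it ignores how SpamAssassin decodes and splits the body before running rawbody rules.

```python
import re

# Rough Python port of the rawbody __SPOOFED_URL pattern.
# In the .cf file '#' is written as '\#' because '#' starts a comment there;
# Python needs no such escape. IGNORECASE mirrors Perl's /i, and it also
# makes the \1 backreference compare case-insensitively.
SPOOFED_URL_RE = re.compile(
    r"<a\s[^>]{0,2048}\bhref=(?:3D)?.?"             # <a ... href=, optional QP "=3D" and a quote
    r"(https?:[^>\"'# ]{8,29}[^>\"'# :/?&=])"        # group 1: the start of the href URL
    r"[^>]{0,2048}>"                                  # rest of the opening tag
    r"(?:[^<]{0,1024}<(?!/a)[^>]{1,1024}>){0,99}"    # skip text and inner tags, but never </a>
    r"\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}",       # link text is a URL NOT starting with group 1
    re.IGNORECASE,
)


def spoofed_url(raw_html, exceptions=()):
    """Mirror the meta: __SPOOFED_URL && !(any exception rule hit).

    `exceptions` is a hypothetical stand-in for the results of the
    separate SpamAssassin rules, which are not reproduced here.
    """
    return bool(SPOOFED_URL_RE.search(raw_html)) and not any(exceptions)
```

For example, spoofed_url('<a href="http://evil.example.net/login">http://www.paypal.com/</a>') is true, while a link whose text repeats its own href, or whose text isn't a URL at all, doesn't hit.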

And I need to remind you that it hits almost as much ham as spam:
http://ruleqa.spamassassin.org/20111008-r1180336-n/T_SPOOFED_URL/detail

I agree it seems like we should be able to improve it.  Maybe make
exceptions for known marketing trackers, which Adam Katz mentioned it has
problems with.
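Along the lines of the tracker exceptions, one untested idea: pull the links out with a real HTML parser (Python's html.parser here, not a regex), compare hostnames instead of URL prefixes, and skip any href that points at a known redirector. KNOWN_TRACKERS is a hypothetical placeholder list, not real tracker domains, and a hostname comparison is a different check from the rule's prefix test, so it would need its own mass-check results before anyone trusts it.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical placeholder list; real entries would come from corpus hits.
KNOWN_TRACKERS = {"click.example-esp.com", "links.example-mailer.net"}


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside <a>, including nested tags' text

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host(url):
    return (urlsplit(url).hostname or "").lower()


def is_tracker(url, trackers=KNOWN_TRACKERS):
    # Match the listed domain itself or any subdomain of it
    h = host(url)
    return any(h == d or h.endswith("." + d) for d in trackers)


def spoofed_links(html, trackers=KNOWN_TRACKERS):
    """Links whose text is a URL on a different host than the href,
    skipping hrefs that point at a listed tracker."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.links:
        if not text.lower().startswith(("http:", "https:")):
            continue  # link text is not a URL
        if is_tracker(href, trackers):
            continue  # known redirector: treat as legitimate
        if host(href) != host(text):
            hits.append((href, text))
    return hits
```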

-- 
"Speed is a metaphor for freedom."
http://www.ChaosReigns.com
