Existing rule:

rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i


How about this, to only check for a changed domain part instead?

rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i

It matches this:

  <a href="http://www.chaosreigns.com/">http://www.example.com</a>

But does not match this (example from actual non-spam):

  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
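
One way to sanity-check both claims outside SpamAssassin is to run the
proposed pattern through Python's re module, which accepts this expression
with only the slash escapes removed.  This is a rough sketch, not a
substitute for testing the rule in SpamAssassin itself: the helper name
is_spoofed is made up for illustration, and nothing about how rawbody
rules see the message (decoding, line chunking) is modeled here.

```python
import re

# The proposed SPOOFED_URL_DOMAIN pattern, transcribed for Python's re module.
# Slashes need no escaping because Python has no /.../ regex delimiters.
SPOOFED_URL_DOMAIN = re.compile(
    r'''<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?://?[^/>"'\# ]{8,29})'''
    r'''[^>]{0,2048}>(?:[^<]{0,1024}<(?!/a)[^>]{1,1024}>){0,99}'''
    r'''\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}''',
    re.IGNORECASE,
)


def is_spoofed(html):
    """Hypothetical helper: True if the pattern fires anywhere in html."""
    return SPOOFED_URL_DOMAIN.search(html) is not None


# The two examples from this message.
spoofed = '<a href="http://www.chaosreigns.com/">http://www.example.com</a>'
legit = ('<a href="http://www.jr.com/tracking?ord_q_num=105725494'
         '&ord_q_zip=03076">http://www.jr.com/tracking</a>')

print(is_spoofed(spoofed))  # href host differs from the visible host
print(is_spoofed(legit))    # same host, only the path and query differ
```

If the transcription is faithful, the first call reports a hit and the
second does not, which agrees with the two examples above.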


A very simplified form of this new one:

rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i

That "(?!\1)" bit is nice and fancy.  It means "not followed by what was
captured in the first set of parentheses".  The "\1" is a backreference to
that capture, and "(?!...)" is what the perlre man page calls "a zero-width
negative look-ahead assertion."
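
To see the lookahead and backreference working together, here is a small
sketch of the simplified pattern in Python's re module, which supports the
same "(?!\1)" construct.  The helper name captured_prefix is invented for
this illustration; it returns whatever the first set of parentheses
captured when the pattern matches.

```python
import re

# Simplified pattern from the message; Python needs no escaped slashes.
SIMPLE = re.compile(r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http',
                    re.IGNORECASE)


def captured_prefix(html):
    """Hypothetical helper: group 1 (scheme plus host) on a match, else None."""
    m = SIMPLE.search(html)
    return m.group(1) if m else None


different_host = '<a href="http://www.chaosreigns.com/">http://www.example.com</a>'
same_host = ('<a href="http://www.jr.com/tracking?ord_q_num=105725494'
             '&ord_q_zip=03076">http://www.jr.com/tracking</a>')

# The link text does not start with the captured prefix, so (?!\1) succeeds.
print(captured_prefix(different_host))

# The link text starts with the captured prefix, so (?!\1) fails: no match.
print(captured_prefix(same_host))
```

Because the assertion is zero-width, it consumes nothing: the trailing
"http" in the pattern still has to match at the very start of the link
text.  Note also that this is a prefix comparison, so link text that merely
begins with the href's scheme and host is treated as unchanged.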

-- 
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com
