None of these rules will hit that. That's what the second "http" is for: it requires the anchor text itself to start with a URL. In words: "Hit the host name part of the href value of an anchor tag, then do *not* match that same host name in the text part of the anchor, then hit 'http'".
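One way to sanity-check this outside SpamAssassin is with Python. Its `re` module supports the same `(?!\1)` construct (a backreference inside a negative lookahead) that the rules rely on. The sketch below copies the simplified and full SPOOFED_URL_DOMAIN patterns from the quoted message unchanged, with Perl's `/i` turned into `re.IGNORECASE`. The image-only HTML is a made-up example, not a real PayPal button. SpamAssassin actually runs these patterns under Perl, so treat this as a check of the logic, not of SpamAssassin's exact behavior.

```python
import re

# Simplified SPOOFED_URL_DOMAIN rule from the thread, pattern copied verbatim.
# Perl's trailing /i flag becomes re.IGNORECASE.
SIMPLE = re.compile(r'<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http', re.IGNORECASE)

# Full SPOOFED_URL_DOMAIN rule, split over two raw string literals for readability.
FULL = re.compile(
    r"""<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>"""
    r"""(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}""",
    re.IGNORECASE,
)

SAMPLES = {
    # Anchor text names a different host than the href: expected to hit.
    "spoofed": '<a href="http://www.chaosreigns.com/">http://www.example.com</a>',
    # Same host in href and anchor text (the non-spam example): expected not to hit.
    "same host": '<a href="http://www.jr.com/tracking?ord_q_num=105725494'
                 '&ord_q_zip=03076">http://www.jr.com/tracking</a>',
    # Hypothetical image-only link with no anchor text: expected not to hit.
    "image only": '<a href="https://www.example.net/pay">'
                  '<img src="https://img.example.com/button.gif"></a>',
}

for name, html in SAMPLES.items():
    print(f"{name}: simple={bool(SIMPLE.search(html))} full={bool(FULL.search(html))}")
```

Only the spoofed sample hits either pattern. The jr.com link fails at the `(?!\1)` lookahead, because its anchor text begins with the same string captured from the href. The image-only link fails at the final `http`, because there is no anchor text for it to match.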
I should've called the new rule SPOOFED_URL_HOST, because it matches the full host name, not just the domain. I'm not sure the TLD logic needed for real domain matching can be expressed in a regex at all, at least not without modifying the Perl interpreter.

On 10/14, Christian Grunfeld wrote:
> and what about when there is no anchor text in the link? e.g. paypal
> image button
>
> 2011/10/14 <dar...@chaosreigns.com>:
> > Existing rule:
> >
> > rawbody __SPOOFED_URL m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> > How about this, to only check for a changed domain part instead?
> >
> > rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> > It matches this:
> >
> > <a href="http://www.chaosreigns.com/">http://www.example.com</a>
> >
> > But does not match this (example from actual non-spam):
> >
> > <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
> >
> > A very simplified form of this new one:
> >
> > rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
> >
> > That "(?!\1)" bit is nice and fancy. It means "not what was in the first
> > set of parentheses". In the perlre man page: "A zero-width negative
> > look-ahead assertion."
> >
> > --
> > "Every normal man must be tempted at times to spit upon his hands,
> > hoist the black flag, and begin slitting throats."
> > - Henry Louis Mencken (1880-1956)
> > http://www.ChaosReigns.com

--
"I finally figured out the only reason to be alive is to enjoy it."
- Rita Mae Brown
http://www.ChaosReigns.com