None of these rules will hit that.  That's what the second "http" is for:
"Hit the host name part of the href value of an anchor tag, then do *not*
match the same host name at the start of the anchor text, then hit 'http'".
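
A rough sketch of why an image-only link is skipped: the simplified form of
the rule from the quoted message, run through Python's re module.  Using
Python is an assumption made for illustration (SpamAssassin evaluates the
rule in Perl, but this look-ahead syntax is the same in both).  The name
SIMPLE_SPOOFED, the helper hits(), and the image-button sample are made up
for the example.

```python
import re

# Simplified form of the rule from the quoted message, ported to Python's
# re module as an illustration (SpamAssassin itself evaluates it in Perl).
# SIMPLE_SPOOFED is a hypothetical name used only in this sketch.
SIMPLE_SPOOFED = re.compile(
    r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http', re.IGNORECASE
)

def hits(html):
    """Return True if the simplified rule matches anywhere in html."""
    return SIMPLE_SPOOFED.search(html) is not None

# Anchor text names a different host than the href: the rule hits.
spoofed = '<a href="http://www.chaosreigns.com/">http://www.example.com</a>'

# Anchor text starts with the same host as the href: the rule does not hit.
honest = ('<a href="http://www.jr.com/tracking?ord_q_num=105725494'
          '&ord_q_zip=03076">http://www.jr.com/tracking</a>')

# Image button with no anchor text: nothing after the opening tag starts
# with "http", so the trailing literal fails and the rule does not hit.
image_only = '<a href="http://www.example.com/"><img src="button.gif"></a>'

for name, sample in (("spoofed", spoofed), ("honest", honest),
                     ("image_only", image_only)):
    print(name, hits(sample))
```

The trailing literal "http" is what requires the anchor text itself to look
like a URL, so a link with only an image inside never reaches a hit.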

I should've called it SPOOFED_URL_HOST, because this one matches the
full host name, not just the domain.  I don't even know if we can get the
TLD logic for domain matching into a regex without a modification to the
Perl interpreter.
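
To make the host-versus-domain point concrete, here is a small sketch using
the simplified rule, with Python's re module standing in for Perl (an
assumption for illustration).  The name HOST_RULE and the sample links are
made up.  Because the captured group is the full host name, a link whose
text merely drops the "www." still hits, even though the domain is the same.

```python
import re

# Simplified rule from the quoted message, ported to Python's re module
# for illustration only. HOST_RULE is a hypothetical name.
HOST_RULE = re.compile(
    r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http', re.IGNORECASE
)

# Same domain, different host name ("www." dropped in the anchor text).
same_domain = '<a href="http://www.example.com/">http://example.com/</a>'

# Identical host name in the href and in the anchor text.
same_host = '<a href="http://www.example.com/">http://www.example.com/</a>'

print(bool(HOST_RULE.search(same_domain)))
print(bool(HOST_RULE.search(same_host)))
```

Telling these two cases apart would need knowledge of where the registered
domain ends, which is the TLD logic mentioned above.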

On 10/14, Christian Grunfeld wrote:
> And what about when there is no anchor text in the link? E.g. a PayPal
> image button.
> 
> 
> 2011/10/14  <dar...@chaosreigns.com>:
> > Existing rule:
> >
> > rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> >
> > How about this, to only check for a changed domain part instead?
> >
> > rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> > It matches this:
> >
> >  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
> >
> > But does not match this (example from actual non-spam):
> >
> >  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
> >
> >
> > A very simplified form of this new one:
> >
> > rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
> >
> > That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
> > set of parentheses".  In the perlre man page: "A zero-width negative
> > look-ahead assertion."
> >
> > --
> > "Every normal man must be tempted at times to spit upon his hands,
> > hoist the black flag, and begin slitting throats."
> >  - Henry Louis Mencken (1880-1956)
> > http://www.ChaosReigns.com
> >
> 

-- 
"I finally figured out the only reason to be alive is to enjoy it."
- Rita Mae Brown
http://www.ChaosReigns.com
