http://bugzilla.spamassassin.org/show_bug.cgi?id=3318





------- Additional Comments From [EMAIL PROTECTED]  2004-04-27 16:46 -------
Subject: Re:  New: multiply-encoded URIs missed

On Tue, Apr 27, 2004 at 04:08:50PM -0700, [EMAIL PROTECTED] wrote:
>  
> http://images.google.ca/imgres?imgurl=gmib.free.fr/viagra.jpg&imgrefurl=http://www.google.com/url?q=http://www.google.com/url?q=%68%74%74%70%3A%2F%2F%77%77%77%2E%65%78%70%61%67%65%2E%63%6F%6D%2F%6D%61%6E%67%65%72%33%32
> 
> we currently don't catch it, because of the second layer of encoding.

1) <grrr>  I was going to say that the redirect doesn't work, but of
   course it works fine in IE.  What a POS.

2) The reason we don't catch it is that we follow the spec...  If ':'
   or '/' are encoded (%3A and %2F), it's supposed to stay encoded.
   So the code doesn't catch the encoded version.

The URI works down to:

http://images.google.ca/imgres?imgurl=gmib.free.fr/viagra.jpg&imgrefurl=http://www.google.com/url?q=http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32

which also works in IE, BTW, then we grab the refurl:

http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32

then we stop due to the encoding, which (as above) is supposed to stay encoded.
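
The decoding step described above can be sketched roughly as follows. This is a minimal illustration, not SpamAssassin's actual code: the helper name decode_unreserved is hypothetical, and the choice to decode only letters, digits, '-', '.', '_' and '~' is an assumption about which characters count as safe to decode.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): decode %XX escapes only
# when they stand for an unreserved character.  Escapes for ':' and '/'
# (%3A, %2F) and other reserved characters are left encoded.
sub decode_unreserved {
    my ($uri) = @_;
    $uri =~ s{%([0-9A-Fa-f]{2})}{
        my $hex = $1;
        my $c   = chr(hex($hex));
        $c =~ /^[A-Za-z0-9._~-]$/ ? $c : "%$hex";
    }ge;
    return $uri;
}

print decode_unreserved(
    'http://www.google.com/url?q=%68%74%74%70%3A%2F%2F%77%77%77%2E'
  . '%65%78%70%61%67%65%2E%63%6F%6D%2F%6D%61%6E%67%65%72%33%32'), "\n";
```

Run on the encoded refurl from the report, this leaves %3A and %2F in place, which is why the redirect target is not recognized as a URI.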

We can handle it to some degree by putting in a kluge:

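    # If we see something that looks like a redirector, deal with it.
    # $start: outer URI through the inner scheme; $col: ':' or %3A;
    # $slash: up to two '/' or %2F; $end: the rest of the inner URI.
    if ($nuri =~ m#^(https?.+?https?)(\%3[aA]|:)((?:\%2[fF]|/){0,2})(.*)$#) {
      my ($start, $col, $slash, $end) = ($1, $2, $3, $4);
      # Only add a rewritten URI when the separator was actually encoded.
      if ($col =~ /\%/ || $slash =~ /\%/) {
        push(@uris, "$start://$end");
      }
    }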

Which makes the redirect stripper figure out that there's a redirection going 
on:

debug: uri found: http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32
debug: uri found: http://www.google.com/url?q=http://www.expage.com%2Fmanger32
debug: uri found: http://www.expage.com%2Fmanger32
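
To try the kluge outside SpamAssassin, it can be wrapped in a small standalone function. This is only a sketch: the name redirector_variants is hypothetical, and it covers just the rewrite done by the kluge itself, not the redirect stripper that produces the last debug line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the kluge; in SpamAssassin, $nuri and
# @uris live in the existing URI-parsing code.
sub redirector_variants {
    my ($nuri) = @_;
    my @uris = ($nuri);
    if ($nuri =~ m#^(https?.+?https?)(\%3[aA]|:)((?:\%2[fF]|/){0,2})(.*)$#) {
        my ($start, $col, $slash, $end) = ($1, $2, $3, $4);
        # Add a variant with a literal "://" only if something was encoded.
        if ($col =~ /\%/ || $slash =~ /\%/) {
            push(@uris, "$start://$end");
        }
    }
    return @uris;
}

print "$_\n" for redirector_variants(
    'http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32');
```

This prints the original URI followed by the variant with the inner "http%3A%2F%2F" rewritten to "http://".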

However, that then breaks other things, since the path separator is
still encoded as %2F, and the port separator could be encoded too
(e.g. 'www.kluge.net%3A8080').  All of which the spec says needs to
stay encoded.

I suppose we could build the code to deal with the encoding up to the
first %2F, convert it to a /, then leave everything else.  Have to churn
on that for a little bit.
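
One way that idea might look, as a rough sketch only: decode_authority is a hypothetical name, and the assumption is that everything between "http://" and the first literal or encoded slash is host[:port], with no userinfo or other edge cases considered.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): fix up the host[:port]
# part, turn the first %2F into '/', and leave the rest encoded.
sub decode_authority {
    my ($uri) = @_;
    # $host ends at the first '/' or %2F; $rest is undef if there is none.
    my ($scheme, $host, $rest) =
        $uri =~ m{^(https?://)(.*?)(?:(?:%2[fF]|/)(.*))?$}s
        or return $uri;
    $host =~ s/%3[aA]/:/g;    # encoded port separator
    return defined $rest ? "$scheme$host/$rest" : "$scheme$host";
}

print decode_authority('http://www.expage.com%2Fmanger32'), "\n";
print decode_authority('http://www.kluge.net%3A8080%2Ffoo%2Fbar'), "\n";
```

Only the first %2F is converted; any later %2F or %3A in the path or query is left as-is.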





