GoogleBot uncovered an error in URL parsing after it spidered a bad URL 
which contained a "%25s" due to a translator error. The "%25s" causes 
urlDecode to raise a ValueError because it assumes anything following % 
will be a valid hexadecimal value. URLParser does not catch that exception:

Traceback (most recent call last):
  File "./WebKit/Application.py", line 465, in dispatchRawRequest
  File "./WebKit/Application.py", line 527, in runTransaction
  File "./WebKit/URLParser.py", line 67, in findServletForTransaction
  File "./WebKit/URLParser.py", line 273, in parse
  File "./WebKit/URLParser.py", line 330, in parse
  File "./WebUtils/Funcs.py", line 84, in urlDecode
ValueError: invalid literal for int(): s
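To make the failure concrete, here is a minimal Python 3 sketch of a split-based decoder. It is a hypothetical reconstruction (naive_url_decode is a made-up name, not Webware's code), assuming urlDecode splits on "%" and decodes each piece with the same chr(int(p[:2], 16)) expression that appears on the line the patch replaces. Under that assumption "%25s" itself decodes cleanly to "%s"; it is a bare "%s" (for instance one left over from an earlier decoding pass) that reproduces the "int(): s" message in the traceback.

```python
def naive_url_decode(s):
    # Hypothetical reconstruction, not Webware's actual source: assumes
    # every piece after a '%' starts with two hexadecimal digits.
    pieces = s.split('%')
    out = [pieces[0]]
    for p in pieces[1:]:
        # int(..., 16) raises ValueError when p[:2] is not hexadecimal,
        # e.g. p == 's' for an input ending in '%s'.
        out.append(chr(int(p[:2], 16)) + p[2:])
    return ''.join(out)


print(naive_url_decode('a%20b'))   # a b
print(naive_url_decode('%25s'))    # %s
try:
    naive_url_decode('/page%s')
except ValueError as exc:
    # invalid literal for int() with base 16: 's'
    print('ValueError:', exc)
```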

The notes in urlDecode indicate it will raise this type of exception, 
but I don't see why it needs to. Any reason why urlDecode should not 
handle that exception and treat "%" like any other character if the two 
characters following it are not valid hexadecimal?

Here's a patch against the 0.9.1 release for WebUtils/Funcs.py to handle 
the error inside urlDecode:

84c84,89
<               p2.append(chr(int(p[:2], 16)) + p[2:])
---
>               try:
>                       hx = int(p[:2], 16)
>               except ValueError:
>                       p2.append('%' + p)
>               else:
>                       p2.append(chr(hx) + p[2:])
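The patch catches the ValueError, but in current Python int(p[:2], 16) is looser than a strict %XX check: it accepts surrounding whitespace and a sign (int(' 1', 16) == 1, int('+f', 16) == 15), and a lone trailing digit such as "%a" decodes to a newline. A value like "%-1" gets past int() and then fails in chr(), which sits outside the try. A minimal sketch of a stricter rule, assuming the same split-on-% structure (tolerant_url_decode is a hypothetical name, not part of WebUtils), checks both characters against string.hexdigits before decoding and otherwise treats "%" like any other character:

```python
import string

HEX_DIGITS = set(string.hexdigits)


def tolerant_url_decode(s):
    # Hypothetical variant of the patched urlDecode: a '%' is decoded only
    # when followed by exactly two hex digits; otherwise the '%' and the
    # text after it are kept literally.
    pieces = s.split('%')
    out = [pieces[0]]
    for p in pieces[1:]:
        if len(p) >= 2 and p[0] in HEX_DIGITS and p[1] in HEX_DIGITS:
            out.append(chr(int(p[:2], 16)) + p[2:])
        else:
            out.append('%' + p)
    return ''.join(out)


print(tolerant_url_decode('/page%s'))      # /page%s
print(tolerant_url_decode('a%20b%zz%4'))   # a b%zz%4
print(tolerant_url_decode('%-1 100%'))     # %-1 100%
```

For comparison, Python 3's urllib.parse.unquote follows the same rule and leaves malformed escapes such as "%zz" unchanged, though it decodes escapes as UTF-8 bytes rather than one chr() per byte.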


- Ben


_______________________________________________
Webware-discuss mailing list
Webware-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/webware-discuss
