Googlebot uncovered an error in URL parsing after it spidered a bad URL
that contained a "%25s" due to translator error. The "%25s" causes
urlDecode to raise a ValueError because it assumes the two characters
following a % are always valid hexadecimal digits. URLParser does not
catch that exception:
Traceback (most recent call last):
  File "./WebKit/Application.py", line 465, in dispatchRawRequest
  File "./WebKit/Application.py", line 527, in runTransaction
  File "./WebKit/URLParser.py", line 67, in findServletForTransaction
  File "./WebKit/URLParser.py", line 273, in parse
  File "./WebKit/URLParser.py", line 330, in parse
  File "./WebUtils/Funcs.py", line 84, in urlDecode
ValueError: invalid literal for int(): s
The notes in urlDecode indicate it will raise this type of exception,
but I don't see why it needs to. Any reason why urlDecode should not
handle that exception and treat "%" like any other character if the two
characters following it are not valid hexadecimal?
Here's a patch against the 0.9.1 release for WebUtils/Funcs.py to handle
the error inside urlDecode:
84c84,89
< p2.append(chr(int(p[:2], 16)) + p[2:])
---
> try:
>     hx = int(p[:2], 16)
> except ValueError:
>     p2.append('%' + p)
> else:
>     p2.append(chr(hx) + p[2:])
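
For anyone who wants to try the behaviour outside of Webware, here is a
minimal standalone sketch of the same idea in Python 3. It is not the
actual WebUtils/Funcs.py code: the name url_decode_lenient and the
split-on-% structure are assumptions made for illustration. It also
checks for two hex digits explicitly instead of relying on int(), since
int(p[:2], 16) would accept a lone digit such as the "a" in "%a".

```python
import string

# Characters allowed in a percent-escape (0-9, a-f, A-F).
_HEX_DIGITS = set(string.hexdigits)


def url_decode_lenient(s):
    """Decode a URL-encoded string, leaving malformed % escapes untouched.

    Hypothetical helper for illustration only; not the Webware API.
    """
    # '+' means space in form encoding; do this before decoding so that
    # an escaped plus ('%2B') survives as a literal '+'.
    parts = s.replace('+', ' ').split('%')
    decoded = [parts[0]]  # text before the first '%' needs no decoding
    for p in parts[1:]:
        if len(p) >= 2 and p[0] in _HEX_DIGITS and p[1] in _HEX_DIGITS:
            # Valid escape: turn the two hex digits into one character.
            decoded.append(chr(int(p[:2], 16)) + p[2:])
        else:
            # Not a valid escape: treat '%' like any other character.
            decoded.append('%' + p)
    return ''.join(decoded)
```

With this, "100%s" comes back unchanged and "%25s" decodes to "%s". Each
escape is decoded as a single code point, so multi-byte UTF-8 sequences
are not reassembled.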
- Ben