GoogleBot uncovered an error in URL parsing when it spidered a bad URL that contained a "%25s" due to a translator error. The "%25s" causes urlDecode to raise a ValueError because it assumes the two characters following any % will be valid hexadecimal digits. URLParser does not catch that exception:

Traceback (most recent call last):
  File "./WebKit/Application.py", line 465, in dispatchRawRequest
  File "./WebKit/Application.py", line 527, in runTransaction
  File "./WebKit/URLParser.py", line 67, in findServletForTransaction
  File "./WebKit/URLParser.py", line 273, in parse
  File "./WebKit/URLParser.py", line 330, in parse
  File "./WebUtils/Funcs.py", line 84, in urlDecode
ValueError: invalid literal for int(): s

The notes in urlDecode indicate it will raise this type of exception, but I don't see why it needs to. Is there any reason why urlDecode should not handle that exception and treat "%" like any other character if the two characters following it are not valid hexadecimal?

Here's a patch against the 0.9.1 release for WebUtils/Funcs.py to handle the error inside urlDecode:

84c84,89
< p2.append(chr(int(p[:2], 16)) + p[2:])
---
> try:
>     hx = int(p[:2], 16)
> except ValueError:
>     p2.append('%' + p)
> else:
>     p2.append(chr(hx) + p[2:])

- Ben

_______________________________________________
Webware-discuss mailing list
Webware-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/webware-discuss
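
For anyone who wants to try the proposed behaviour outside Webware, here is a minimal standalone sketch in Python 3. It is not the Webware source: the name lenient_url_decode and the HEX_DIGITS helper are made up for illustration, and it only handles percent escapes. It also differs slightly from the patch above in that it checks the two characters explicitly instead of catching ValueError, on the assumption that this is the safer test, since int() also accepts a leading sign or whitespace.

```python
import string

# Hypothetical helper, not part of Webware: the set of valid hex digits.
HEX_DIGITS = set(string.hexdigits)


def lenient_url_decode(s):
    """Decode %XX escapes; leave a malformed '%' sequence as literal text."""
    head, *rest = s.split('%')
    out = [head]
    for p in rest:
        pair = p[:2]
        if len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            # Well-formed escape: replace the two hex digits with the character.
            out.append(chr(int(pair, 16)) + p[2:])
        else:
            # Malformed escape: put the '%' back and keep the text unchanged.
            out.append('%' + p)
    return ''.join(out)


if __name__ == '__main__':
    print(lenient_url_decode('a%20b'))
    print(lenient_url_decode('%s'))
```

The "s" in the ValueError message suggests that the string reaching urlDecode was "%s", which this sketch passes through unchanged, while a well-formed escape such as "%20" is still decoded.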