On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton <graham.dumple...@gmail.com> wrote:

> > Nah, not nearly that hard:
> >
> > path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem?  If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> >
> > path_parts = [
> >     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> >     for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> > So... before jumping to conclusions, what's the hard part with using
>
> Sorry, it is not that simple. The thing that everyone is ignoring is
> that SCRIPT_NAME and PATH_INFO are also normalized by the web server
> normally. That is, .. instances are removed. By passing the raw URL
> through to the application, you are now forcing every application to
> have to deal with that as well with the possibility of directory
> traversal attacks when people get it wrong and the URL is mapping
> somehow to file system resources. It is a huge can of worms which at
> the moment the web server deals with.
>

Well... at least to me "raw" only means "not URL decoded", so it doesn't
necessarily mean you can't clean up the request path.  I guess an attacker
could encode "." to make things harder.

Nevertheless, WSGI servers don't currently guarantee this cleaning.  I added
it to paste.httpserver, but I don't know one way or the other about any other
servers.  A quick test shows wsgiref does not clean paths.  So apps shouldn't
rely on a clean path.

--
Ian Bicking  |  http://blog.ianbicking.org
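
To make the encoded "." point concrete, here is a minimal sketch of how an application could check a raw path itself instead of trusting the server. It assumes the raw path arrives as a str, as the proposed (not currently standard) 'wsgi.raw_path_info' key would provide. The names split_raw_path and has_dot_segments are made up for illustration and are not part of any server or library.

```python
from urllib.parse import unquote_to_bytes


def split_raw_path(raw_path):
    # Split on literal '/' first, then decode each segment, so an
    # encoded slash (%2F) stays inside its own segment.
    return [unquote_to_bytes(p).decode('UTF-8') for p in raw_path.split('/')]


def has_dot_segments(raw_path):
    # Check *after* decoding, so '%2e%2e' is caught just like '..'.
    return any(seg in ('.', '..') for seg in split_raw_path(raw_path))
```

An app that maps URLs to files might refuse any request where has_dot_segments() is true. That is only one possible policy, and it does not cover other filesystem concerns such as decoded slashes or backslashes inside a segment.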
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com