Re: [Web-SIG] URL quoting in WSGI (or the lack therof)
Luis Bruno wrote:
> Robert Brewer wrote:
> > > IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> > Why?
> >
> Because then I'm stuck monkey patching every WSGI server (and/or stuck
> using my own URL dispatcher) so that I don't lose the information that
> one of the forward slashes is NOT a path delimiter. You said that
> %-encoding is meant for slashes not participating in hierarchy
> semantics, if I read you correctly; so I think you'll agree with me on
> this.

Ah. Now I see. We've had a test case for this since Nov 2005 [1].
FWIW, CherryPy took the option of special-casing forward slashes;
those are the only characters which are *not* decoded--they are left
as %2F characters in SCRIPT_NAME and PATH_INFO [2]:

    # Unquote the path+params (e.g. "/this%20path" -> "this path").
    # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
    #
    # But note that "...a URI must be separated into its components
    # before the escaped characters within those components can be
    # safely decoded." http://www.ietf.org/rfc/rfc2396.txt, sec 2.4.2
    atoms = [unquote(x) for x in quoted_slash.split(path)]
    path = "%2F".join(atoms)
    environ["PATH_INFO"] = path

...and CherryPy then decodes these on the WSGI-app-side, only after the
dispatching step (to produce "virtual path" atoms) [3]:

    if func:
        # Decode any leftover %2F in the virtual_path atoms.
        vpath = [x.replace("%2F", "/") for x in vpath]
        request.handler = LateParamPageHandler(func, *vpath)
    else:
        request.handler = cherrypy.NotFound()

You're absolutely right; it would be nice to standardize a solution to
this. Of course, I'm going to propose we standardize *our* solution. ;)

> I'll see your CGI draft and raise you the URI spec.

Heh. Quoted in the code comments above.

Robert Brewer
[EMAIL PROTECTED]

[1] cf http://www.cherrypy.org/ticket/393
[2] http://www.cherrypy.org/browser/trunk/cherrypy/wsgiserver/__init__.py#L314
[3] http://www.cherrypy.org/browser/trunk/cherrypy/_cpdispatch.py#L71

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
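
For readers who want to try the slash special-casing described in this
message outside CherryPy, here is a minimal standalone sketch in
Python 3. The function names and the QUOTED_SLASH pattern are
illustrative assumptions, not CherryPy's actual code; only the
split-on-%2F, unquote, rejoin idea comes from the excerpts quoted in
the message.

```python
import re
from urllib.parse import unquote

# Hypothetical stand-in for the pattern the excerpt calls quoted_slash:
# it matches an encoded slash in either case ("%2F" or "%2f").
QUOTED_SLASH = re.compile("%2F", re.IGNORECASE)


def decode_path_keep_slashes(raw_path):
    """Decode every escape in raw_path except encoded slashes."""
    atoms = [unquote(atom) for atom in QUOTED_SLASH.split(raw_path)]
    return "%2F".join(atoms)


def dispatch_atoms(path_info):
    """Split on the real delimiters, then decode leftover %2F per atom."""
    atoms = [atom for atom in path_info.split("/") if atom]
    return [atom.replace("%2F", "/") for atom in atoms]


path_info = decode_path_keep_slashes("/catalog/NEC%20Corp/LN500%2F9DW/")
print(path_info)                  # /catalog/NEC Corp/LN500%2F9DW/
print(dispatch_atoms(path_info))  # ['catalog', 'NEC Corp', 'LN500/9DW']
```

One caveat of this sketch: a segment whose text is literally "%2F"
(sent as %252F) decodes to "%2F" in the first step and is then turned
into "/" in the second, so the two cases cannot be told apart.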
Re: [Web-SIG] URL quoting in WSGI (or the lack therof)
Luis Bruno wrote:
> Hello y'all, delurking,
>
> I'm using a /-delimited path, %-encoding each literal '/' appearing in
> the path segments. I was not amused to see egg:Paste#http urldecoding
> the whole PATH_INFO.

Unfortunately this is in the WSGI spec, so it's not Paste#http so much
as WSGI that demands this. I think in the CGI implementations this is
kind of handled by REQUEST_URI containing the quoted value. But
relating REQUEST_URI with SCRIPT_NAME/PATH_INFO is awkward, and having
the information in duplicate places can lead to errors and unclear
situations if they don't match up properly.

> Ben Bangert wrote:
>> This recently became an issue, when a user noticed that the %2B URL
>> encoding for a + sign, had turned into a space when it hit their app.
> A swift monkey-patch to
> paste.httpserver.py:WSGIHandlerMixin.wsgi_setup() later, and
> ORIGINAL_PATH_INFO is part of the WSGI spec in my world. The following
> URL now Does The Right Thing:
>
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/

It would be the Right Thing, except for not being WSGI. I made note of
this issue on the WSGI 2.0 ideas page, but I don't think anyone
(including myself) has proposed any good resolution.

Diverging from CGI and leaving PATH_INFO/SCRIPT_NAME quoted would work.
But it's liable to lead to bugs, as it's a fairly subtle thing: for
most applications the semantics won't change, and people won't realize
their code is broken for some corner case. I suppose we could remove
SCRIPT_NAME and PATH_INFO entirely and replace them with new keys.

Ian
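
To make the REQUEST_URI point above concrete, here is a rough Python 3
sketch of how an application might recover the still-quoted path. It
assumes the server happens to put a REQUEST_URI key in the environ in
origin form (path plus optional query string); that key is not part of
the WSGI spec, and quoted_path is a hypothetical helper name.

```python
from urllib.parse import quote


def quoted_path(environ):
    """Return the request path in %-encoded form, without the query string."""
    # Assumption: REQUEST_URI is a non-standard key some servers provide.
    request_uri = environ.get("REQUEST_URI")
    if request_uri:
        return request_uri.split("?", 1)[0]
    # Fallback: re-quote the decoded values. Any %2F in the original URL
    # has already become a plain "/" here and cannot be recovered.
    decoded = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    return quote(decoded)


environ = {
    "REQUEST_URI": "/catalog/LN500%2F9DW/?page=2",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/catalog/LN500/9DW/",
}
print(quoted_path(environ))  # /catalog/LN500%2F9DW/
```

The fallback branch shows the duplication problem mentioned above: the
two sources can disagree, and only one of them still knows which
slashes were delimiters.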
Re: [Web-SIG] URL quoting in WSGI (or the lack therof)
I'll top post my "solution"; scare quoted because I'm still not sure
this is the smartest idea:

    environ['wsgiorg.path-segments'] = ['catalog', 'NEC', 'Computers', 'Laptop', 'LN500/9DW']

Robert Brewer wrote:
> All HTTP URI are /-delimited, and any '/' appearing in a single segment
> that is not intended to participate in the hierarchy semantics must be
> %-encoded before transmitting it over HTTP.

I wholeheartedly agree. And your explanation is clearer than mine.

>> IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> Why?
>
Because then I'm stuck monkey patching every WSGI server (and/or stuck
using my own URL dispatcher) so that I don't lose the information that
one of the forward slashes is NOT a path delimiter. You said that
%-encoding is meant for slashes not participating in hierarchy
semantics, if I read you correctly; so I think you'll agree with me on
this.

> You have to explain why you think the application should receive %XX encoded
> URI's instead of decoded ones. What's the benefit? I only see a con:
> every piece of middleware that cares has to repeat the decoding of
> PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
>
I was aware of this trade-off, which is why I'm still not sure the
application should receive the %-encoded URIs. My app was forced to
split the URL on the '/' delimiters. If I can get the framework to do
that job while dispatching, so much the better. Hence the solution I
top posted.

My problem arises when I output a link created from suitably
%-encoding these path segments:

    '/'.join(['NEC', 'Computers', 'Laptop', 'LN500/9DW'])

And after the user clicks that link, the framework gives me (and Routes
has no way to avoid this when Paste is the one who's doing the whole
path decoding):

    ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

Think dispatching to a ``callable(*segments, **urlvariables)``. I think
we'll agree this is not what the app writer intended.
And I'm out of luck if the WSGI server/dispatcher is the one doing the
urldecoding.

> According to [1], the right answer is "yes":

I'll see your CGI draft and raise you the URI spec[2]. When you've read
the last sentence, you'll see how unoriginal the top posted solution
was:

> 2.4.2. When to Escape and Unescape
>
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics. Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.

[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6
[2] http://www.ietf.org/rfc/rfc2396.txt

There is a CGI Informational RFC somewhere, which I've read diagonally
coming here to grumble.

-- 
Luís Bruno
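
The difference between the two decoding orders discussed in this
message can be reproduced with a few lines of Python 3. This is only a
sketch: build_path, split_then_decode and decode_then_split are
hypothetical helper names, and the 'wsgiorg.path-segments' key is the
proposal from the top of the message, not an existing part of WSGI.

```python
from urllib.parse import quote, unquote


def build_path(segments):
    """%-encode each segment (including any '/') and join on '/'."""
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


def split_then_decode(raw_path):
    """Order recommended by RFC 2396, sec 2.4.2: split first, decode after."""
    return [unquote(segment) for segment in raw_path.strip("/").split("/")]


def decode_then_split(raw_path):
    """Whole-path decoding: the encoded slash turns into a delimiter."""
    return unquote(raw_path).strip("/").split("/")


segments = ["NEC", "Computers", "Laptop", "LN500/9DW"]
link = build_path(segments)
print(link)                     # /NEC/Computers/Laptop/LN500%2F9DW
print(split_then_decode(link))  # ['NEC', 'Computers', 'Laptop', 'LN500/9DW']
print(decode_then_split(link))  # ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

# What a server might do under the proposed (non-standard) key:
environ = {"wsgiorg.path-segments": split_then_decode(link)}
```

Only the split-then-decode order round-trips the original four
segments; decoding the whole path first produces five.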