Re: [Web-SIG] URL quoting in WSGI (or the lack thereof)

2008-01-21 Thread Robert Brewer
Luis Bruno wrote:
> Robert Brewer wrote:
> > > IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> > Why?
> >
> Because then I'm stuck monkey patching every WSGI server (and/or stuck
> using my own URL dispatcher) so that I don't lose the information that
> one of the forward slashes is NOT a path delimiter. You said that
> %-encoding is meant for slashes not participating in hierarchy
> semantics, if I read you correctly; so I think you'll agree with me on
> this.

Ah. Now I see. We've had a test case for this since Nov 2005 [1]. FWIW,
CherryPy took the option of special-casing forward slashes; those are
the only characters which are *not* decoded--they are left as %2F
characters in SCRIPT_NAME and PATH_INFO [2]:

# Unquote the path+params (e.g. "/this%20path" -> "this path").
# http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
#
# But note that "...a URI must be separated into its components
# before the escaped characters within those components can be
# safely decoded." http://www.ietf.org/rfc/rfc2396.txt, sec 2.4.2
atoms = [unquote(x) for x in quoted_slash.split(path)]
path = "%2F".join(atoms)
environ["PATH_INFO"] = path

...and CherryPy then decodes these on the WSGI-app-side, only after the
dispatching step (to produce "virtual path" atoms) [3]:

if func:
    # Decode any leftover %2F in the virtual_path atoms.
    vpath = [x.replace("%2F", "/") for x in vpath]
    request.handler = LateParamPageHandler(func, *vpath)
else:
    request.handler = cherrypy.NotFound()
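A minimal, self-contained Python 3 sketch of the two-step scheme described above. The server decodes every escape except %2F, and the application turns %2F back into '/' only after splitting the path into virtual-path atoms. The function names server_decode_path and app_virtual_path are hypothetical, and the regex is an assumed stand-in for quoted_slash. This is an illustration of the idea, not CherryPy's actual code.

```python
import re
from urllib.parse import unquote

# Assumption: a case-insensitive pattern standing in for the quoted_slash
# regex referenced in the excerpt above.
QUOTED_SLASH = re.compile(r"(?i)%2F")


def server_decode_path(raw_path):
    """Server side: decode every %XX escape except %2F (kept, upper-cased)."""
    atoms = [unquote(atom) for atom in QUOTED_SLASH.split(raw_path)]
    return "%2F".join(atoms)


def app_virtual_path(path_info):
    """App side, after dispatch: split on real slashes, then restore '/'."""
    vpath = [atom for atom in path_info.split("/") if atom]
    return [atom.replace("%2F", "/") for atom in vpath]


path_info = server_decode_path("/catalog/NEC/Laptops%20%26%20Tablets/LN500%2F9DW/")
print(path_info)
# /catalog/NEC/Laptops & Tablets/LN500%2F9DW/
print(app_virtual_path(path_info))
# ['catalog', 'NEC', 'Laptops & Tablets', 'LN500/9DW']
```

One property of this sketch is worth knowing. A client that sends %252F (an escaped literal "%2F") also comes out as '/' after both steps, so it cannot be told apart from a real %2F.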

You're absolutely right; it would be nice to standardize a solution to
this. Of course, I'm going to propose we standardize *our* solution. ;)

> I'll see your CGI draft and raise you the URI spec.

Heh. Quoted in the code comments above.


Robert Brewer
[EMAIL PROTECTED]

[1] cf http://www.cherrypy.org/ticket/393
[2] http://www.cherrypy.org/browser/trunk/cherrypy/wsgiserver/__init__.py#L314
[3] http://www.cherrypy.org/browser/trunk/cherrypy/_cpdispatch.py#L71

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com



2008-01-21 Thread Ian Bicking
Luis Bruno wrote:
> Hello y'all, delurking,
> 
> I'm using a /-delimited path, %-encoding each literal '/' appearing in 
> the path segments. I was not amused to see egg:Paste#http urldecoding 
> the whole PATH_INFO.

Unfortunately this is in the WSGI spec, so it's not Paste#http so much 
as WSGI that demands this.

I think in the CGI implementations this is kind of handled by 
REQUEST_URI containing the quoted value.  But relating REQUEST_URI to 
SCRIPT_NAME/PATH_INFO is awkward, and having the same information in two 
places can lead to errors and unclear situations if they don't match up 
properly.
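The following hedged Python 3 sketch makes the REQUEST_URI point concrete by showing what recovering a still-quoted PATH_INFO involves. It assumes the server supplies a REQUEST_URI key, which neither CGI nor WSGI requires, holding the raw origin-form path plus an optional query string. raw_path_info is a hypothetical helper name.

```python
from urllib.parse import unquote


def raw_path_info(environ):
    """Return PATH_INFO still %-encoded, rebuilt from REQUEST_URI, or None.

    Assumes a non-standard REQUEST_URI key holding the raw origin-form
    request target (path plus optional "?query").
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return None
    raw_path = request_uri.split("?", 1)[0]
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")

    # The duplicated information must agree before it can be trusted.
    if unquote(raw_path) != script_name + path_info:
        return None

    # SCRIPT_NAME ends where PATH_INFO begins: at a "/" or at the very end.
    # Cutting there never splits a %XX escape, since "/" is not a hex digit.
    for i in range(len(raw_path) + 1):
        at_boundary = i == len(raw_path) or raw_path[i] == "/"
        if at_boundary and unquote(raw_path[:i]) == script_name:
            return raw_path[i:]
    # The SCRIPT_NAME boundary falls inside an encoded slash: no clean answer.
    return None
```

For example, REQUEST_URI "/shop/catalog/LN500%2F9DW?x=1" with SCRIPT_NAME "/shop" yields "/catalog/LN500%2F9DW". The awkward case is a mount point that lands inside an encoded slash. With REQUEST_URI "/a%2Fb/x" and SCRIPT_NAME "/a", no prefix of the raw path decodes to SCRIPT_NAME, and the helper can only give up.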

> Ben Bangert wrote:
>> This recently became an issue when a user noticed that the %2B URL 
>> encoding for a + sign had turned into a space when it hit their app.
> A swift monkey-patch to 
> paste.httpserver.py:WSGIHandlerMixin.wsgi_setup() later, and 
> ORIGINAL_PATH_INFO is part of the WSGI spec in my world. The following 
> URL now Does The Right Thing:
> 
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/

It would be the Right Thing, except for not being WSGI.  I made note of 
this issue on the WSGI 2.0 ideas page, but I don't think anyone 
(including myself) has proposed any good resolution.  Diverging from CGI 
and leaving PATH_INFO/SCRIPT_NAME quoted would work.  But it's liable to 
lead to bugs: it's a fairly subtle change, for most applications the 
semantics won't change, and people won't realize their code is broken 
for some corner case.  I suppose we could remove SCRIPT_NAME and PATH_INFO
entirely and replace them with new keys.
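A small Python 3 illustration of the corner-case risk uses a hypothetical route table keyed on decoded paths. The common case behaves identically whether PATH_INFO arrives decoded or quoted. A path containing an escape, however, silently falls through to the 404 branch.

```python
from urllib.parse import unquote

# Hypothetical route table keyed on decoded paths, as most apps are written today.
ROUTES = {"/about": "about page", "/hello world": "greeting"}


def naive_app(path_info):
    """Look up PATH_INFO verbatim; written assuming it is already decoded."""
    return ROUTES.get(path_info, "404")


for raw_path in ["/about", "/hello%20world"]:
    decoded = unquote(raw_path)  # CGI-style PATH_INFO, as WSGI requires today
    # Same app, two conventions: decoded PATH_INFO vs. left-quoted PATH_INFO.
    print(raw_path, "|", naive_app(decoded), "|", naive_app(raw_path))
# /about | about page | about page
# /hello%20world | greeting | 404
```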

   Ian





2008-01-21 Thread Luis Bruno
I'll top-post my "solution" (scare-quoted because I'm still not sure 
this is the smartest idea):
environ['wsgiorg.path-segments'] = ['catalog', 'NEC', 'Computers', 
'Laptop', 'LN500/9DW']
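The following sketch shows how a server might populate the proposed key, assuming it still has the undecoded request path: split on '/' first, then decode each segment. 'wsgiorg.path-segments' is the key proposed in this message, not part of any published WSGI spec, and both function names are hypothetical.

```python
from urllib.parse import unquote


def path_segments(raw_path):
    """Split the still-encoded path on '/' first, then decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/") if segment]


def add_path_segments(environ, raw_path):
    """Store the segments under the key proposed above (not a WSGI standard).

    raw_path must be the request path as received, before any %-decoding.
    """
    environ["wsgiorg.path-segments"] = path_segments(raw_path)
    return environ


environ = add_path_segments({}, "/catalog/NEC/Computers/Laptop/LN500%2F9DW/")
print(environ["wsgiorg.path-segments"])
# ['catalog', 'NEC', 'Computers', 'Laptop', 'LN500/9DW']
```

This sketch drops empty segments (from a trailing or doubled slash) by design. An application that distinguishes /foo from /foo/ would want to keep them.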

Robert Brewer wrote:
> All HTTP URIs are /-delimited, and any '/' appearing in a single segment
> that is not intended to participate in the hierarchy semantics must be
> %-encoded before transmitting it over HTTP.
I wholeheartedly agree. And your explanation is clearer than mine.
>> IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> Why?
>   
Because then I'm stuck monkey patching every WSGI server (and/or stuck 
using my own URL dispatcher) so that I don't lose the information that 
one of the forward slashes is NOT a path delimiter. You said that 
%-encoding is meant for slashes not participating in hierarchy 
semantics, if I read you correctly; so I think you'll agree with me on this.
> You have to explain why you think the application should receive %XX-encoded
> URIs instead of decoded ones. What's the benefit? I only see a con:
> every piece of middleware that cares has to repeat the decoding of
> PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
>   
I was aware of this trade-off, which is why I'm still not sure the 
application should receive the %-encoded URIs. My app was forced to 
split the URL on the '/' delimiters. If I can get the framework to do 
that job while dispatching, so much the better. Hence the solution I 
top-posted. My problem arises when I output a link created by suitably 
%-encoding these path segments:

'/'.join(quote(s, safe='') for s in ['NEC', 'Computers', 'Laptop', 'LN500/9DW'])
# -> 'NEC/Computers/Laptop/LN500%2F9DW'

And after the user clicks that link, the framework gives me (and Routes 
has no way to avoid this when Paste is the one who's doing the whole 
path decoding):

['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

Think dispatching to a ``callable(*segments, **urlvariables)``. I think 
we'll agree this is not what the app writer intended. And I'm out of 
luck if the WSGI server/dispatcher is the one doing the urldecoding.
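The loss described above can be reproduced in a few lines of Python 3 with urllib.parse. Escaping each segment separately produces the link, and decoding the whole path before splitting loses the segment boundary. Splitting first, as RFC 2396 section 2.4.2 requires, recovers the original segments.

```python
from urllib.parse import quote, unquote

segments = ["NEC", "Computers", "Laptop", "LN500/9DW"]

# Escape each segment on its own; safe="" makes quote() escape "/" as well.
link = "/".join(quote(segment, safe="") for segment in segments)
print(link)  # NEC/Computers/Laptop/LN500%2F9DW

# Decoding the whole path before splitting (what a decoding server forces):
decode_then_split = unquote(link).split("/")
print(decode_then_split)  # ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

# Splitting on the delimiters first, then decoding each component:
split_then_decode = [unquote(segment) for segment in link.split("/")]
print(split_then_decode)  # ['NEC', 'Computers', 'Laptop', 'LN500/9DW']
```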
> According to [1], the right answer is "yes":
>   
I'll see your CGI draft and raise you the URI spec[2]. When you've read 
the last sentence, you'll see how unoriginal the top posted solution was:
> 2.4.2. When to Escape and Unescape
>
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics.  Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.
[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6
[2] http://www.ietf.org/rfc/rfc2396.txt. There is also a CGI 
Informational RFC somewhere, which I skimmed on my way here to 
grumble.

-- 
Luís Bruno