Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

Robert Brewer Mon, 21 Sep 2009 13:15:22 -0700

P.J. Eby wrote:
> At 11:23 AM 9/21/2009 -0700, Robert Brewer wrote:
> >I still don't see why the environ should have multiple versions of
> >anything. It's not as if the HTTP request gives us multiple
> >Request-URI's. There's a single processing step that has to happen
> >somewhere: decoding the bytes of the Request-URI to unicode. For the
> >vast majority of apps, it should only happen once. Twice is
> >acceptable to me for some apps. As I pointed out in the linked
> >email, doing that as soon as possible (i.e. in the WSGI origin
> >server) allows URI's to be compared as character strings more
> >easily. If you deploy a piece of middleware that transcodes (based
> >on more information than servers want to deal with), it had better
> >be nearly first in the stack so routing works reliably.
> 
> The problem with this whole approach is that it's not
> composable.  You can't stick in an application under a router that
> uses a different method for grokking its subtree of the URI space,
> unless it knows what's been done to the URI and can un-do it.


I don't understand. If SCRIPT_NAME/PATH_INFO/QUERY_STRING are unicode, the only 
answer to "what's been done to the URI?" can be "wsgi.uri_encoding", which 
allows someone to un-do it. What more do you want?

1. bytes arrive. server decodes with utf8, sets 'wsgi.uri_encoding' to 'utf-8'.
2. middleware says "oops, that's wrong". encodes back to bytes using 'utf-8', 
and re-decodes with koi-8, changing wsgi.uri_encoding to 'koi-8'
3. further middlewares and app use the unicode value, and don't really care 
what encoding was used.
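
A minimal sketch of those three steps in Python 3, for illustration only. It assumes the 'wsgi.uri_encoding' environ key proposed in this thread (not part of any released WSGI spec). The helper name transcode_path is hypothetical. It also assumes the server decodes with the 'surrogateescape' error handler so that bytes which are not valid UTF-8 survive the round trip, and it uses 'koi8-r', the Python codec name, for the KOI8 encoding.

```python
def transcode_path(environ, new_encoding):
    """Step 2: undo the server's decoding and re-decode PATH_INFO.

    Hypothetical middleware helper. 'surrogateescape' is an assumption
    made here so the original bytes can always be recovered.
    """
    old_encoding = environ['wsgi.uri_encoding']
    raw = environ['PATH_INFO'].encode(old_encoding, 'surrogateescape')
    environ['PATH_INFO'] = raw.decode(new_encoding, 'surrogateescape')
    environ['wsgi.uri_encoding'] = new_encoding
    return environ


# Step 1: bytes arrive; the server guesses utf-8 and records its guess.
wire_bytes = '/привет'.encode('koi8-r')
environ = {
    'PATH_INFO': wire_bytes.decode('utf-8', 'surrogateescape'),
    'wsgi.uri_encoding': 'utf-8',
}

# Step 2: middleware knows better and transcodes.
transcode_path(environ, 'koi8-r')

# Step 3: later middleware and the app just use the unicode value.
path = environ['PATH_INFO']
```

Without a lossless error handler (or a lossless first guess such as latin-1), step 1 would raise on input like this and there would be nothing to un-do.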

> Maybe I'm missing something here, but the only way I see to preserve
> composability here is to use latin-1 or bytes.
> 
> The fundamental problem is that, like it or not, HTTP headers are
> actually byte strings.  The *only* reason we ever supported unicode
> in WSGI was to handle platforms where there's no such thing as a
> non-unicode string, and there we made it explicit that it's just a
> way of manipulating *bytes*, not unicode.
> 
> ISTM that very few (if any) of the proposals floating around for
> modifying WSGI are taking this concept into account.  Most of them
> sound to me like people saying, "yeah, but this particular hack will
> work for *my* apps...  so everybody else must be doing something
> stupid."
> 
> But WSGI was built on the principle of *equally inconveniencing
> everyone*, specifically to avoid an impossible attempt at consensus
> between incompatible ways of doing things.  (E.g., nine million
> request/response APIs.)
> 
> So, if the only problem we're going to cause by using bytes
> everywhere is to make everyone need to change their routing code on
> Python 3, I vote +1000.  ;-)

That's not the only problem. Using native strings wherever possible makes web 
programming in Python easier, regardless of version. In Python 3, that happens 
to be unicode, for good reasons.

For HTTP, there's a more specific reason: URIs should be compared for 
equivalence character by character, not byte by byte. See 
http://tools.ietf.org/html/rfc3986#section-6.2.1. That includes routing 
middleware.
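
A small illustration of the difference, as a sketch rather than the RFC's normative procedure. The path '/café' and the two encodings are made-up sample data: the same characters sent by two clients using different encodings differ as bytes but compare equal once each is decoded with its own encoding.

```python
# The same characters, encoded two different ways on the wire.
utf8_bytes = '/caf\u00e9'.encode('utf-8')      # b'/caf\xc3\xa9'
latin1_bytes = '/caf\u00e9'.encode('latin-1')  # b'/caf\xe9'

# Byte-by-byte comparison treats them as different paths.
bytes_equal = utf8_bytes == latin1_bytes

# Character-by-character comparison, after decoding each with the
# encoding it was actually sent in, treats them as the same path.
chars_equal = utf8_bytes.decode('utf-8') == latin1_bytes.decode('latin-1')
```

A router that matches on the decoded strings sees one path here; a router that matches on raw bytes sees two.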


Robert Brewer
