At 01:15 PM 9/21/2009 -0700, Robert Brewer wrote:
> I don't understand. If SCRIPT_NAME/PATH_INFO/QUERY_STRING are unicode, the only answer to "what's been done to the URI?" can be "wsgi.uri_encoding", which allows someone to un-do it. What more do you want?

To be sure that there's no possible way for all the broken middleware out there to mess this up.

Let me put it this way: out of all the times I've seen people post example WSGI 1 middleware code, I don't remember *any* where the middleware was actually complying with the spec correctly... and that includes examples I wrote myself. So I'm not really impressed with any solution that requires middleware to get it right.

That having been said, I'm beginning to think that PEP 383 (surrogateescape) is actually the way to go, now that I've looked over the PEP, docs, and Ian's posts here about it.

First, it's compatible with CGI (os.environ) right off the bat, as well as being the standard way to handle this sort of issue in Python 3.

Second, it's redundancy-free: you don't need a separate environ key to know what's going on.

Third, it's unconditional: if you want bytes or a non-UTF-8 encoding you perform the same steps every time.
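A minimal sketch of that "same steps every time" point, assuming Python 3 and environ values that were decoded as UTF-8 with surrogateescape. The helper names `to_bytes` and `redecode` and the sample path are hypothetical, chosen only for illustration:

```python
def to_bytes(value):
    """Recover the original bytes from a surrogate-escaped native string."""
    return value.encode("utf-8", "surrogateescape")


def redecode(value, encoding):
    """Reinterpret the original bytes in another encoding (hypothetical helper)."""
    return to_bytes(value).decode(encoding)


# A path that arrived as Latin-1 bytes, which are not valid UTF-8.
raw = b"/caf\xe9"
native = raw.decode("utf-8", "surrogateescape")

original = to_bytes(native)            # the same bytes that came in
latin1 = redecode(native, "latin-1")   # the text the client meant
```

The steps do not depend on whether the value happened to be valid UTF-8: a value that decoded cleanly round-trips through the same two calls.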

Up until now, I've not paid much attention because so many people kept saying you can't get surrogateescape on Python 2. However, that's only an issue for code that *needs the original byte string*, as the old codec error handler API is sufficient for doing decoding. (Meaning you could register a handler for it on older Pythons.)
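As a rough sketch of what registering such a handler might look like: the code below is written for Python 3 so it can be compared against the built-in handler, and a backport to older Pythons would need `ord()`/`unichr()` instead. The handler name `sketch_surrogateescape` and the function name are made up for this example, and it only covers the decoding direction:

```python
import codecs


def escape_undecodable(exc):
    """Decoding-only error handler in the style of PEP 383 (illustrative)."""
    if not isinstance(exc, UnicodeDecodeError):
        # Encoding back to bytes is the part that is not handled here.
        raise exc
    chars = []
    for byte in exc.object[exc.start:exc.end]:
        if byte < 128:
            # PEP 383 only escapes bytes in the range 0x80-0xFF.
            raise exc
        # Map each undecodable byte to a lone surrogate U+DC80..U+DCFF.
        chars.append(chr(0xDC00 + byte))
    # Return the replacement text and the position at which to resume.
    return "".join(chars), exc.end


# "sketch_surrogateescape" is a hypothetical name, not a standard handler.
codecs.register_error("sketch_surrogateescape", escape_undecodable)

decoded = b"/caf\xe9/x".decode("utf-8", "sketch_surrogateescape")
```

On Python 3, the result for this input matches what the built-in `surrogateescape` handler produces.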

I think this approach would let us have our cake and eat it too, for the most part. WSGI on Python 2.x uses byte strings for these, and then 3.x works transparently. It's a bit of a stretch to call it a "clarification" of WSGI 1.0, but since for all intents and purposes WSGI doesn't really *run* on Python 3, it might be the way to go.

To be clear, I'm talking about simply allowing (on Python 3 and in WSGI versions > 1.0) for all environ values to be utf-8-decoded, surrogate-escaped unicode values, in the "native string" case. (This would further imply that a CGI gateway would have to check whether the system encoding is UTF-8, and if not, transcode accordingly.)
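The transcoding step for a CGI gateway might look something like the sketch below. It assumes Python 3 and that the incoming value was decoded using the system encoding with surrogateescape; the function name `transcode_environ_value` is hypothetical, and the system encoding is passed in explicitly rather than looked up:

```python
import codecs


def transcode_environ_value(value, system_encoding):
    """Re-decode a value as UTF-8 + surrogateescape (illustrative only)."""
    # Normalize spellings such as "UTF8" or "utf_8" before comparing.
    if codecs.lookup(system_encoding).name == "utf-8":
        return value  # already in the proposed form
    # Undo the system decoding to get the original bytes back...
    raw = value.encode(system_encoding, "surrogateescape")
    # ...then decode them the way the proposal describes.
    return raw.decode("utf-8", "surrogateescape")


# Bytes for a UTF-8 path, as seen through a Latin-1 system encoding.
seen = b"/caf\xc3\xa9".decode("latin-1")
fixed = transcode_environ_value(seen, "latin-1")
```

A real gateway would presumably pass something like `sys.getfilesystemencoding()` as the second argument, but that choice is an assumption here rather than anything the proposal pins down.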
