OK, after some consideration, I think I'm sold.

To answer my own original question about why Unicode values seem to make sense in the WSGI environment even without considering Python 3 compatibility: *something* needs to do this translation. Currently I rely on WebOb to do a lot of it. I can't think of a good reason why each implementation at the level of WebOb should need to do this translation work itself; pushing the job into WSGI itself seems to make sense here. This is particularly true for PATH_INFO and QUERY_STRING: these days it's foolish to assume these values will be composed entirely of "low order" characters, so being able to access them natively as bytes isn't very useful.
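A minimal sketch of the kind of translation I mean, assuming Python 3. The helper name `decode_environ_value`, the sample values, and the "UTF-8 first, Latin-1 as fallback" policy are all made up for illustration; none of this is something WSGI or WebOb specifies.

```python
def decode_environ_value(raw, encoding="utf-8", fallback="latin-1"):
    """Hypothetical helper: turn a raw environ value into text."""
    if isinstance(raw, str):
        # Already text; nothing to translate.
        return raw
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode(fallback)


# Made-up environ with bytes values outside the ASCII range.
environ = {
    "PATH_INFO": b"/caf\xc3\xa9",     # valid UTF-8
    "QUERY_STRING": b"q=na\xefve",    # not valid UTF-8, Latin-1 bytes
}

path = decode_environ_value(environ["PATH_INFO"])
query = decode_environ_value(environ["QUERY_STRING"])
```

The fallback is only one possible policy. The point is that some layer has to pick one, and today each library at WebOb's level ends up picking its own.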

OTOH, I suspect the Python 3 stdlib is still broken if it requires native strings in various places (and prohibits the use of bytes).

James Bennett wrote:
On Sun, Sep 20, 2009 at 11:25 PM, Chris McDonough <chr...@plope.com> wrote:
WSGI is a fairly low-level protocol aimed at folks who need to interface a
server to the outside world.  The outside world (by its nature) talks bytes.
I fear that any implied conversion of environment values and iterable
return values to Unicode will actually eventually make things harder than
they are now.  I realize that it would make middleware implementors' lives
harder to need to deal in bytes.  However, at this point, I also believe
that middleware kinda should be hard.  We have way too much middleware that
shouldn't be middleware these days (some written by myself).

Well, ordinarily I'd be inclined to agree: HTTP deals in bytes, so an
interface to HTTP should deal in bytes as well.

The problem, really, is that despite being a very low-level interface,
WSGI has a tendency to leak up into much higher-level code, and (IMO)
authors of that high-level code really shouldn't have to waste their
time dealing with details of the underlying low-level gateway.

You've said you don't want to hear "Python 3" as the reason, but it
provides some useful examples: in high-level code you'll commonly want
to be doing things like, say, comparing parts of the requested URL
path to known strings or patterns. And that high-level code will
almost certainly use strings, while WSGI, in theory, will be using
bytes. That's just a recipe for disaster; if WSGI mandates bytes, then
bytes will have to start "infecting" much higher-level code (since
Python 3 -- rightly -- doesn't let you be nearly as promiscuous about
mixing bytes and strings).
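
A minimal sketch of that mismatch, assuming Python 3 and a made-up path value (the names `path_bytes` and `path_text` are illustrative only):

```python
# What a bytes-only gateway would hand to application code.
path_bytes = b"/blog/2009/"

# Comparing bytes with a str literal does not raise; it is simply False.
mixed_equal = (path_bytes == "/blog/2009/")

# Mixing the two types in a method call fails outright.
try:
    path_bytes.startswith("/blog/")
    raised = False
except TypeError:
    raised = True

# So the high-level code either switches to bytes literals...
bytes_match = path_bytes.startswith(b"/blog/")

# ...or decodes once, up front, and works with text from then on.
path_text = path_bytes.decode("ascii")
text_match = path_text.startswith("/blog/")
```

The silent `False` in the equality case is arguably the worse of the two failures, since nothing points at the line that mixed the types.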

Once I'm at a point where I can use Python 3, I know I'll personally
be looking for some library which will normalize everything for me
before I interact with it, precisely to avoid this sort of leakage; if
WSGI itself would at least *allow* that normalization to happen at the
low level (mandating it is another discussion entirely) I'd feel much
happier about it going forward.



_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig