At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby <p...@telecommunity.com> wrote:
At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values.


OTOH, it has the tremendous advantage of pushing the encoding question onto the app (or framework) developer... who's really the only one who can make the right decision for their particular application. And personally, I'd rather have clear boundaries between text and bytes, such that porting (even if tedious or awkward) is *consistent*, and clear as to when you're finished, not, "oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO... not just in my app code, but in all the library code I call *from* my app?"

IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on "Python 3000" to say we go with bytes on Python 3+ for everything that's a str in today's WSGI.


This was my first intuition too, until I started thinking in more detail about the particular values involved. Some obviously are textish, like environ['SERVER_NAME']. Not a very useful value, but definitely text.

Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)

What I'm getting at, though, is it's precisely this sort of "hm, which ones are bytes again?" stuff that makes you have to stop and *think*, i.e., it doesn't Fit My Brain<tm> any more. ;-)

There should be one, and preferably *only* one, obvious way to do it.

And given that HTTP is inherently a bunch of bytes, bytes is the one obvious way.

I previously was under the impression that bytes wouldn't interoperate with strings in 3.x, but they *do*, in much the same way as they did in 2.x. That means you'll be (mostly) bug-compatible in 3.x, only you'll likely encounter encoding issues *sooner*, rather than later (i.e., the minute you combine non-ASCII inputs with your regular string constants).
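
How far that interoperation goes is easy to check at the prompt. The following is only a sketch, assuming a current CPython 3 interpreter; the path value is made up for illustration. On such an interpreter, bytes keep most of the familiar string methods, while combining bytes with a text constant fails immediately rather than only on non-ASCII input:

```python
# Hypothetical raw PATH_INFO value, UTF-8 encoded (illustrative only).
path_info = b"/caf\xc3\xa9/menu"

# bytes offer the usual string-style methods, as long as the arguments are bytes too.
segments = path_info.strip(b"/").split(b"/")
starts_ok = path_info.startswith(b"/")

# Combining bytes with a text constant raises TypeError straight away.
try:
    path_info + "/index.html"
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

# The way out is explicit: decode for text processing, or stay in bytes for I/O.
as_text = path_info.decode("utf-8") + "/index.html"
as_bytes = path_info + b"/index.html"
```

Either way, the encoding decision surfaces at the first point where the two types meet.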

Yes, you will also be forced to convert your return values to bytes, but if you've used string constants *anywhere*, then you know you'll be outputting text, which you should already have been encoding for output. (So you'll just be forced to deal with errors on that side sooner as well.)

All in all, I'd say this also fits with what people on Python-Dev keep hammering on as the One Obvious Way to deal with bytes and strings in a program: i.e., bytes for I/O, text for text processing.
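
To make "bytes for I/O, text for text processing" concrete, here is a rough sketch of what an application might look like under the all-bytes idea. It is not taken from any spec: the environ layout (str keys, bytes values), the choice of UTF-8, and the `call_app` helper are all assumptions for illustration.

```python
def app(environ, start_response):
    # Assumption: environ values arrive as bytes under the all-bytes proposal.
    raw_path = environ["PATH_INFO"]

    # The application, not the server, decides how to decode the path.
    path = raw_path.decode("utf-8", "replace")

    # Text processing happens on str...
    message = "You asked for " + path + "\n"

    # ...and everything is encoded again at the output boundary.
    body = message.encode("utf-8")
    headers = [
        (b"Content-Type", b"text/plain; charset=utf-8"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]
    start_response(b"200 OK", headers)
    return [body]


def call_app(application, environ):
    """Hypothetical stand-in for a server: call the app and collect its output."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = list(application(environ, start_response))
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, body = call_app(app, {"PATH_INFO": b"/caf\xc3\xa9"})
```

The point of the sketch is that there are exactly two conversion points, one decode on the way in and one encode on the way out, so it is clear when the porting is finished.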

WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should therefore "byte" the bullet here. ;-)

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig