At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby
<p...@telecommunity.com> wrote:
At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I
think bytes will be more awkward to port to than text, and
inconsistent with other WSGI values.
OTOH, it has the tremendous advantage of pushing the encoding
question onto the app (or framework) developer... who's really the
only one who can make the right decision for their particular
application.  And personally, I'd rather have clear boundaries
between text and bytes, such that porting (even if tedious or
awkward) is *consistent*, and clear as to when you're finished, not,
"oh, did I check to make sure I converted SCRIPT_NAME and
PATH_INFO... not just in my app code, but in all the library code
I call *from* my app?"
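
To illustrate why the encoding decision belongs to the app, here is a
minimal sketch, assuming PATH_INFO arrives as bytes. The helper name
decode_path and the default encoding order are hypothetical and not
part of any WSGI spec; they only show that the same bytes can decode
differently depending on what the application chooses.

```python
def decode_path(raw_path, encodings=("utf-8", "latin-1")):
    """Decode a bytes path using the first encoding that succeeds.

    The encoding list is an application-level policy (assumption):
    the server cannot know which one is right for a given app.
    """
    for enc in encodings:
        try:
            return raw_path.decode(enc)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    raise ValueError("could not decode path with any configured encoding")


# An app that expects UTF-8 URLs, with a latin-1 fallback.
utf8_first = decode_path(b"/caf\xc3\xa9")

# The same bytes read by an app that insists on latin-1 give
# different text, which is why only the app can decide.
latin1_only = decode_path(b"/caf\xc3\xa9", encodings=("latin-1",))
```

With bytes in the environ, this choice is made once, visibly, at the
point where the app first touches the value.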
IOW, the bytes/string discussion on Python-dev has kind of led me to
realize that we might just as well make the *entire* stack bytes
(incoming and outgoing headers *and* streams), and rewrite that bit
in PEP 333 about using str on "Python 3000" to say we go with bytes
on Python 3+ for everything that's a str in today's WSGI.
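
As a rough sketch of what that could look like on Python 3 (purely
illustrative of the idea floated here, not an existing spec): environ
values, the status line, header names and values, and the body are all
bytes. Keeping the environ keys as native strings, the UTF-8 choice,
and the call_app test harness are assumptions made for the example.

```python
def app(environ, start_response):
    # Under this sketch, PATH_INFO is bytes; the app picks the encoding.
    path = environ["PATH_INFO"].decode("utf-8", "replace")

    # Text processing happens on str, then is encoded for output.
    body = ("You asked for " + path).encode("utf-8")

    # Status and headers are bytes too, like everything on the wire.
    start_response(b"200 OK", [
        (b"Content-Type", b"text/plain; charset=utf-8"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    return [body]


def call_app(application, environ):
    """Tiny stand-in for a server (hypothetical), to exercise the app."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = list(application(environ, start_response))
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, body = call_app(app, {"PATH_INFO": b"/caf\xc3\xa9"})
```

The decode and encode calls mark the only places where text and bytes
meet, so it is easy to tell when a port is finished.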
This was my first intuition too, until I started thinking in more
detail about the particular values involved. Some obviously are
textish, like environ['SERVER_NAME']. Not a very useful value, but
definitely text.
Basically all the internal strings are textish, so we're left with:
wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)
What I'm getting at, though, is it's precisely this sort of "hm,
which ones are bytes again?" stuff that makes you have to stop and
*think*, i.e., it doesn't Fit My Brain<tm> any more. ;-)
There should be one, and preferably *only* one, obvious way to do it.
And given that HTTP is inherently a bunch of bytes, bytes is the one
obvious way.
I previously was under the impression that bytes wouldn't
interoperate with strings in 3.x, but they *do*, in much the same way
as they did in 2.x. That means you'll be (mostly) bug-compatible in
3.x, only you'll likely encounter encoding issues *sooner*, rather
than later. (i.e., the minute you combine non-ASCII inputs with your
regular string constants).
Yes, you will also be forced to convert your return values to bytes,
but if you've used string constants *anywhere*, then you know you'll
be outputting text, which you should already have been encoding for
output. (So you'll just be forced to deal with errors on that side
sooner as well.)
All in all, I'd say this also fits with what people on Python-Dev
keep hammering on as the One Obvious Way to deal with bytes and
strings in a program: i.e., bytes for I/O, text for text processing.
WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should
therefore "byte" the bullet here. ;-)