On 27 August 2010 13:45, P.J. Eby <p...@telecommunity.com> wrote:
> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>
>> Hi,
>>
>> Is there a status update on that now I missed?  Did something decide on
>> bytes for the environment values or are we still unsure about that?
>
> To the extent we're "unsure", I think the holdup is simply that nobody has
> tried doing an all-bytes WSGI implementation -- unless of course you count
> all our Python 2.x experience as experience with an all-bytes
> implementation.  ;-)
>
> (Of course, that experience won't help us with Python 3 stdlib issues.)
>
>
>> At that point I don't care at all about what is decided on as long as
>> something is decided.  Can someone please stand up and just do that? :)
>
> Essentially the problem right now is that unless such a choice is made,
> there's little hope of getting the stdlib issues to be resolved, because we
> can't exactly file bug reports against the stdlib if we don't know what we
> want it to do.  ;-)
>
> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
> and then encourage people to port to WSGI 2 before moving to Python 3.

Since the major stumbling block to any sort of agreement, irrespective
of other changes, is still bytes vs unicode, and since we have a
reasonably clear definition of what the unicode suggestion is, can we
please, as a first step, get a definition of what bytes actually implies
so everyone knows what we are talking about?

I specifically ask this because it isn't clear: people don't explain in
detail what they mean when they say 'bytes'.

Going back to definition #2 in my blog post from a year ago, I had:

1. The application is passed an instance of a Python dictionary
   containing what is referred to as the WSGI environment. All keys in
   this dictionary are native strings. For CGI variables, all names are
   going to be ISO-8859-1 and so where native strings are unicode
   strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
   environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
   of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
   and from which request content is read, should yield byte strings.

5. The status line specified by the WSGI application must be a byte
   string.

6. The list of response headers specified by the WSGI application must
   contain tuples consisting of two values, where each value is a byte
   string.

7. The iterable returned by the application and from which response
   content is derived, must yield byte strings.

The points of disagreement I have seen about this are as follows.

For (1), the keys should also be bytes, including the names of the
'wsgi.' special keys.

For (2), the value of 'wsgi.url_scheme' should be bytes.

So, do you really want bytes absolutely everywhere, or are keys still
going to be unicode taken as ISO-8859-1?

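As a rough sketch of what definition #2 would look like in practice,
assuming Python 3 (where native strings are unicode), the following is
one possible reading of the seven points and not an agreed
implementation. The names make_environ, violations and run are
hypothetical helpers invented for illustration, and treating every key
that does not start with 'wsgi.' as a CGI variable is an assumption.

```python
import io


def make_environ():
    """Hypothetical server-side helper: an environ shaped like definition #2."""
    return {
        # Points 1 and 3: native string keys, byte string CGI values.
        'REQUEST_METHOD': b'GET',
        'PATH_INFO': b'/caf\xc3\xa9',
        'QUERY_STRING': b'',
        # Point 2: 'wsgi.url_scheme' stays a native string.
        'wsgi.url_scheme': 'http',
        # Point 4: the input stream yields byte strings.
        'wsgi.input': io.BytesIO(b''),
    }


def app(environ, start_response):
    """An application that only ever handles bytes for CGI values and output."""
    body = b'You asked for ' + environ['PATH_INFO'] + b'\n'
    status = b'200 OK'                                   # point 5
    headers = [                                          # point 6
        (b'Content-Type', b'text/plain'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ]
    start_response(status, headers)
    return [body]                                        # point 7


def violations(environ, status, headers, chunks):
    """Return a list of ways the exchange departs from definition #2."""
    problems = []
    for key, value in environ.items():
        if not isinstance(key, str):
            problems.append('key %r is not a native string' % (key,))
        elif key == 'wsgi.url_scheme':
            if not isinstance(value, str):
                problems.append('wsgi.url_scheme is not a native string')
        elif not key.startswith('wsgi.') and not isinstance(value, bytes):
            # Assumption: every non-'wsgi.' key is a CGI variable.
            problems.append('CGI value for %r is not bytes' % (key,))
    if not isinstance(status, bytes):
        problems.append('status line is not bytes')
    for name, value in headers:
        if not (isinstance(name, bytes) and isinstance(value, bytes)):
            problems.append('header %r is not a pair of bytes' % (name,))
    for chunk in chunks:
        if not isinstance(chunk, bytes):
            problems.append('response chunk is not bytes')
    return problems


def run(application, environ):
    """Call the application once and capture what it passed back."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    chunks = list(application(environ, start_response))
    return captured['status'], captured['headers'], chunks
```

Under the stricter reading raised in the points of disagreement (bytes
for keys and for 'wsgi.url_scheme' as well), the isinstance checks on
keys and on 'wsgi.url_scheme' would flip from str to bytes; everything
else in the sketch would stay the same.
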
Note that we are not agreeing to the final solution here, just what
bytes means in contrast to the unicode option, so that we know we are
comparing only two options and not many options because people have
different interpretations of what bytes means.

By contrast, what we generally mean by the unicode option is definition
#3 from my blog post. That being:

1. The application is passed an instance of a Python dictionary
   containing what is referred to as the WSGI environment. All keys in
   this dictionary are native strings. For CGI variables, all names are
   going to be ISO-8859-1 and so where native strings are unicode
   strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
   environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
   of the variables are native strings. Where native strings are unicode
   strings, ISO-8859-1 encoding would be used such that the original
   character data is preserved and as necessary the unicode string can
   be converted back to bytes and thence decoded to unicode again using
   a different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
   and from which request content is read, should yield byte strings.

5. The status line specified by the WSGI application should be a byte
   string. Where native strings are unicode strings, the native string
   type can also be returned in which case it would be encoded as
   ISO-8859-1.

6. The list of response headers specified by the WSGI application should
   contain tuples consisting of two values, where each value is a byte
   string. Where native strings are unicode strings, the native string
   type can also be returned in which case it would be encoded as
   ISO-8859-1.

7. The iterable returned by the application and from which response
   content is derived, should yield byte strings. Where native strings
   are unicode strings, the native string type can also be returned in
   which case it would be encoded as ISO-8859-1.

Even though we call it unicode, it actually has bytes in places as well.

The key issue over bytes vs unicode has been the values in the
dictionary but, as pointed out above, it is not clear whether for the
bytes option we are talking about bytes for the keys as well, and for
the value of 'wsgi.url_scheme'.

So, can we clarify this first?

And if you are going to comment, for that extra clarity, cut and paste
my definition #2 above and make the changes to it so we have the full
definition, rather than just referring to bits. That way people who come
and read this don't have to trawl through the whole email chain to
derive the context.

Once we get that clarification, then we can perhaps discuss exclusively
any issues people have with that bytes definition. That is before we
even try to balance it against the unicode option or look at other WSGI
2 changes such as dropping start_response and wsgi.file_wrapper.

And I apologise in advance if I start getting cranky and people think I
am trying to hijack the conversation. I want a solution more so than
probably anyone else, as I can't fix up mod_wsgi until there is one, and
right now I am feeling pretty unmotivated towards doing anything with
mod_wsgi at all, even non Python 3.X enhancements, because of all this.
So, if we can keep focus and try going one step at a time, maybe I will
not go ballistic. ;-)

Graham