Sorry, after having had a bit of a think while eating lunch, I am going to throw up another point of view on this whole issue. So, sit back and be just a little bit concerned.

WSGI stands for Web Server GATEWAY Interface. My understanding is that right back at the beginning WSGI was intended to be used only at the direct interface with the underlying web server. This is, in part at least, why I understand the term 'gateway' is used in the acronym. The problem was that people discovered one could apply the same interface for use as middleware. As we all know, that has been used quite successfully, but has also been equally abused.

With that in mind, maybe we should instead start to look at WSGI as a series of layers. Yes, people have talked about standardised request/response objects, but I am not thinking at that high a level.

What I am going to suggest is that there perhaps should still be a clear line between bytes and unicode. So, rather than completely throw away the idea of bytes everywhere and rewrite the WSGI specification, we could instead say that the existing conceptual idea of WSGI 1.0 is still valid, and just build on top of it a translation interface that presents it as unicode.

We might still want to respecify WSGI as it is now, as per the bytes/unicode/native definitions I explained in my blog post at:

http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html

I'd suggest this would possibly be the same as, or quite similar to, my original definition #2 in the blog post. To save you having to go back to the blog post, I include it here again.

1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are native strings. For CGI variables, all names are going to be ISO-8859-1 and so where native strings are unicode strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values of the variables are byte strings.

4.
The WSGI input stream 'wsgi.input' contained in the WSGI environment, and from which request content is read, should yield byte strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must contain tuples consisting of two values, where each value is a byte string.

7. The iterable returned by the application, and from which response content is derived, must yield byte strings.

If we see WSGI as layers instead, the first consequence is that web frameworks such as web2py and CherryPy, which merely use WSGI as the gateway interface, would continue to work directly on this layer, regardless of whether they use Python 2.X or 3.X. Those frameworks are already going to translate whatever this interface defines into their own internal interface and effectively keep WSGI out of any higher levels of the application.

We now get back to the unicode vs bytes argument we have been having. This argument will not vanish by virtue of doing this, but instead of pushing the unicode translation down into the gateway interface layer, we just apply it on top. There is possibly not even a need for the gateway interface layer itself to implement the unicode translation layer. It may instead be a documented standard convention that any web application which mounts directly on the gateway interface layer should implement.

The danger in taking this approach is that you now risk having two types of so-called middleware: bytes middleware and unicode middleware. Confusion could obviously come about if you accidentally mix the two, although some middleware may actually be able to operate on either bytes or unicode and so not care. To avoid conflict, one could as a minimal measure just add an additional 'wsgi.' variable which indicates whether the interface is 'bytes' or 'unicode', and hope middleware validate that they have been plugged in at the correct level.
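
To make the layering and the marker variable concrete, here is a rough sketch in Python 3 of a unicode layer sitting on top of the bytes interface of definition #2. The 'wsgi.string_layer' key, the unicode_layer() helper, the choice of ISO-8859-1 for decoding CGI values, and leaving the response body as bytes are all assumptions made purely for illustration. None of them is part of WSGI 1.0 or of any agreed proposal.

```python
# Hypothetical marker key; the name is an assumption, not part of any WSGI spec.
LAYER_KEY = "wsgi.string_layer"


def unicode_layer(app, encoding="iso-8859-1"):
    """Wrap a unicode-layer application so it can mount on the bytes layer.

    The decoding policy (one fixed encoding for every CGI value) is an
    assumption for this sketch only.
    """
    def bytes_side(environ, start_response):
        # Refuse to run if we were not plugged in directly on the bytes layer.
        if environ.get(LAYER_KEY, "bytes") != "bytes":
            raise RuntimeError("unicode layer mounted on a non-bytes layer")

        # Decode byte string values; native strings ('wsgi.url_scheme') and
        # other objects ('wsgi.input') pass through untouched.
        text_environ = {
            key: value.decode(encoding) if isinstance(value, bytes) else value
            for key, value in environ.items()
        }
        text_environ[LAYER_KEY] = "unicode"

        def text_start_response(status, headers):
            # Encode status line and headers back to bytes for the gateway.
            return start_response(
                status.encode(encoding),
                [(name.encode(encoding), value.encode(encoding))
                 for name, value in headers],
            )

        # Response content is left as bytes in both layers (an assumption).
        return app(text_environ, text_start_response)

    return bytes_side


def text_app(environ, start_response):
    """Example application written against the unicode layer."""
    body = ("path=" + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


# What a bytes-layer gateway (or a framework mounting directly on it) sees.
application = unicode_layer(text_app)
```

Wrapping twice, as in unicode_layer(unicode_layer(text_app)), fails on the first request with the RuntimeError above, which is the kind of validation the marker variable is meant to allow.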
Alternatively, you change the interface in some way such that they couldn't be plugged together in the first place. Some may see this, though, as the opportunity to introduce a full request/response object. There is some merit to that, as these may actually want to access the original bytes rather than deal with the result of the unicode translation layer.

Anyway, that is the thought. Should we be looking at WSGI as a set of layers instead of assuming we have to push unicode into the gateway interface layer?

I don't believe this is the same as the prior question of whether WSGI should be bytes or unicode, as we are saying it encompasses both, but as separate layers. Previously, in asking whether it should be bytes or unicode, if the answer was yes to bytes, then the intention was that unicode would be out of scope and every man and his dog could do it differently. Here we would still define the unicode layer that would sit on top of the bytes layer.

If we were to say it is layered, and that the gateway interface should always be bytes to the extent of definition #2 above, it would potentially pave the way for mod_wsgi and CherryPy WSGI servers to be released in quick order.

Doing this does, though, in part take the Java approach of punting the problem up to the next layer. The difference would be that whereas Java doesn't really define that next translation layer, as I understand what people are saying, we could define it and so at least improve on things.

FWIW, I thought of this because I was going to suggest that we have a break from the discussion at this point. The discussion has been robust and useful in helping us uncover the issues, but I think we are all perhaps starting to get overwhelmed.

I was thus also going to suggest that we set up an area on bitbucket and start documenting each of the main proposals, along with supplying reference code which translates Python 2.X WSGI 1.0 to WSGI Proposal X.Y, so that people could actually experiment with it.
Since wsgiref and mod_wsgi in Python 3.X also use basically the same interface, we could supply reference code for Python 3.X as well. The point of doing this would be that the various proposals were documented concisely, so people could quickly come to understand what they are and compare them, rather than having to wade through mountains of email messages.

It was at this point it occurred to me that, since this layering is possible on top of the original bytes interface for the purposes of reference code to demonstrate a new interface, maybe we should continue to treat it as a series of translation layers that build on top of a base raw bytes interface, rather than try to make it monolithic.

Comments?

Graham
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig