At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:
> On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <p...@telecommunity.com> wrote:
>> The Python 3 specific changes are to use:
>>
>> * ``bytes`` for I/O streams in both directions
>> * ``str`` for environ keys and values
>> * ``bytes`` for arguments to start_response() and write()
>
> This is the only thing that seems odd to me -- it seems like the response should be symmetric with the request, and the request in this case uses str for headers (status being header-like), and bytes for the body.
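To make the rules under discussion concrete, here is a minimal sketch of an application written against them: it reads str values from the environ and emits bytes for the status, every header name and value, and the body. The names `hello_app` and `run` are hypothetical; `run` is only a toy stand-in for a server, not part of any WSGI API.

```python
# Sketch of an app following the proposed Python 3 rules:
# environ keys/values are str; status, headers and body are bytes.
def hello_app(environ, start_response):
    # Input side: plain str keys and values from the environ.
    name = environ.get('QUERY_STRING', '') or 'world'
    body = ('Hello, %s!' % name).encode('utf-8')
    # Output side: bytes for the status and every header name/value.
    start_response(b'200 OK', [
        (b'Content-Type', b'text/plain; charset=utf-8'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ])
    return [body]

# Hypothetical toy "server" that just captures what the app emits.
def run(app, environ):
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # no-op write() callable
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body

status, headers, body = run(hello_app, {'QUERY_STRING': 'web-sig'})
print(status, body)  # b'200 OK' b'Hello, web-sig!'
```

Note that under this proposal even the computed Content-Length has to be encoded explicitly, which is exactly the asymmetry with the str-based request side that the quoted objection points at.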

So, I've given some thought to your suggestion, and while it's true that most output headers are far less prone to ending up with unintended Unicode content, there are at least two output headers that can carry application-supplied content (and can therefore fail unpredictably): Location and Set-Cookie.

If these headers accidentally contain non-Latin-1 characters, the error isn't detected until the header reaches the origin server doing the transmission encoding, and it will likely show up as a dynamic (and therefore hard-to-debug) error.

However, if the output is always bytes (and this can be verified relatively statically), then an encoding error can only occur *inside* the application, where the app's developer can find it more easily.
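A small sketch of where the failure surfaces when the app must produce bytes itself. The app name and the `app.redirect_to` environ key are hypothetical, standing in for whatever application data ends up in a Location header; the point is only that an unencodable character raises in app code, before anything reaches the server.

```python
def redirect_app(environ, start_response):
    # 'app.redirect_to' is a hypothetical environ key standing in for
    # application data that ends up in the Location header.
    target = environ.get('app.redirect_to', '/')
    # Under the bytes-output rule the app encodes the header itself, so a
    # character outside Latin-1 raises UnicodeEncodeError right here,
    # before start_response() is ever called.
    location = target.encode('latin-1')
    start_response(b'302 Found', [(b'Location', location)])
    return [b'']

calls = []

def fake_start_response(status, headers, exc_info=None):
    # Toy stand-in for the server: just records what it was given.
    calls.append((status, headers))
    return lambda data: None

# '\u00e9' (e-acute) is in Latin-1, so this call succeeds.
redirect_app({'app.redirect_to': '/caf\u00e9'}, fake_start_response)

# '\u20ac' (euro sign) is not, so the failure surfaces in app code.
try:
    redirect_app({'app.redirect_to': '/\u20ac'}, fake_start_response)
except UnicodeEncodeError as exc:
    print('raised inside the app:', exc)
```

A real app would more likely percent-encode such a URL (for example with `urllib.parse.quote`) before encoding it; the sketch only shows where an unencodable value fails.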

So I guess the question boils down to: would we rather make sure that coding errors happen *inside* applications, or would we rather make porting WSGI apps trivial (or nearly so)?

But I think that it's possible here to have one's cake and eat it too: if we require bytes for all outputs, but provide a pair of decorators in wsgiref.util like the following:

    def encode_body(codec='utf8'):
        """Allow a WSGI app to output its response body as strings w/specified encoding"""
        def decorate(app):
            def encode(response):
                # Encode each str chunk lazily, and make sure the app's
                # close() method is still called when iteration ends.
                try:
                    for data in response:
                        yield data.encode(codec)
                finally:
                    if hasattr(response, 'close'):
                        response.close()
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    _write = start_response(status, response_headers, exc_info)
                    # Wrap the server's write() so it also accepts str.
                    def write(data):
                        return _write(data.encode(codec))
                    return write
                return encode(app(environ, start))
            return decorated_app
        return decorate

    def encode_headers(codec='latin1'):
        """Allow a WSGI app to output its headers as strings, w/specified encoding"""
        def decorate(app):
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    # Encode here, so a non-encodable character raises
                    # inside the decorator rather than later in the server.
                    status = status.encode(codec)
                    response_headers = [
                        (k.encode(codec), v.encode(codec)) for k, v in response_headers
                    ]
                    return start_response(status, response_headers, exc_info)
                return app(environ, start)
            return decorated_app
        return decorate

So, this seems like a win-win to me: relatively static verification, errors stay in the app (or at least in the decorator), and the API is clean and easy. Indeed, it seems likely that at least some apps that don't read wsgi.input themselves could be ported *just* by adding the appropriate decorator(s). And, if your app is using unicode on 2.x, you can even use the same decorators there, for the benefit of 2to3. (Assuming I release an updated standalone wsgiref version with the decorators, of course.)
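One way to see the "errors stay in the app" point is a small wrapper that checks every output value at the application boundary. `require_bytes` below is a hypothetical helper, not part of wsgiref; it is a runtime check rather than true static verification, but it shows how a str that slips through (say, from a forgotten decorator) fails right next to the application instead of inside the server.

```python
def require_bytes(app):
    """Hypothetical checking wrapper: raise TypeError as soon as the
    wrapped app hands over a non-bytes value where bytes are required."""
    def check(value, what):
        if not isinstance(value, bytes):
            raise TypeError('%s must be bytes, got %s'
                            % (what, type(value).__name__))
        return value

    def checked_body(result):
        # Check each body chunk as it passes through, and still honour
        # the app's close() method, as the decorators above do.
        try:
            for chunk in result:
                yield check(chunk, 'body chunk')
        finally:
            if hasattr(result, 'close'):
                result.close()

    def checked_app(environ, start_response):
        def start(status, response_headers, exc_info=None):
            check(status, 'status')
            for name, value in response_headers:
                check(name, 'header name')
                check(value, 'header value')
            write = start_response(status, response_headers, exc_info)
            return lambda data: write(check(data, 'write() data'))
        return checked_body(app(environ, start))

    return checked_app

def no_op_start_response(status, headers, exc_info=None):
    return lambda data: None

def forgot_to_encode(environ, start_response):
    start_response(b'200 OK', [(b'Content-Type', b'text/plain')])
    return ['oops']  # str body chunk: a typical porting mistake

try:
    list(require_bytes(forgot_to_encode)({}, no_op_start_response))
except TypeError as exc:
    print(exc)  # body chunk must be bytes, got str
```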

So, unless somebody has some additional arguments on this one, I think I'm going to stick with bytes output.

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig