[Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

P.J. Eby Thu, 23 Sep 2010 09:36:57 -0700

At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:

On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <p...@telecommunity.com> wrote:
The Python 3 specific changes are to use:
* ``bytes`` for I/O streams in both directions
* ``str`` for environ keys and values
* ``bytes`` for arguments to start_response() and write()
This is the only thing that seems odd to me -- it seems like the response should be symmetric with the request, and the request in this case uses str for headers (status being header-like), and bytes for the body.

So, I've given some thought to your suggestion, and, while it's true that most of the output headers are far less prone to ending up with unintended unicode content, there are at least two output headers that can include some sort of application content (and can therefore have random failures): Location and Set-Cookie.

If these headers accidentally contain non-Latin1 characters, the error isn't detectable until the header reaches the origin server doing the transmission encoding, and it'll likely be a dynamic (and therefore hard-to-debug) error.

However, if the output is always bytes (and this can be relatively-statically verified), then any error can't occur except *inside* the application, where the app's developer can find it more easily.
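
As a small illustration of that point (a sketch only, not part of the proposal; the redirect_header helper is a hypothetical name), an app that has to encode its own Location header sees the failure at the line that builds the header:

```python
def redirect_header(url):
    # Hypothetical helper: build a Location header as bytes, as the
    # bytes-only rule would require of the application.
    return (b'Location', url.encode('latin1'))

# A URL that fits in Latin-1 encodes without trouble.
ok = redirect_header('/caf\u00e9')

# A character outside Latin-1 fails right here, inside the application,
# rather than later in the server's transmission encoding.
try:
    redirect_header('/\u4e2d\u6587')
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None
```

The traceback then points at application code, which is the debugging advantage being argued for.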

So I guess the question boils down to: would we rather make sure that coding errors happen *inside* applications, or would we rather make porting WSGI apps trivial (or nearly so)?

But I think that it's possible here to have one's cake and eat it too: if we require bytes for all outputs, but provide a pair of decorators in wsgiref.util like the following:


    def encode_body(codec='utf8'):
        """Allow a WSGI app to output its response body as strings
        w/specified encoding"""
        def decorate(app):
            def encode(response):
                try:
                    for data in response:
                        yield data.encode(codec)
                finally:
                    if hasattr(response, 'close'):
                        response.close()
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    _write = start_response(status, response_headers, exc_info)
                    def write(data):
                        return _write(data.encode(codec))
                    return write
                return encode(app(environ, start))
            return decorated_app
        return decorate

    def encode_headers(codec='latin1'):
        """Allow a WSGI app to output its headers as strings,
        w/specified encoding"""
        def decorate(app):
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    status = status.encode(codec)
                    response_headers = [
                        (k.encode(codec), v.encode(codec))
                        for k, v in response_headers
                    ]
                    return start_response(status, response_headers, exc_info)
                return app(environ, start)
            return decorated_app
        return decorate

So, this seems like a win-win to me: relatively-static verification, errors stay in the app (or at least in the decorator), and the API is clean-and-easy. Indeed, it seems likely that at least some apps that don't read wsgi.input themselves could be ported *just* by adding the appropriate decorator(s). And, if your app is using unicode on 2.x, you can even use the same decorators there, for the benefit of 2to3. (Assuming I release an updated standalone wsgiref version with the decorators, of course.)
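
To make the porting claim concrete, here is a rough sketch of a str-producing app wrapped with both decorators. The decorators are restated in condensed form only so the snippet runs on its own (they are not assumed to be importable from wsgiref.util); hello_app, fake_start_response and captured are hypothetical names standing in for an app and a server:

```python
def encode_body(codec='utf8'):
    def decorate(app):
        def encode(response):
            try:
                for data in response:
                    yield data.encode(codec)
            finally:
                if hasattr(response, 'close'):
                    response.close()
        def decorated_app(environ, start_response):
            def start(status, response_headers, exc_info=None):
                _write = start_response(status, response_headers, exc_info)
                return lambda data: _write(data.encode(codec))
            return encode(app(environ, start))
        return decorated_app
    return decorate

def encode_headers(codec='latin1'):
    def decorate(app):
        def decorated_app(environ, start_response):
            def start(status, response_headers, exc_info=None):
                encoded = [(k.encode(codec), v.encode(codec))
                           for k, v in response_headers]
                return start_response(status.encode(codec), encoded, exc_info)
            return app(environ, start)
        return decorated_app
    return decorate

# The app itself keeps using str everywhere; only the decorators are new.
@encode_headers()
@encode_body()
def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return ['Hello, ', 'w\u00f6rld\n']

# Stand-in for a server: records what the app hands to start_response.
captured = {}
def fake_start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = headers
    return lambda data: None

body = b''.join(hello_app({}, fake_start_response))
```

Under these assumptions the server side sees only bytes for the status, the headers and the body, while the app body is unchanged apart from the two decorator lines.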

So, unless somebody has some additional arguments on this one, I think I'm going to stick with bytes output.


