On Thu, Sep 23, 2010 at 11:06 AM, P.J. Eby <p...@telecommunity.com> wrote:
> At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:
>
>> On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <p...@telecommunity.com> wrote:
>> The Python 3 specific changes are to use:
>>
>> * ``bytes`` for I/O streams in both directions
>> * ``str`` for environ keys and values
>> * ``bytes`` for arguments to start_response() and write()
>>
>>
>> This is the only thing that seems odd to me -- it seems like the response
>> should be symmetric with the request, and the request in this case uses str
>> for headers (status being header-like), and bytes for the body.
>>
>
> So, I've given some thought to your suggestion, and, while it's true that
> most of the output headers are far less prone to ending up with unintended
> unicode content, there are at least two output headers that can include some
> sort of application content (and can therefore have random failures):
> Location and Set-Cookie.
>
> If these headers accidentally contain non-Latin1 characters, the error
> isn't detectable until the header reaches the origin server doing the
> transmission encoding, and it'll likely be a dynamic (and therefore
> hard-to-debug) error.
>

I don't see any reason why Location shouldn't be ASCII. Any header could have any character put in it, of course; there's just no valid case where Location shouldn't be a URL, and URLs are ASCII. Set-Cookie can contain weirdness, yes. I would expect any library that abstracts cookies to handle this (it's certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing. This can also be detected with the validator, which doesn't avoid runtime errors, but bytes allow runtime errors too -- they will just happen somewhere else (e.g., when a value is converted to bytes in an application or library). If servers print the invalid value on error (instead of just some generic error), I don't think it would be that hard to track down problems.
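A minimal sketch of that server-side error reporting, for illustration only. It assumes the symmetric design (the app passes str status and headers) and a server that encodes them as Latin-1 at the boundary, naming the offending header and showing its value on failure instead of raising a generic error. encode_response_head is a hypothetical helper, not part of wsgiref or any existing server, and the stricter ASCII check on Location is an added assumption based on the "URLs are ASCII" point:

```python
# Hypothetical server-side helper (not a wsgiref API): encode a str status
# line and str header pairs to Latin-1 bytes, reporting the offending value.

def encode_response_head(status, response_headers):
    """Return (status_bytes, [(name_bytes, value_bytes), ...]).

    Raises ValueError naming the bad header and showing its value, so the
    problem is easy to trace back to the application that produced it.
    """
    try:
        status_bytes = status.encode('latin-1')
    except UnicodeEncodeError:
        raise ValueError(f"status line is not Latin-1: {status!r}") from None

    encoded = []
    for name, value in response_headers:
        try:
            name_bytes = name.encode('latin-1')
            value_bytes = value.encode('latin-1')
        except UnicodeEncodeError:
            raise ValueError(
                f"header {name!r} is not Latin-1: {value!r}") from None
        # Assumed extra check: Location should hold a URL, and URLs are
        # ASCII, so reject Latin-1-only characters there as well.
        if name.lower() == 'location' and not value.isascii():
            raise ValueError(f"Location header is not ASCII: {value!r}")
        encoded.append((name_bytes, value_bytes))
    return status_bytes, encoded
```

With this in place, a stray non-Latin-1 character in Set-Cookie produces an error message that quotes the header and its value rather than an anonymous encoding failure deep inside the server.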
Printing the offending value does require some explicit effort on the part of the server (most servers handle app_iter==None ungracefully, which is a similar problem).

> However, if the output is always bytes (and this can be
> relatively-statically verified), then any error can't occur except *inside*
> the application, where the app's developer can find it more easily.
>
> So I guess the question boils down to: would we rather make sure that
> coding errors happen *inside* applications, or would we rather make porting
> WSGI apps trivial (or nearly so)?
>
> But I think that it's possible here to have one's cake and eat it too: if
> we require bytes for all outputs, but provide a pair of decorators in
> wsgiref.util like the following:
>
>     def encode_body(codec='utf8'):
>         """Allow a WSGI app to output its response body as strings
>         w/specified encoding"""
>         def decorate(app):
>             def encode(response):
>                 try:
>                     for data in response:
>                         yield data.encode(codec)
>                 finally:
>                     if hasattr(response, 'close'):
>                         response.close()
>             def decorated_app(environ, start_response):
>                 def start(status, response_headers, exc_info=None):
>                     _write = start_response(status, response_headers, exc_info)
>                     def write(data):
>                         return _write(data.encode(codec))
>                     return write
>                 return encode(app(environ, start))
>             return decorated_app
>         return decorate
>
>     def encode_headers(codec='latin1'):
>         """Allow a WSGI app to output its headers as strings, w/specified
>         encoding"""
>         def decorate(app):
>             def decorated_app(environ, start_response):
>                 def start(status, response_headers, exc_info=None):
>                     status = status.encode(codec)
>                     response_headers = [
>                         (k.encode(codec), v.encode(codec))
>                         for k, v in response_headers
>                     ]
>                     return start_response(status, response_headers, exc_info)
>                 return app(environ, start)
>             return decorated_app
>         return decorate
>
> So, this seems like a win-win to me: relatively-static verification, errors
> stay in the app (or at least in the decorator), and the API is
> clean-and-easy.
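As a rough illustration of that porting story, here is a self-contained sketch that folds the two quoted decorators into a single wrapper and drives it with a toy start_response. encode_text_app, fake_start_response, hello_app and bad_app are hypothetical names made up for this example, not wsgiref APIs. It shows text going in on the application side, only bytes reaching the "server" side, and a non-Latin-1 header raising UnicodeEncodeError while the application itself is calling start_response:

```python
def encode_text_app(app, body_codec='utf-8', header_codec='latin-1'):
    """Hypothetical wrapper: the app speaks str, the server only sees bytes."""
    def wrapped(environ, start_response):
        def start(status, response_headers, exc_info=None):
            # Encoding happens during the app's own call to start_response,
            # so a bad header raises with the application on the call stack.
            status = status.encode(header_codec)
            response_headers = [(k.encode(header_codec), v.encode(header_codec))
                                for k, v in response_headers]
            server_write = start_response(status, response_headers, exc_info)

            def write(data):
                return server_write(data.encode(body_codec))
            return write

        def encode(response):
            try:
                for chunk in response:
                    yield chunk.encode(body_codec)
            finally:
                # Preserve the WSGI close() contract for the wrapped iterable.
                if hasattr(response, 'close'):
                    response.close()

        return encode(app(environ, start))
    return wrapped


# Toy "server" side: record the status, headers and write() calls it receives.
captured = {}
written = []

def fake_start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = headers
    return written.append


@encode_text_app
def hello_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    write('hello ')
    return ['caf\u00e9\n']


@encode_text_app
def bad_app(environ, start_response):
    start_response('302 Found', [('Location', '/\u2603')])
    return []


body = b''.join(hello_app({}, fake_start_response))

error = None
try:
    bad_app({}, fake_start_response)
except UnicodeEncodeError as exc:
    error = exc
```

Here body ends up as UTF-8 bytes and the recorded status and headers are Latin-1 bytes, while bad_app fails before anything reaches the fake server.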
> Indeed, it seems likely that at least some apps that don't
> read wsgi.input themselves could be ported *just* by adding the appropriate
> decorator(s). And, if your app is using unicode on 2.x, you can even use
> the same decorators there, for the benefit of 2to3. (Assuming I release an
> updated standalone wsgiref version with the decorators, of course.)
>

This doesn't seem that different from the validator, except that the decorator uses a different interface internally and externally (the internal interface using text, the external one bytes).

-- 
Ian Bicking | http://blog.ianbicking.org
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig