On Mon, Sep 21, 2009 at 6:16 PM, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> > Of course you can directly use `environ['some_key']` if you know you'll
> > get the 'right' encoding all the time. But when the encoding changes,
> > you'll have to fix all your middlewares.
> >
> > I am missing something?
>
> For one, we aren't talking about arbitrary keys needing this treatment.
> We are only talking about SCRIPT_NAME and PATH_INFO.

OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and
introduce two equivalent variables that hold the NOT url-decoded values.
So if you request /fran%e7cois then environ['PATH_INFO_RAW'] is
'/fran%e7cois'.

This will be quite disruptive, as these are variables that are frequently
accessed directly (libraries that expose them as attributes can just turn
them into properties that do URL decoding, using UTF8). But it's an easy
fix at least. I would actually want to specify that if we added this key,
we should disallow the old keys -- terrible confusion could ensue from
both in the environ.

This also fixes the problem with not being able to distinguish %2F from /,
which isn't a big problem but is annoying, and is hiding meaningful
information. (I believe the relevant spec does distinguish between these
two values -- i.e., ideally decoding should happen on path segments, each
segment separated by a real /.)

If we do that, then the only really tricky thing left is HTTP_COOKIE, and
since the Cookie header is a mess then HTTP_COOKIE will be a mess and we
just have to figure out a hacky way to deal with that. Maybe
surrogateescape, but probably just Latin1 would be fine (and easy to do in
Python 2).

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
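
To make the proposal concrete, here is a minimal sketch of how a library
might sit on top of a raw path key. It assumes a server that sets
environ['PATH_INFO_RAW'] as proposed above (no spec defines that key);
the names RawPathRequest and path_segments are hypothetical, and the
sketch is written for Python 3's urllib.parse rather than Python 2.

```python
from urllib.parse import unquote


def path_segments(environ, encoding="utf-8"):
    """Split the raw path on literal '/' first, then decode each segment.

    Decoding after the split keeps an escaped %2F inside a segment
    distinct from a real path separator.
    """
    raw = environ.get("PATH_INFO_RAW", "")  # hypothetical key from the proposal
    return [unquote(seg, encoding=encoding, errors="replace")
            for seg in raw.split("/")]


class RawPathRequest:
    """Hypothetical request wrapper exposing a decoded path as a property."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def path_info(self):
        # Decode on access, using UTF-8; undecodable bytes become U+FFFD.
        raw = self.environ.get("PATH_INFO_RAW", "")
        return unquote(raw, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    environ = {"PATH_INFO_RAW": "/a%2Fb/fran%C3%A7ois"}
    print(path_segments(environ))
    print(RawPathRequest(environ).path_info)
```

Note that the property collapses %2F into / just as PATH_INFO does today;
only the per-segment helper preserves the distinction.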
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig