René Dudfield wrote:
> On Mon, Sep 21, 2009 at 6:05 PM, Robert Brewer <fuman...@aminus.org>
> wrote:
> > Armin Ronacher wrote:
> >> WSGI will demand UTF-8 URLs and only
> >> provide iso-XXX support for backwards compatibility.
> >
> > WSGI cannot demand that; a recommendation for utf-8 in a few draft
> > specifications is at least a decade removed from ubiquitous
> > implementation. We can default to utf-8 at best. I discussed this at
> > length in
> > http://mail.python.org/pipermail/web-sig/2009-August/003948.html
> >
> 
> that post does have good arguments why "a single encoding is not
> acceptable".  utf-8 seems the most common at this point to be the
> default... but we do need a way to specify encoding.
> 
> Is that what you're saying Robert?  Do you have a suggestion for
> specifying encodings?

CherryPy 3.2 does this (pseudocode):

    try:
        decode_uri(userdefault or 'utf-8')
    except UnicodeDecodeError:
        decode_uri('iso-8859-1')
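
A runnable sketch of that fallback, for illustration only. It assumes the percent-decoded Request-URI path arrives as bytes; `decode_uri` and `user_default` are hypothetical names here, not CherryPy's actual API. It returns the encoding that succeeded alongside the decoded text:

```python
def decode_uri(raw, user_default=None):
    """Decode raw URI bytes and return (text, encoding_used).

    Hypothetical helper for illustration, not CherryPy's real code.
    """
    preferred = user_default or 'utf-8'
    try:
        return raw.decode(preferred), preferred
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode('iso-8859-1'), 'iso-8859-1'
```

Because the ISO-8859-1 step never raises, the original bytes can always be recovered later by re-encoding with the encoding that was reported.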

> I think surrogateescape will handle the issues with allowing bytes to
> be stored in utf-8.
>     http://www.python.org/dev/peps/pep-0383/
> 
> However, I think that is only implemented in Python 3.1?... but maybe
> there is some way to have it work on other Pythons too?

As Henry Prêcheur says, "that's not an issue if the 'new' WSGI sticks to native 
strings." Which I'd be happy with.

> How about...
> 
> Being able to request which encoding you want has the benefit of only
> having to store one representation before 'baking' the result into the
> environ.  So if someone only ever wants utf-8 they can get it...
> however if they choose to 'bake' the environ then they can request
> something else.  This is similar to a per server setting, but I think
> should work with middleware too?

As noted above, it *is* a per-server setting in CherryPy 3.2. And any
middleware can certainly be configured as its authors see fit; I don't see a
need for a generic mechanism to specify what encodings middleware should try.
However, we still need a generic mechanism for declaring which encoding was
successfully used; this is 'wsgi.uri_encoding'.
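
As a rough sketch of what "declaring" could look like on the server side: the function below tries each candidate encoding and records the one that worked under 'wsgi.uri_encoding'. The name `populate_path` and its parameters are assumptions made for this example, not part of any server's API.

```python
def populate_path(environ, raw_path, user_default=None):
    """Decode raw_path into environ and record which encoding worked.

    Hypothetical helper; only the 'wsgi.uri_encoding' key comes from
    the proposal under discussion.
    """
    for encoding in (user_default or 'utf-8', 'iso-8859-1'):
        try:
            environ['PATH_INFO'] = raw_path.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
        environ['wsgi.uri_encoding'] = encoding
        break
    return environ
```

Anything downstream can then read environ['wsgi.uri_encoding'] instead of guessing how PATH_INFO was produced.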

> Multiple versions of things should be available, and once they are
> baked, middleware that wants to modify things will need to change
> each version.
> 
> These 'baking' methods could live in wsgi to simplify modifying the
> environ's multiple versions of things. It would just have some get/set
> functions to put correct handling of encodings in one place.  Of
> course middleware is still free to change things as it wants.

I still don't see why the environ should have multiple versions of anything.
It's not as if the HTTP request gives us multiple Request-URIs. There's a
single processing step that has to happen somewhere: decoding the bytes of the
Request-URI to unicode. For the vast majority of apps, it should only happen
once. Twice is acceptable to me for some apps. As I pointed out in the linked
email, doing that as soon as possible (i.e. in the WSGI origin server) allows
URIs to be compared as character strings more easily. If you deploy a piece of
middleware that transcodes (based on more information than servers want to deal
with), it had better be nearly first in the stack so routing works reliably.
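
A minimal sketch of such a transcoding middleware, assuming the server has already decoded PATH_INFO and recorded the encoding it used in 'wsgi.uri_encoding'. The class name `TranscodePath` and the `choose_encoding` callable are hypothetical, invented for this example:

```python
class TranscodePath:
    """Hypothetical middleware: re-decode PATH_INFO with a better-informed encoding."""

    def __init__(self, app, choose_encoding):
        self.app = app
        # choose_encoding is a callable(environ) -> codec name (an assumption).
        self.choose_encoding = choose_encoding

    def __call__(self, environ, start_response):
        current = environ.get('wsgi.uri_encoding', 'iso-8859-1')
        wanted = self.choose_encoding(environ)
        if wanted != current:
            # Recover the original bytes, then decode them a second time.
            raw = environ['PATH_INFO'].encode(current)
            try:
                environ['PATH_INFO'] = raw.decode(wanted)
                environ['wsgi.uri_encoding'] = wanted
            except UnicodeDecodeError:
                pass  # keep the server's decoding
        return self.app(environ, start_response)
```

Wrapping the rest of the stack, e.g. `TranscodePath(router, pick_encoding)`, keeps the second decode ahead of any routing that compares paths as strings.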


Robert Brewer
fuman...@aminus.org

