Manlio Perillo wrote:

However, what about URIs (that is, PATH_INFO and the like)?
For URIs (if I remember correctly) the suggested encoding is UTF-8, so
URLs should be decoded using

  url.decode('utf-8', 'surrogateescape')

Is this correct?

The currently-discussed proposal is ISO-8859-1, allowing the real bytes to be trivially extracted. This is consistent with the other headers and would be my preferred approach.
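To make "trivially extracted" concrete, here is a minimal sketch of the round trip, assuming the server hands the application PATH_INFO as a str decoded from the wire bytes as ISO-8859-1. The helper names (to_wsgi_str, path_bytes, path_text) and the sample paths are made up for illustration; they are not part of any WSGI draft.

```python
# Hypothetical sample bytes as they might arrive on the wire.
raw_utf8 = b"/caf\xc3\xa9"    # path encoded as UTF-8
raw_latin1 = b"/caf\xe9"      # same path in ISO-8859-1; not valid UTF-8


def to_wsgi_str(raw):
    """What a server would do under the ISO-8859-1 proposal (assumed)."""
    return raw.decode("iso-8859-1")


def path_bytes(path_info):
    """Recover the original bytes; lossless for any byte sequence."""
    return path_info.encode("iso-8859-1")


def path_text(path_info, encoding="utf-8"):
    """Re-decode with the encoding the application actually wants.

    surrogateescape keeps undecodable bytes recoverable instead of
    raising or replacing them.
    """
    return path_bytes(path_info).decode(encoding, "surrogateescape")
```

Every byte value maps to exactly one ISO-8859-1 character, which is why the encode step can never fail or lose data, whatever the client actually sent.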

Python 3.1's wsgiref.simple_server, on the other hand, blindly uses urllib.parse.unquote, which defaults to UTF-8 with errors='replace' rather than surrogateescape, mangling any non-UTF-8 input.
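A small sketch of that mangling, using only urllib.parse from the standard library. The sample path is a made-up example: a percent-encoded ISO-8859-1 'é', which is not valid UTF-8.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = "/caf%E9"  # hypothetical request path, ISO-8859-1 bytes

# Default call: encoding='utf-8', errors='replace'. The invalid byte
# becomes U+FFFD and the original byte is gone for good.
lossy = unquote(quoted)

# Two ways to keep the real bytes recoverable:
raw = unquote_to_bytes(quoted)              # work on bytes directly
as_latin1 = raw.decode("iso-8859-1")        # the ISO-8859-1 proposal
as_surrogate = unquote(quoted, encoding="utf-8",
                       errors="surrogateescape")  # the UTF-8 alternative
```

Either of the last two forms lets the application get back to b'/caf\xe9'; the default call does not.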

I don't really care whether UTF-8+surrogateescape or ISO-8859-1 encoding is blessed. But *something* needs to be blessed. An encoding, an alternative undecoded path_info, both, something else... just *something*.

Let's consider the `wsgiref.util.application_uri` function.
There is a potential problem here with the quote function.

Yes. wsgiref is broken in Python 3.1. Not quite as broken as it was in 3.0, but still broken. Until we can come to a Pronouncement on what WSGI *is* in Python 3, it is meaningless anyway.

Cookie data SHOULD be transparent to the server/gateway; however WSGI is
going to assume that data is encoded in latin-1.

Yeah. This is no big deal because non-ASCII characters in cookies are already broken everywhere(*). Given this and other limitations on what characters can go in cookies, they are habitually encoded using ad-hoc mechanisms handled by the application (typically a round of URL-encoding).

*: in particular:

- Opera and Chrome send non-ASCII cookie characters in UTF-8.
- IE encodes using the system codepage (which can never be UTF-8),
  mangling any characters that don't fit in the codepage through the
  traditional Windows 'similar replacement character' scheme.
- Mozilla uses the low byte of each UTF-16 code unit (so ISO-8859-1
  gets through but everything else is mangled).
- Safari refuses to send any cookie containing non-ASCII characters.
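The application-level "round of URL-encoding" for cookie values mentioned above could look something like the following. This is only a sketch: the function names are invented, and UTF-8 is my assumption for the text encoding (it is what urllib.parse.quote uses by default).

```python
from urllib.parse import quote, unquote


def encode_cookie_value(text):
    """Percent-encode so that only ASCII ever reaches the browser.

    safe="" also escapes '/', and ';' and spaces are always escaped,
    so the result cannot collide with cookie syntax.
    """
    return quote(text, safe="")


def decode_cookie_value(value):
    """Reverse the encoding on the way back in (UTF-8 assumed)."""
    return unquote(value)


value = encode_cookie_value("na\u00efve; caf\u00e9")
```

Because the cookie is pure ASCII by the time a browser sees it, none of the browser differences listed above come into play.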

I don't know what the HTTP/Cookie spec says about this.

The traditional interpretation of RFC2616 is that headers are ISO-8859-1.

You will notice that no browser correctly follows this.

...sigh.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/



