Dirkjan Ochtman wrote:
1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys
in this dictionary are native strings. For CGI variables, all names
are going to be ISO-8859-1 and so where native strings are
unicode strings, that encoding is used for the names of CGI
variables.
Perhaps explain where those ISO-8859-1 bytes might come from:
...are native strings. Where native strings are Unicode, any
keys derived from byte-oriented sources (such as custom headers
in the HTTP request reflected in the CGI environment variables)
should be decoded using the ISO-8859-1 encoding.
3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are
unicode strings, ISO-8859-1 encoding would be used such that the
original character data is preserved and as necessary the unicode
string can be converted back to bytes and thence decoded to unicode
again using a different encoding.
Good. The only remaining problem is that in certain environments
(notably anything running under IIS, not just CGI) a WSGI gateway
cannot fully comply with this requirement.
a. disallow environments that cannot be sure they are preserving the
original byte data from declaring that they support wsgi.version 1.1?
b. add an extra wsgi.something flag for a WSGI server to set, to specify
that it is sure that the original bytes have been preserved? (i.e. so
wsgiref's CGI handler would have to declare it wasn't sure when running
under Windows.)
c. just let WSGI gateways silently ignore the ISO-8859-1 requirement if
they can't honour it and let the application spend its time trying to
unravel the mess (status quo).
(Can wsgiref be fixed to use ISO-8859-1 in time for Python 3.2?)
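For reference, the round trip that point 3 relies on looks roughly like
the sketch below. It assumes Python 3, a gateway that really did decode
the raw bytes as ISO-8859-1, and a client that sent UTF-8; recode() and
the sample path are made-up names for illustration, not part of any spec.

```python
def recode(value, encoding="utf-8"):
    """Undo the gateway's ISO-8859-1 decoding, then decode the
    recovered bytes with the encoding the application expects."""
    return value.encode("iso-8859-1").decode(encoding)


# Bytes as they might arrive on the wire (UTF-8 here, by assumption).
raw = "/caf\u00e9".encode("utf-8")

# What a compliant gateway would place in the WSGI environment:
# every byte maps to the code point with the same number.
environ_value = raw.decode("iso-8859-1")

# The application recovers the intended text.
path = recode(environ_value)
```

This only works if the gateway had access to the original bytes; if the
platform has already decoded them with some other codepage, encoding back
to ISO-8859-1 does not restore them.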
7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.
8. The value passed to the 'write()' callback returned by
'start_response()' should be a byte string. Where native strings
are unicode strings, a native string type can also be supplied, in
which case it would be encoded as ISO-8859-1.
Weren't we going to only allow US-ASCII for the output? (These threads
are always so far apart I can never remember what conclusion we
reached... if any.)
Whilst ISO-8859-1 is in the HTTP standard for headers, and required to
preserve bytes in input, it's much, much less likely that the response
body is going to be ISO-8859-1. It could maybe be cp1252, but more
likely the author wanted UTF-8.
If we must support Unicode strings for response body output at all, I'd
prefer to be conservative here and spit a UnicodeEncodeError straight
away, rather than quietly mangle characters U+0080 to U+00FF.
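To make the difference concrete, here is a small sketch of the
conservative behaviour, assuming Python 3. encode_body_chunk() is a
hypothetical helper name, not something wsgiref or any server provides.

```python
def encode_body_chunk(chunk):
    """Pass byte strings through untouched; encode native (Unicode)
    strings as US-ASCII so anything else fails loudly."""
    if isinstance(chunk, bytes):
        return chunk
    # Raises UnicodeEncodeError for any character above U+007F.
    return chunk.encode("ascii")


text = "caf\u00e9"

# ISO-8859-1 encodes U+00E9 without complaint as one byte,
# which is not what a client expecting UTF-8 wants to receive.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

try:
    encode_body_chunk(text)
    raised = False
except UnicodeEncodeError:
    raised = True
```

With the ISO-8859-1 rule the application author gets no signal that the
body went out in an encoding they probably did not intend; with the
US-ASCII rule the error surfaces on the first non-ASCII character.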
Manlio Perillo wrote:
The run_with_cgi sample function should be changed, since it probably
does not work correctly, on Python 3.x.
Yes, the 'URL Reconstruction' fragment will be wrong too, since it uses
urllib.quote() (urllib.parse.quote() in Python 3) to encode the path
part. In Python 3, quote() defaults to UTF-8 rather than the ISO-8859-1
that WSGI 1.1 requires.
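A minimal sketch of the problem, assuming Python 3, where quote() lives
in urllib.parse and accepts an encoding argument. The sample path is
made up, and it assumes the gateway decoded the request bytes as
ISO-8859-1 as proposed above.

```python
from urllib.parse import quote, unquote_to_bytes

# PATH_INFO as it would appear in environ: UTF-8 request bytes
# that the gateway decoded as ISO-8859-1.
original_bytes = "/caf\u00e9".encode("utf-8")
path_info = original_bytes.decode("iso-8859-1")

# Default behaviour: quote() encodes the string as UTF-8 first,
# so the already-decoded bytes get encoded a second time.
wrong = quote(path_info)

# Telling quote() to use ISO-8859-1 reproduces the original bytes.
right = quote(path_info, encoding="iso-8859-1")
```

Here wrong comes out as "/caf%C3%83%C2%A9" while right is "/caf%C3%A9",
which is what the client originally sent.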
--
And Clover
mailto:[email protected]
http://www.doxdesk.com/