Re: [Web-SIG] Draft PEP: WSGI 1.1

And Clover Thu, 15 Apr 2010 08:34:20 -0700

Dirkjan Ochtman wrote:

1. The application is passed an instance of a Python dictionary
   containing what is referred to as the WSGI environment. All keys
   in this dictionary are native strings. For CGI variables, all names
   are going to be ISO-8859-1 and so where native strings are
   unicode strings, that encoding is used for the names of CGI
   variables.


Perhaps explain where those ISO-8859-1 bytes might come from:

    ...are native strings. Where native strings are Unicode, any
    keys derived from byte-oriented sources (such as custom headers
    in the HTTP request reflected in the CGI environment variables)
    should be decoded using the ISO-8859-1 encoding.

3. For the CGI variables contained in the WSGI environment, the values
   of the variables are native strings. Where native strings are
   unicode strings, ISO-8859-1 encoding would be used such that the
   original character data is preserved and as necessary the unicode
   string can be converted back to bytes and thence decoded to unicode
   again using a different encoding.

Good. The only problem that remains with this is that in certain
environments (notably: all IIS use, not just CGI) a WSGI gateway cannot
fully comply with this requirement.

a. disallow environments that cannot be sure they are preserving the
original byte data from declaring that they support wsgi.version 1.1?

b. add an extra wsgi.something flag for a WSGI server to add, to specify
that it is sure that the original bytes have been preserved? (ie. so
wsgiref's CGI handler would have to declare it wasn't sure when running
under Windows.)

c. just let WSGI gateways silently ignore the ISO-8859-1 requirement if
they can't honour it and let the application spend its time trying to
unravel the mess (status quo).


(Can wsgiref be fixed to use ISO-8859-1 in time for Python 3.2?)
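
For reference, the round trip that point 3 relies on can be sketched as
below. This is only an illustration for Python 3, where native strings
are unicode; the variable names and the sample path are made up for the
example and are not part of the draft.

```python
# Raw bytes as they might arrive from the client: "/café" sent as UTF-8.
raw_path = "/caf\u00e9".encode("utf-8")

# What a compliant gateway would place in the environ: each byte is
# mapped to the code point with the same value, so nothing is lost.
environ_value = raw_path.decode("iso-8859-1")

# What the application does to get the original bytes back and then
# decode them with the encoding it actually expects.
recovered_bytes = environ_value.encode("iso-8859-1")
real_path = recovered_bytes.decode("utf-8")

# ISO-8859-1 round-trips every possible byte value, which is why it is
# usable as a byte-preserving wrapper.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes
```

A gateway that has already decoded the value with some other codec
before the WSGI layer sees it (the IIS case above) cannot guarantee the
`recovered_bytes` step returns what the client sent.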

7. The iterable returned by the application and from which response
   content is derived, should yield byte strings. Where native strings
   are unicode strings, the native string type can also be returned in
   which case it would be encoded as ISO-8859-1.

8. The value passed to the 'write()' callback returned by
   'start_response()' should be a byte string. Where native strings
   are unicode strings, a native string type can also be supplied, in
   which case it would be encoded as ISO-8859-1.

Weren't we going to only allow US-ASCII for the output? (These threads
are always so far apart I can never remember what conclusion we
reached... if any.)

Whilst ISO-8859-1 is in the HTTP standard for headers, and required to
preserve bytes in input, it's much, much less likely that the response
body is going to be ISO-8859-1. It could maybe be cp1252, but more
likely the author wanted UTF-8.

If we must support Unicode strings for response body output at all, I'd
prefer to be conservative here and spit a UnicodeEncodeError straight
away, rather than quietly mangle characters U+0080 to U+00FF.
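
A rough sketch of the difference, for Python 3. The helper name
`encode_body_chunk` is hypothetical and only shows what the
conservative US-ASCII behaviour might look like; it is not taken from
the draft or from wsgiref.

```python
def encode_body_chunk(chunk):
    """Hypothetical helper: return the bytes for one response body chunk."""
    if isinstance(chunk, bytes):
        return chunk
    # Strict US-ASCII: anything above U+007F raises UnicodeEncodeError
    # instead of being silently sent as a single ISO-8859-1 byte.
    return chunk.encode("ascii")


text = "caf\u00e9"

# What points 7 and 8 of the draft would put on the wire.
as_latin1 = text.encode("iso-8859-1")

# What the application author more likely intended.
as_utf8 = text.encode("utf-8")

try:
    encode_body_chunk(text)
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

With the ISO-8859-1 rule the client receives the single byte 0xE9 for
the last character; a page declared as UTF-8 would need the two bytes
0xC3 0xA9, so the strict version fails early rather than sending the
wrong thing.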


Manlio Perillo wrote:

The run_with_cgi sample function should be changed, since it probably
does not work correctly, on Python 3.x.

Yes, the 'URL Reconstruction' fragment will be wrong too, since it uses
urllib.quote() to encode the path part. quote() defaults to UTF-8 rather
than the ISO-8859-1 WSGI 1.1 requires.
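
To make the quote() problem concrete, here is a small sketch using the
Python 3 spelling, urllib.parse.quote(). The sample path is invented
for the example, and it assumes the gateway has delivered PATH_INFO as
ISO-8859-1-decoded bytes as described above.

```python
from urllib.parse import quote

# PATH_INFO as a WSGI 1.1 gateway would deliver it when the client
# requested /caf%C3%A9 (UTF-8 "/café"): bytes decoded as ISO-8859-1.
path_info = "/caf\u00e9".encode("utf-8").decode("iso-8859-1")

# Default behaviour: quote() encodes the string as UTF-8 first, so the
# two stand-in characters are each encoded again (double encoding).
default_quoted = quote(path_info)

# Passing the encoding explicitly turns the string back into the
# original bytes before percent-escaping them.
latin1_quoted = quote(path_info, encoding="iso-8859-1")
```

Only the second form reproduces the URL the client actually requested.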


--
And Clover
mailto:[email protected]
http://www.doxdesk.com/
