Re: [Web-SIG] resources for porting wsgi apps from python 2 to 3

And Clover Tue, 02 Oct 2012 05:57:31 -0700

On 01/10/12 18:07, chris.d...@gmail.com wrote:
>     * Use bytes or str for environ keys?
>     * Use bytes or str for environ values?


str, decoded from the request bytes using ISO-8859-1.

>       * Are all environ values created equal or would, for example,
>         QUERY_STRING's value (prior to any parameter to decoding)
>         be handled differently from HTTP_COOKIE

All environ values are created equal (other than the CGI-mandated odd decoding behaviour of SCRIPT_NAME and PATH_INFO).


>       * If str, I see that ISO-8859-1 is the assumed encoding. How much
>         hurt occurs in the world if I just assume utf-8 when decoding to
>         str[4]?

Immediately, all non-ASCII characters in the path would be interpreted incorrectly.

The more general hurt to the world would be that we would continue the sad pre-PEP3333 situation where every web server handles non-ASCII characters differently, and so no WSGI application can reliably use Unicode in path segments.

There is little impact to any header other than the path, because non-ASCII characters almost never appear in them. The query string remains %-encoded so any non-ASCII characters are safe. The other places users can put non-ASCII characters are in cookies and HTTP Basic Authorisation headers, but browser support here is so variable/broken that Python's handling would be the least of your worries.
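
To illustrate the query string point, here is a minimal sketch, assuming the application has settled on UTF-8 for its own parameters. The helper name query_params and the sample environ are made up for the example; parse_qs is the standard library call.

```python
from urllib.parse import parse_qs


def query_params(environ, encoding="utf-8"):
    """Decode QUERY_STRING parameters (hypothetical helper).

    QUERY_STRING is still %-encoded ASCII when it reaches the app, so the
    ISO-8859-1 decoding done by the server cannot damage it. The character
    encoding only matters here, when the %-escapes are expanded.
    """
    raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True, encoding=encoding, errors="replace")


# Sample environ, invented for illustration.
environ = {"QUERY_STRING": "name=caf%C3%A9&tag=a&tag=b"}
params = query_params(environ)
```

The encoding choice is the application's, not the server's, which is why the query string is unaffected by the ISO-8859-1 rule.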


> [4] Which is what it should have been all along?

Not necessarily. Even if you decide that all web apps must use UTF-8 for text encoding, it's valid to have URL-encoded, non-text binary data in a path segment. This would be unrecoverable using straight UTF-8.

(They would be recoverable if surrogateescape were used, but PEP 3333 has to encompass language versions that don't have surrogateescape, and also it's questionable whether it should be possible to smuggle non-UTF-8 data into strings that applications assume are safe.)
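
A rough sketch of the difference, using a made-up path that carries non-text bytes. The path and variable names are invented for the example; it only exercises the standard codecs.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request path with URL-encoded binary data in a segment.
raw = unquote_to_bytes("/blob/%FF%FE%00data")

# Straight UTF-8 cannot decode these bytes at all.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ISO-8859-1 maps every byte to one code point, so the round trip is lossless.
latin1_roundtrip = raw.decode("iso-8859-1").encode("iso-8859-1")

# surrogateescape also round-trips, by smuggling the bad bytes in as lone
# surrogates. The resulting str is not valid text: a plain UTF-8 encode fails.
escaped = raw.decode("utf-8", "surrogateescape")
surrogate_roundtrip = escaped.encode("utf-8", "surrogateescape")
try:
    escaped.encode("utf-8")
    smuggled_is_clean = True
except UnicodeEncodeError:
    smuggled_is_clean = False
```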

Plus header values are less likely to be UTF-8, and HTTP specifies that they're ISO-8859-1 (even if that is not well-observed by browsers).

Ideally, the interfaces should all be bytes, because HTTP is defined in terms of bytes. But that plays poorly with Python 3's default Unicode strs (for environ et al). So ISO-8859-1 was chosen as a str interface for which the original bytes can at least be recovered.
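
Recovering the bytes can be sketched like this, assuming the application wants to treat paths as UTF-8. The helper names recover_bytes and path_as_utf8 are hypothetical, and the environ is built by hand to mimic what a PEP 3333 server would pass in.

```python
def recover_bytes(value):
    """Return the original request bytes behind a PEP 3333 environ str."""
    return value.encode("iso-8859-1")


def path_as_utf8(environ):
    """Reinterpret PATH_INFO as UTF-8 (an application-level assumption)."""
    return recover_bytes(environ.get("PATH_INFO", "")).decode("utf-8", "replace")


# Simulate a server: the bytes for "/café" as UTF-8, exposed to the app as
# an ISO-8859-1-decoded native str.
wire = "/caf\u00e9".encode("utf-8")
environ = {"PATH_INFO": wire.decode("iso-8859-1")}

mojibake = environ["PATH_INFO"]   # what the app sees if it does nothing
path = path_as_utf8(environ)      # what it gets after transcoding
```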


>     * Should start_response only accept bytes (and error if not), or
>       should it also accept str and encode appropriately?

status and response_headers are, like the request headers, native str (to be ISO-8859-1 encoded). It's only the HTTP entity body that is always bytestring.
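
In practice that looks roughly like the sketch below. The app and the call_app test harness are invented for illustration; setup_testing_defaults is from wsgiref.util.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # The entity body is bytes; the app picks the encoding itself.
    body = "h\u00e9llo\n".encode("utf-8")
    # Status and headers are native str, restricted to ISO-8859-1 characters.
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(application):
    """Call a WSGI app directly, without a server (hypothetical harness)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app)
```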


>     * Should the returned iterable be rejected or encoded if not bytes?

I don't think it's specified by the PEP, but wsgiref looks like it'll chuck TypeError when it tries to write str to the buffer/socket.


cheers,

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobi...@gmail.com
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
