On Aug 4, 2009, at 8:53 PM, Graham Dumpleton wrote:

2. How would use of bytes work for a CGI-WSGI bridge given that
os.environ is not bytes? Where does one get what encoding was used for
os.environ values so it can be converted back to bytes?

On Unix it's simple enough:
On py2.X on Unix: environ is bytes already.
On py3.0: you're screwed, because some env vars were discarded already.
On py3.1+: 'string'.encode(sys.getfilesystemencoding(), 'surrogateescape') should do it.
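
The py3.1+ case can be sketched roughly as below. This is a minimal illustration, not a tested bridge: the helper name env_value_to_bytes is made up for this example, and it assumes the interpreter decoded the environment with the filesystem encoding and the 'surrogateescape' error handler, which is the Unix behaviour on 3.1+.

```python
import os
import sys


def env_value_to_bytes(value):
    """Turn one decoded environment string back into the original bytes.

    Hypothetical helper. Assumes the value was decoded using the
    filesystem encoding with the 'surrogateescape' error handler,
    so undecodable bytes survive as lone surrogates and can be restored.
    """
    return value.encode(sys.getfilesystemencoding(), 'surrogateescape')


def environ_as_bytes(environ=None):
    """Return a bytes-to-bytes copy of the given (str-to-str) environment."""
    if environ is None:
        environ = os.environ
    return {
        env_value_to_bytes(key): env_value_to_bytes(value)
        for key, value in environ.items()
    }
```

On Python 3.2 and later, os.fsencode() does the same conversion on Unix.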

On Windows, I guess the OS environment is Unicode, so I don't know precisely what to do to reversibly obtain the bytes sent from the end user's browser. From the source code, it looks to me as if Apache encodes the bytes from the client (utf-8 or otherwise!) as the Unicode code points 0x00 to 0xFF in the Windows environment, that is, as if it were decoding the client input as latin-1. But it does that for the following keys only:
HTTP_*
SERVER_*
REQUEST_*
QUERY_STRING
PATH_INFO
PATH_TRANSLATED
(from http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/arch/win32/mod_win32.c)

Other values are decoded from utf-8. (Values passed through from an enclosing environment are effectively left untouched: they are encoded into utf-8 for internal use and then decoded back from utf-8 when they are put back into the Windows environment.)
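
If that reading of mod_win32.c is right, a CGI->WSGI bridge on Windows could reverse the transformation per key, roughly as sketched below. This is only an illustration of the rule described above: the function and constant names are invented for the example, and the key list is taken from this message rather than verified against any particular Apache release.

```python
# Keys whose values Apache (per the reading above) stores as if the
# client bytes had been decoded as latin-1.
LATIN1_PREFIXES = ('HTTP_', 'SERVER_', 'REQUEST_')
LATIN1_KEYS = ('QUERY_STRING', 'PATH_INFO', 'PATH_TRANSLATED')


def apache_win32_value_to_bytes(key, value):
    """Recover the bytes behind one Windows environment value.

    Hypothetical helper. Request-derived keys are assumed to hold code
    points 0x00-0xFF standing for raw bytes; everything else is assumed
    to have been decoded from utf-8.
    """
    if key.startswith(LATIN1_PREFIXES) or key in LATIN1_KEYS:
        # Raises UnicodeEncodeError if the assumption does not hold
        # (a code point above 0xFF), which is better than guessing.
        return value.encode('latin-1')
    return value.encode('utf-8')
```

For example, a QUERY_STRING value of 'q=caf\xc3\xa9' would come back as the bytes b'q=caf\xc3\xa9', which the application can then decode as utf-8 itself.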

I'll note that while it's important to get this transformation correct for a CGI->WSGI bridge to work right on Windows, and it is thus definitely a useful discussion to have here, it doesn't actually need to be part of the WSGI spec.

James