On Aug 4, 2009, at 8:53 PM, Graham Dumpleton wrote:
2. How would use of bytes work for a CGI-WSGI bridge given that
os.environ is not bytes? Where does one get what encoding was used for
os.environ values so it can be converted back to bytes?
On Unix it's simple enough:
On py2.X on Unix: environ is bytes already.
On py3.0: you're screwed, because some env vars were discarded already.
On py3.1+: 'string'.encode(sys.getfilesystemencoding(),
'surrogateescape') should do it.
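
A minimal sketch of the py3.1+ case, assuming the interpreter decoded the environment with the filesystem encoding and the 'surrogateescape' error handler (PEP 383). The helper names environ_value_to_bytes and environ_as_bytes are made up for illustration and are not part of any spec or library:

```python
import os
import sys


def environ_value_to_bytes(value, encoding=None):
    """Recover the original bytes from a str environment value.

    Assumes the value was decoded with `encoding` (default: the
    filesystem encoding) using the 'surrogateescape' error handler,
    so undecodable bytes survive as lone surrogates and round-trip.
    """
    if isinstance(value, bytes):
        # py2.X-style value: already bytes, nothing to undo.
        return value
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return value.encode(encoding, "surrogateescape")


def environ_as_bytes(environ=None, encoding=None):
    """Return a copy of the environment with str keys and bytes values."""
    if environ is None:
        environ = os.environ
    return {
        key: environ_value_to_bytes(value, encoding)
        for key, value in environ.items()
    }
```

A CGI->WSGI bridge could call environ_as_bytes() once at startup and build the WSGI environ from the result. This does nothing for py3.0, where the undecodable variables are already gone.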
On Windows, I guess the OS environment is unicode, so I don't know
precisely what to do to reversibly obtain the bytes sent from the end
user's browser. It looks to me from the source code as if Apache will
encode the bytes from the client (utf-8 or otherwise!) as the Unicode
values 0x00 to 0xFF in the Windows environment, that is, as if
decoding the client input in latin-1. But it does that for the
following keys only:
HTTP_*
SERVER_*
REQUEST_*
QUERY_STRING
PATH_INFO
PATH_TRANSLATED
(from
http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/arch/win32/mod_win32.c)
Other values are decoded from utf-8 (or, if passed through from an
enclosing environment, passed through untouched -- via encoding into
utf-8 for internal use and then decoding back from utf-8 to put back
in the Windows environment.)
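
If that reading of mod_win32.c is right, the reverse mapping for a bridge running under Apache on Windows might look roughly like the sketch below. It rests entirely on the behaviour described above, which I have not verified by running it, and the names LATIN1_PREFIXES, LATIN1_KEYS and windows_cgi_value_to_bytes are invented for this example:

```python
# Keys whose values Apache appears to store as code points 0x00-0xFF,
# i.e. as if the client bytes had been decoded as latin-1.
LATIN1_PREFIXES = ("HTTP_", "SERVER_", "REQUEST_")
LATIN1_KEYS = frozenset(["QUERY_STRING", "PATH_INFO", "PATH_TRANSLATED"])


def windows_cgi_value_to_bytes(key, value):
    """Map one unicode environment value back to bytes.

    Assumption: values for the keys listed above were produced by a
    latin-1 style byte-to-code-point mapping, and every other value
    was decoded from utf-8.
    """
    if key.startswith(LATIN1_PREFIXES) or key in LATIN1_KEYS:
        # Raises UnicodeEncodeError if a code point above 0xFF shows up,
        # which would mean the assumption does not hold for this value.
        return value.encode("latin-1")
    return value.encode("utf-8")
```

Letting the latin-1 branch raise, rather than substituting a replacement character, seems the safer choice here: a silent substitution would hide exactly the case where the bytes cannot be recovered.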
I'll note that while it's important to get this transformation correct
for a CGI->WSGI bridge to work right in Windows, and thus is
definitely a useful discussion to have here, it doesn't actually need
to be part of the WSGI spec.
James