Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-18 Andrew Clover
ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...) Hmm... it turns out: no. IIS appears to be mangling characters that are not in mbcs even *before* it puts the decoded value into the envvars. The same is true with isapi_wsgi, which is the only other WSGI adapter I know of
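The lookup Andrew describes can be sketched in Python 3 as follows. The buffer-sizing logic and the helper name get_env_wide are illustrative assumptions, not taken from the thread. According to the message, this only bypasses the CRT's ANSI copy of environ. It cannot recover characters that IIS has already replaced before setting the variable.

```python
import ctypes
import os
import sys


def get_env_wide(name):
    """Hypothetical helper: read an environment variable via the Win32
    wide-character API instead of the CRT's ANSI copy of environ.

    On non-Windows platforms it simply falls back to os.environ.
    """
    if sys.platform != "win32":
        return os.environ.get(name)

    kernel32 = ctypes.windll.kernel32
    kernel32.GetEnvironmentVariableW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    kernel32.GetEnvironmentVariableW.restype = ctypes.c_uint32

    # With a zero-sized buffer the call returns the required size
    # (including the terminating NUL), or 0 if the variable is not set.
    size = kernel32.GetEnvironmentVariableW(name, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    kernel32.GetEnvironmentVariableW(name, buf, size)
    return buf.value
```

Under IIS, get_env_wide("PATH_INFO") would still return '?' wherever the server had already substituted unrepresentable characters.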

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-17 Andrew Clover
Mark Hammond wrote: I don't think Python explicitly converts it - the CRT's ANSI version of environ is used Yes, it would be the CRT on Python 2.x. (Python 3.0 on non-NT does a conversion always using UTF-8, if I'm reading convertenviron right.) so the resulting strings should be encoded
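For comparison, later Python 3 releases on POSIX systems behave differently from the 3.0 behaviour described above. They decode environ with the filesystem encoding and the surrogateescape error handler, and from 3.2 onward they also expose the undecoded bytes as os.environb. Here is a sketch of recovering the raw bytes, where raw_env_bytes is a hypothetical helper name:

```python
import os


def raw_env_bytes(name):
    """Hypothetical helper: return an environment variable's value as the
    bytes the process received, or None if unset or unavailable.
    """
    if not os.supports_bytes_environ:
        # Windows: the environment is natively UTF-16, so there is no
        # byte-level original to recover here.
        return None
    return os.environb.get(os.fsencode(name))
```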

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-16 Mark Hammond
Python decodes the environ to its own copy (wrapped in os.environ) at interpreter startup time; I don't think Python explicitly converts it - the CRT's ANSI version of environ is used, so the resulting strings should be encoded using the 'mbcs' encoding. What mangling do you see? there's
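To make the mangling concrete: if the CRT's ANSI environ is produced by encoding the Unicode value to the system code page, characters outside that code page cannot survive. The sketch below uses cp1252 as a stand-in for 'mbcs'. That substitution is an assumption, because the 'mbcs' codec exists only on Windows and may apply best-fit mappings rather than '?'. The name to_ansi is hypothetical.

```python
def to_ansi(text, codepage="cp1252"):
    """Hypothetical helper: encode text to a legacy ANSI code page,
    replacing unrepresentable characters with '?' as a lossy stand-in
    for the CRT's ANSI environ conversion.
    """
    return text.encode(codepage, errors="replace")


print(to_ansi("/wiki/Café"))    # representable: b'/wiki/Caf\xe9'
print(to_ansi("/wiki/Москва"))  # Cyrillic lost: b'/wiki/??????'
```

Once the '?' substitution has happened, no decoding step on the application side can bring the original characters back.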

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-14 Andrew Clover
Ian Bicking wrote: As it is (in Python 2), you should do something like environ['PATH_INFO'].decode('utf8') and it should work. See the test cases in my original post: this doesn't work universally. On WinNT platforms PATH_INFO has already gone through a decode/encode cycle which almost
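The following sketch shows why the simple recipe is not universal. It is written in Python 3 terms, using bytes.decode in place of Python 2's str.decode. The same path decodes cleanly from the UTF-8 bytes a browser sends. It fails on bytes that have been re-encoded to a code page, with cp1252 assumed here as an example. The helper name decode_path is hypothetical.

```python
def decode_path(path_bytes):
    """Hypothetical helper: Ian's suggested recipe, treating PATH_INFO as UTF-8."""
    return path_bytes.decode("utf-8")


raw = "/wiki/Café".encode("utf-8")       # bytes as the browser sent them
mangled = "/wiki/Café".encode("cp1252")  # after a decode/encode cycle via the ANSI code page

print(decode_path(raw))                  # /wiki/Café

try:
    decode_path(mangled)
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)
```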

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-14 Ian Bicking
Andrew Clover wrote: Ian Bicking wrote: As it is (in Python 2), you should do something like environ['PATH_INFO'].decode('utf8') and it should work. See the test cases in my original post: this doesn't work universally. On WinNT platforms PATH_INFO has already gone through a decode/encode

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-14 Andrew Clover
Ian Bicking wrote: This is something messed up with CGI on NT, and whatever server you are using, and perhaps the CGI adapter (maybe there's a way to get the raw environment without any encoding, for example?) Python decodes the environ to its own copy (wrapped in os.environ) at interpreter

[Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-12 Andrew Clover
It would be lovely if we could allow WSGI applications to reliably accept Unicode paths. That is to say, allow WSGI apps to have beautiful URLs like Wikipedia's, without requiring URL-rewriting magic. (Which is so highly server-specific, potentially unavailable to non-admin webmasters, and
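A Wikipedia-style URL carries non-ASCII path segments as percent-encoded UTF-8. The browser sends the escaped form, and the application wants the decoded text back. A sketch using the standard library's urllib.parse (Python 3):

```python
from urllib.parse import quote, unquote

title = "/wiki/Café"
on_the_wire = quote(title)            # percent-encodes the UTF-8 bytes; '/' is left alone
print(on_the_wire)                    # /wiki/Caf%C3%A9
print(unquote(on_the_wire) == title)  # True
```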

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-12 Ian Bicking
Andrew Clover wrote: If we could reliably read the bytes the browser sends to us in the GET request that would be great, we could just decode those and be done with it. Unfortunately, that's not reliable, because: 1. thanks to an old wart in the CGI specification, %XX hex escapes are decoded
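The following sketch shows one consequence of that CGI wart. It uses urllib.parse.unquote as a stand-in for the server's own decoding. Once the %XX escapes are decoded, an escaped slash is indistinguishable from a real path separator. As a result, the original request path cannot be reconstructed from PATH_INFO alone.

```python
from urllib.parse import unquote

requested = ["/files/a%2Fb", "/files/a/b"]
path_info = [unquote(p) for p in requested]  # what a CGI-style server would pass on
print(path_info)                             # ['/files/a/b', '/files/a/b']
```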

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-12 Graham Dumpleton
FWIW, there was a past discussion on these issues on mod_wsgi list. I can't really remember what the outcome of the discussion was. The discussion is at: http://groups.google.com/group/modwsgi/browse_frm/thread/2471a1a71620629f Graham 2008/11/13 Andrew Clover [EMAIL PROTECTED]: It would be