And Clover wrote:
> > A middleware might re-decode the values if the `wsgi.uri_encoding`
> > is `iso-8859-1` and only then.
>
> Seems like a mistake. If the middleware knows iso-8859-7 is in use, it
> would need to transcode the charset regardless of whether the
> initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise
> the application would break when fed with eg. Greek words that happened
> to encode to valid UTF-8 bytes.
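
The transcoding under discussion can be sketched as a small middleware. This is only an illustration under stated assumptions: the server has already decoded `SCRIPT_NAME` and `PATH_INFO` to `str` and recorded the charset it used in `wsgi.uri_encoding` (as the proposal describes), and the name `transcode_uri_middleware` and its `target` parameter are hypothetical, not part of any specification.

```python
def transcode_uri_middleware(app, target="iso-8859-7"):
    """Wrap `app` so SCRIPT_NAME and PATH_INFO are re-decoded as `target`.

    Hypothetical helper: assumes the server stored the charset it used
    for decoding under the 'wsgi.uri_encoding' environ key.
    """
    def wrapper(environ, start_response):
        source = environ.get("wsgi.uri_encoding", "iso-8859-1")
        # Transcode whenever the charsets differ, even if the submitted
        # bytes happened to be a valid UTF-8 sequence.
        if source.lower() != target.lower():
            for key in ("SCRIPT_NAME", "PATH_INFO"):
                # Recover the bytes the server saw, then decode as `target`.
                raw = environ.get(key, "").encode(source)
                environ[key] = raw.decode(target)
            environ["wsgi.uri_encoding"] = target
        return app(environ, start_response)
    return wrapper
```

The round trip is lossless because the server's decode succeeded with `source`, so encoding with the same charset yields the original bytes. Bytes that are unassigned in the target charset would still raise `UnicodeDecodeError` here; a real deployment would have to decide how to handle that.
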
If the entire site expects iso-8859-7 Request-URLs then the deployer
should tell the WSGI server to decode using iso-8859-7 instead of utf-8.
If only part of the site expects iso-8859-7 then...yeah, it needs to
transcode. So what?

> > The application MUST use this value to decode the ``'QUERY_STRING'``
> > as well.
>
> This will break all use of non-UTF-8 encodings in QUERY_STRING, where
> the path part of the URL does not contain non-UTF-8 sequences. That
> includes the very common case where the path part contains only ASCII.
>
> http://greek.example.com/myscript.cgi?x=%C2
>
> will fail, as the given UTF-8 sniffer only looks at the path part to
> determine what encoding to use for both of the path part and the query
> string.

No, it won't fail. WSGI servers do not perform %-decoding of the
QUERY_STRING. In the example given, a WSGI 1.1 server will set the
Python 3 environ values:

    {'SCRIPT_NAME': '',
     'PATH_INFO': '/myscript.cgi',
     'QUERY_STRING': 'x=%C2'}


Robert Brewer
fuman...@aminus.org

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
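
Because the server leaves `QUERY_STRING` percent-encoded, the application still chooses the charset at the moment it unquotes the value. A minimal sketch using the standard library's `urllib.parse.parse_qs`; the helper name `query_params` and the choice of `errors="strict"` are assumptions made for illustration.

```python
from urllib.parse import parse_qs

# QUERY_STRING as the server would pass it: still percent-encoded.
environ = {"QUERY_STRING": "x=%C2"}

def query_params(environ, charset):
    """Percent-decode QUERY_STRING, interpreting the bytes as `charset`."""
    return parse_qs(environ["QUERY_STRING"], encoding=charset, errors="strict")

# Byte 0xC2 is GREEK CAPITAL LETTER BETA (U+0392) in iso-8859-7.
greek = query_params(environ, "iso-8859-7")
```

Here `greek` is `{'x': ['\u0392']}`. The same call with `"utf-8"` raises `UnicodeDecodeError`, since a lone 0xC2 byte is not valid UTF-8.
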