Hi,

Graham Dumpleton wrote:
> So, no strict need to make the WSGI adapter do it differently. You may
> want to only do that if concerned about overhead of transcoding.
>
> Transcoding just these is most probably going to be less overhead than
> the WSGI adapter having to set up both unicode and raw values in a
> dictionary for everything.

So if I understand you correctly, wsgi.uri_encoding would be used *only* as information about what the URI encoding was, and the application should still use whatever internal encoding it wants? That sounds right, but then let's make that "should" a MUST.
Your query_string example is flawed: the query string is always quoted, and encoding/decoding an ASCII-only string changes nothing as long as the encoding is a superset of ASCII, which is required anyway for various reasons.

I would go with this wording for the spec then:

  wsgi.uri_encoding holds the encoding of the URI that was used to decode SCRIPT_NAME and PATH_INFO. If the application decodes the query string, it MUST use the encoding given here. If REQUEST_URI is available, the server will use the URI encoding to decode this value as well. However, for encoding URIs the application MUST NOT use the wsgi.uri_encoding information but MUST use UTF-8 to encode the URI.

  Backwards compatibility for URIs: If the application depends on non-UTF-8 URIs and its fallback encoding is NOT latin1, the application will have to check wsgi.uri_encoding for latin1 and, if it detects it, encode the value back to latin1 and decode it with the fallback encoding (e.g. iso-8859-7). WSGI 2.0, however, requires the application to use UTF-8 for generated URIs.

I have now checked the browser implementations, and for arbitrary URIs (not URIs generated in a page) the browser will always try UTF-8. RFC 3987 also recommends UTF-8 for URIs.

> Even with your iso-8859-4 example, can't see how you can without
> knowing loose what original characters are, as wsgi.uri_encoding being
> provided always allows you to transcode to what you needed it to be
> when what was supplied didn't match.

Assuming the only possible values for wsgi.uri_encoding when the application is invoked are latin1/iso-8859-1 and utf-8, I'm totally fine with that: if the application's fallback URI encoding is something like iso-8859-4, the application can check for latin1 itself and re-encode the data. I could live with that. What I don't want to see in WSGI is a fallback encoding (latin1) that can be changed in the server configuration.

> Now you can go back to monologue, as definitely sleeping now.
;-) \o/

Regards,
Armin

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig