Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

Massimo Di Pierro Tue, 22 Sep 2009 20:08:18 -0700

Hello Ian,

I really like your proposal.


Massimo


On Sep 22, 2009, at 9:22 PM, Ian Bicking wrote:

OK, I mentioned this in the last thread, but... I can't keep up with all this discussion, and I bet you can't either.
So, here's a rough proposal for WSGI and unicode:
I propose we switch primarily to "native" strings: str on both Python 2 and 3.
Specifically:

environ keys: native
environ CGI values: native
wsgi.* (that is text): native
response status: native
response headers: native

wsgi.input remains byte-oriented, as does the response app_iter.
I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead we have:
wsgi.script_name
wsgi.path_info (I'm not entirely set on these names)
These both form the original path. It is not URL decoded, so it should be ASCII. (I believe non-ASCII could be rejected by the server, with Bad Request? A server could also choose to treat it as UTF8 or Latin1 and encode unsafe characters to make it ASCII.) Thus to re-form the URL, you do:
    environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] + environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' + environ['QUERY_STRING']
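A rough Python 3 sketch of that reconstruction follows. The helper name reconstruct_url is hypothetical, the wsgi.script_name and wsgi.path_info keys are the ones proposed here rather than part of any released WSGI spec, and dropping the '?' when QUERY_STRING is empty is an assumption added for illustration.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from the proposed environ keys.

    Assumes wsgi.script_name and wsgi.path_info hold the raw,
    still URL-encoded path, as proposed in this message.
    """
    url = (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
           + environ['wsgi.script_name'] + environ['wsgi.path_info'])
    # Assumption: omit the '?' when there is no query string.
    query = environ.get('QUERY_STRING', '')
    if query:
        url += '?' + query
    return url


# Hypothetical environ, for illustration only.
example_environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'wsgi.script_name': '/app',
    'wsgi.path_info': '/caf%C3%A9',
    'QUERY_STRING': 'q=1',
}
print(reconstruct_url(example_environ))
```

Because the path is never URL decoded, the percent-escapes pass through untouched and the result stays ASCII.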
All incoming headers will be treated as Latin1. If an application suspects another encoding, it is up to the application to transcode the header into another encoding. The transcoded value should not be put into the environ. In most cases headers should be ASCII, and Latin1 is simply a fallback that allows all bytes to be represented in both Python 2 and 3.
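To illustrate why Latin1 works as a fallback, here is a small Python 3 sketch. The header value is made up, and the variable names are hypothetical; the only point is that every byte survives a Latin1 round trip, so an application can transcode a copy without touching the environ.

```python
# Every byte value 0-255 maps to exactly one Latin1 code point,
# so decoding and re-encoding is lossless.
all_bytes = bytes(range(256))
as_text = all_bytes.decode('latin1')
assert as_text.encode('latin1') == all_bytes

# Hypothetical header whose bytes on the wire are really UTF8.
raw_header = 'café'.encode('utf8')
# What a server following this proposal would put in the environ.
environ_value = raw_header.decode('latin1')
# The application transcodes a copy; the environ value is left as-is.
transcoded = environ_value.encode('latin1').decode('utf8')
print(repr(environ_value), repr(transcoded))
```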
Similarly all outgoing headers will be Latin1. Thus if you (against good sense) decide to put UTF8 into a cookie, you can do:
    headers.append(('Set-Cookie', unicode_text.encode('UTF8').decode('latin1')))
The server will then encode the text as latin1, sending the UTF8 bytes. This is lame, but non-ASCII in headers is lame. It would be preferable to do:
    headers.append(('Set-Cookie', urllib.quote(unicode_text.encode('UTF8'))))
This sends different text, but is highly preferable. If you wanted to parse a cookie that was set as UTF8, you'd do:
parse_cookie(environ['HTTP_COOKIE'].encode('latin1').decode('utf8'))

Again, it would be better to do:

parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8'))
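The two cookie approaches above can be checked with a short Python 3 sketch. It uses urllib.parse.quote and unquote (the Python 3 home of urllib.quote/unquote), and unicode_text stands in for a cookie value only, not a full name=value pair; both are choices made for this illustration.

```python
from urllib.parse import quote, unquote

unicode_text = 'naïve'  # hypothetical cookie value

# Option 1: smuggle UTF8 bytes through a Latin1 header value.
header_value = unicode_text.encode('utf8').decode('latin1')
wire_bytes = header_value.encode('latin1')  # what the server would send
assert wire_bytes == unicode_text.encode('utf8')
# Reading it back on the way in mirrors the parse_cookie example.
assert header_value.encode('latin1').decode('utf8') == unicode_text

# Option 2 (preferred): percent-encode so the header stays pure ASCII.
quoted = quote(unicode_text)  # quote() encodes str as UTF8 by default
assert quoted.isascii()
assert unquote(quoted) == unicode_text
print(quoted)
```

With the second option nothing depends on how the server or client interprets non-ASCII header bytes.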
Other variables like environ['wsgi.url_scheme'], environ['CONTENT_TYPE'], etc, will be native strings. A Python 3 hello world app will then look like:
    def hello_world(environ):
        return ('200 OK', [('Content-type', 'text/html; charset=utf8')], ['Hello World!'.encode('utf8')])
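For illustration, here is how a server might consume such an app under this proposal. The serialize_response helper is hypothetical and is not part of WSGI; it assumes the tuple-returning calling convention shown in the hello world example, encodes the native-string status and headers as Latin1, and passes the byte-oriented body through unchanged.

```python
def hello_world(environ):
    return ('200 OK',
            [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])


def serialize_response(app, environ):
    """Hypothetical server-side helper: turn the app's return value into bytes."""
    status, headers, app_iter = app(environ)
    lines = ['HTTP/1.1 ' + status]
    lines += [name + ': ' + value for name, value in headers]
    # Status and headers are native strings, sent as Latin1.
    head = '\r\n'.join(lines).encode('latin1') + b'\r\n\r\n'
    # The body is already bytes and is passed through unchanged.
    return head + b''.join(app_iter)


response_bytes = serialize_response(hello_world, {})
print(response_bytes)
```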
start_response and changes to wsgi.input are incidental to what I'm proposing here (except that wsgi.input will be bytes); we can decide about them separately.
Outstanding issues:
Well, the biggie: is it right to use native strings for the environ values, and response status/headers? Specifically, tricks like the latin1 transcoding won't work in Python 2, but will in Python 3. Is this weird? Or just something you have to think about when using the two Python versions?
What happens if you give unicode text in the response headers that cannot be encoded as Latin1?
Should some things specifically be ASCII?  E.g., status.

Should some things be unicode on Python 2?

Is there a common case here that would be inefficient?
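On the question of headers that cannot be encoded as Latin1: the sketch below only shows what plain Python 3 does when a server naively calls encode('latin1') on such a value. The header text is made up, and how a server should react (reject, escape, or return an error response) is left open by the proposal.

```python
header_value = 'price: \u20ac10'  # the euro sign has no Latin1 encoding

try:
    header_value.encode('latin1')
except UnicodeEncodeError as exc:
    # A server would have to decide what to do at this point.
    failed = True
    print('cannot encode as Latin1:', exc.reason)
else:
    failed = False
```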



--
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker

