At 09:51 PM 1/4/2011 +1100, Graham Dumpleton wrote:
Add another point. FWIW, these are coming up because of questions
being asked on python-dev IRC channel about PEP 3333.

The issue came down to this: the PEP may not be clear enough in
explaining that where str is unicode, something like PATH_INFO,
although unicode, is actually bytes decoded as ISO-8859-1, and so
needs to be re-encoded to bytes and then decoded in the required
charset before use.

They were thinking that because it was unicode already they could use
it as is and not need to do anything. I.e., they didn't realise that
they need to do:

  path_info = environ.get('PATH_INFO', '')
  path_info = path_info.encode('ISO-8859-1').decode('UTF-8')

for example to get it interpreted as UTF-8 first. They were simply
looking at concatenating new URL bits to the ISO-8859-1 variant from
other unicode strings that weren't bytes represented as ISO-8859-1.
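To make the failure mode concrete, here is a minimal sketch of the round trip and of what goes wrong when it is skipped. The helper names wsgi_native and wsgi_text and the sample path are hypothetical, made up for illustration, and UTF-8 is only an assumed charset; the PEP does not define these helpers.

```python
# Hypothetical helpers for illustration; they are not part of PEP 3333 or wsgiref.

def wsgi_native(text, charset="utf-8"):
    """Pack real text the way a server would: encode it, then decode as ISO-8859-1."""
    return text.encode(charset).decode("iso-8859-1")


def wsgi_text(native, charset="utf-8"):
    """Undo the packing: recover the bytes, then decode them in the assumed charset."""
    return native.encode("iso-8859-1").decode(charset)


# Simulate what a Python 3 server hands the application for the path "/café".
environ = {"PATH_INFO": wsgi_native("/caf\u00e9")}

raw = environ.get("PATH_INFO", "")
path_info = wsgi_text(raw)

# Concatenating ordinary text onto the undecoded value mixes two representations.
wrong = raw + "/\u00fcber"
right = path_info + "/\u00fcber"

print(repr(raw))        # the ISO-8859-1 view of the UTF-8 bytes
print(repr(path_info))  # the text the client actually meant
print(wrong == right)
```

The value in environ has six characters rather than five because the two UTF-8 bytes for the accented letter each show up as a separate ISO-8859-1 character. Only after the re-encode and decode step is it safe to join it with other text.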

In Python 2.X it was obvious that since it wasn't unicode that you had
to decode it, but confusion may arise for Python 3.X if this
requirement is not explicitly spelled out with a code example like
above.

We all may see it as obvious, and yes, perhaps it could be covered in
separate articles or commentaries by people, but given this person was
new to it, maybe it is deserving of more explanation in the PEP itself
if they were confused.

It would be really awesome if somebody would write separate Application Authors' and Middleware Authors' Guides to WSGI. Those authors don't need to know absolutely everything in the PEP, unlike server authors.


It could also be that the PEP covers it adequately already. I am too
tired to read through it again right now.

It's pretty prominently stated early on that NO strings in the spec are really unicode, they're just bytes packed into unicode objects.

Obviously, no matter how prominently this is stated, some people will still make this mistake, but if desired, we could always put some additional info near the environ part of the spec for clarification.

(It occurs to me in retrospect that I should probably have updated wsgiref in the stdlib to check the bytesy-ness of strings used to create Header objects. Too late for 3.2, though.)
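As a rough illustration of what such a check might look like, here is a minimal sketch. The function name is_bytesy is hypothetical and this is not what wsgiref actually does; it only assumes that "bytesy" means a str whose code points all fit in ISO-8859-1.

```python
def is_bytesy(value):
    """Return True if value is a str that could have come from ISO-8859-1 bytes.

    Hypothetical helper for illustration only; not part of wsgiref.
    """
    if not isinstance(value, str):
        return False
    try:
        # Every code point must be below 256 for this encode to succeed.
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(is_bytesy("text/html; charset=utf-8"))  # plain ASCII header value
print(is_bytesy("caf\u00e9"))                 # U+00E9 fits in ISO-8859-1
print(is_bytesy("\u20ac"))                    # the euro sign does not
```

A check like this can only reject strings that cannot possibly be bytes in disguise. It cannot tell whether a string that passes has already been decoded to real text by mistake.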
