Hi. I'm doing some tests to try to understand how HTTP headers are encoded by browsers.
I have written a simple WSGI application that asks for authentication credentials, prints them on the terminal, and returns the data in the response as raw bytes:
http://paste.pocoo.org/show/154633/

Then I used some browsers to try to send a username with non-ASCII characters.

When I try with simple characters in the iso-8859-1 charset, things work well; the data is encoded using this charset.

However, when I try to use a character outside that charset, like the Euro sign, there are problems.

Firefox (Iceweasel 3.0.14, Linux Debian Squeeze) sends me a '\xac'. I don't know where \xac comes from, but it is the last byte of the utf-8 encoded Euro: '\xe2\x82\xac'.

Internet Explorer 6.0 sends me a '\x80', and this is the Euro character encoded using cp1252 (I suspect that it always uses this encoding, instead of iso-8859-1). Unfortunately I cannot test with IE 7 and 8.

With a browser working on a terminal, like lynx, things get worse. If I enter the string "àè" as the user name, lynx sends me '\xc3\xa0\xc3\xa8', that is, the utf-8 encoding of the string. This happens in a GNOME terminal, with an it_IT.utf8 locale. wget and curl do the same.

Can someone else reproduce this?

Thanks
Manlio
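For anyone who wants to check the byte values quoted in this message, here is a small Python 3 sketch. It only exercises the standard codecs; the variable names are made up for illustration. The last check is a guess about Firefox, not something verified against its source: the low byte of the code point U+20AC is also 0xAC, so truncating the character to one byte would give the same result as taking the last utf-8 byte.

```python
# Hypothetical names; escapes are used so the file encoding does not matter.
EURO = "\u20ac"          # Euro sign
NAME = "\u00e0\u00e8"    # the string "àè"

euro_utf8 = EURO.encode("utf-8")      # three bytes, the last one is 0xAC
euro_cp1252 = EURO.encode("cp1252")   # a single byte, 0x80

# iso-8859-1 has no Euro sign, so encoding must fail.
try:
    EURO.encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

# Assumption: one way to end up with 0xAC is to keep only the low byte
# of the code point (0x20AC & 0xFF).
euro_low_byte = bytes([ord(EURO) & 0xFF])

name_latin1 = NAME.encode("iso-8859-1")  # what a latin-1 client would send
name_utf8 = NAME.encode("utf-8")         # what lynx, wget and curl sent

for label, value in [
    ("Euro utf-8", euro_utf8),
    ("Euro cp1252", euro_cp1252),
    ("Euro low byte", euro_low_byte),
    ("name iso-8859-1", name_latin1),
    ("name utf-8", name_utf8),
]:
    print(label, repr(value))
print("Euro encodable in iso-8859-1:", euro_in_latin1)
```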

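In case the paste link is not reachable, this is a minimal sketch of the kind of test application described in this message. It is not the pasted code itself: it assumes Python 3 and HTTP Basic authentication, and the realm name "test" is made up. It decodes the base64 payload of the Authorization header, prints the raw bytes on the terminal, and returns them unchanged, so that no charset is guessed on the server side.

```python
import base64


def application(environ, start_response):
    """Ask for Basic credentials and echo them back as raw bytes."""
    auth = environ.get("HTTP_AUTHORIZATION", "")

    if not auth.lower().startswith("basic "):
        # No credentials yet: challenge the client.
        start_response(
            "401 Unauthorized",
            [
                ("WWW-Authenticate", 'Basic realm="test"'),  # hypothetical realm
                ("Content-Type", "text/plain"),
                ("Content-Length", "0"),
            ],
        )
        return [b""]

    # The payload is base64 of b"user:password" in whatever encoding
    # the client chose; keep it as bytes and do not decode it.
    raw = base64.b64decode(auth[len("basic "):].strip())
    print(repr(raw))

    start_response(
        "200 OK",
        [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(raw))),
        ],
    )
    return [raw]
```

It can be served locally with the standard library, for example `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`, and then visited with each browser in turn.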