
I'm doing some tests to try to understand how HTTP headers are encoded
by browsers.

I have written a simple WSGI application that asks for authentication
credentials, then prints them on the terminal and returns the data as
the response, as raw bytes.
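For anyone who wants to repeat the test, here is a minimal sketch of such an application. It is not the exact code I ran: it assumes Python 3, HTTP Basic authentication, and a made-up realm name "test".

```python
import base64


def application(environ, start_response):
    """Ask for Basic credentials, then echo them back as raw bytes."""
    auth = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, payload = auth.partition(" ")

    if scheme.lower() != "basic" or not payload:
        # No credentials yet: make the client prompt for them.
        # The realm name "test" is an arbitrary choice for this sketch.
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="test"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"authentication required\n"]

    try:
        # The base64 payload is plain ASCII; decoding it yields the exact
        # bytes the client chose for "username:password".
        raw = base64.b64decode(payload)
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"malformed credentials\n"]

    # Show the undecoded bytes on the terminal.
    print(repr(raw))

    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(raw))),
    ])
    return [raw]
```

It can be served with the standard library, for example
`wsgiref.simple_server.make_server("", 8000, application).serve_forever()`.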

Then I used some browsers to try to send a username with non-ASCII characters.

When I try with simple characters in the ISO-8859-1 charset, things
work well; the data is encoded using this charset.

However, when I try to use a less common character, like the Euro sign, there are

Firefox (Iceweasel 3.0.14, Linux Debian Squeeze) sends me a

I don't know where \xac comes from, but it is the last byte in the UTF-8
encoded Euro: '\xe2\x82\xac'
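The byte values can be checked quickly in Python. This sketch assumes Python 3 and only shows two places where 0xAC appears for the Euro sign (U+20AC): as the last UTF-8 byte and as the low byte of the code point. It does not establish which one, if either, the browser actually uses.

```python
euro = "\u20ac"  # the Euro sign

# UTF-8 encoding of the Euro sign: three bytes, the last one is 0xAC.
utf8 = euro.encode("utf-8")
print(utf8)

# The low 8 bits of the code point U+20AC are also 0xAC.
low_byte = ord(euro) & 0xFF
print(hex(low_byte))
```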

Internet Explorer 6.0 sends me a
and this is the Euro character encoded using cp1252 (and I suspect
that it always uses this encoding, instead of ISO-8859-1).
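A short sketch (assuming Python 3) of why the Euro sign is a useful probe here: cp1252 and ISO-8859-1 produce the same bytes for characters like "àè", so those cannot tell the two encodings apart, while the Euro sign exists only in cp1252.

```python
euro = "\u20ac"
accented = "\u00e0\u00e8"  # the string "àè"

# For characters present in both charsets the bytes are identical.
same_bytes = accented.encode("cp1252") == accented.encode("iso-8859-1")

# cp1252 has a slot for the Euro sign...
cp1252_bytes = euro.encode("cp1252")

# ...but ISO-8859-1 does not, so encoding fails.
try:
    euro.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(same_bytes, cp1252_bytes, latin1_ok)
```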

Unfortunately, I cannot test with IE 7 and 8.

With a browser working on a terminal, like lynx, things get worse.
If I enter the string "àè" as the user name, lynx sends me

This happens in a GNOME terminal, with an it_IT.utf8 locale.

wget and curl do the same.
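To compare results from different clients, a small helper like the one below may be handy. It is only a sketch, assuming Python 3; the function name `candidate_decodings` and the list of encodings tried are my own choices. It reports how a captured byte string reads under each candidate encoding, leaving out the ones that fail.

```python
def candidate_decodings(raw, encodings=("utf-8", "cp1252", "iso-8859-1")):
    """Return {encoding: text} for every encoding that decodes `raw`."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; skip it.
            pass
    return results


# "àè" as a UTF-8 client and as an ISO-8859-1 client would encode it.
as_utf8 = "\u00e0\u00e8".encode("utf-8")
as_latin1 = "\u00e0\u00e8".encode("iso-8859-1")

print(candidate_decodings(as_utf8))
print(candidate_decodings(as_latin1))
```

Bytes that are valid UTF-8 also decode under the single-byte charsets (as mojibake), so a human still has to judge which reading is the intended one.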

Can someone else reproduce this?

Thanks
Manlio