Mirko Solic wrote:
On Thu, 2010-01-21 at 11:30 +0100, André Warnier wrote:

Mirko,
just for info: there is another, related thread going on at the same time, entitled "Basic Authentication Failed with multibyte username".

I am interested in these topics because I often run into them in our own web applications.
I don't know all the answers, but I do know that the subject is confusing.

As far as I can tell:

According to the HTTP/1.1 specification, RFC 2616, words of *TEXT in HTTP header fields MAY contain characters from character sets other than ISO-8859-1, but only if they are encoded according to the rules of RFC 2047. RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this should be done using the form:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure that "utf-8" is the correct name for the charset here.)
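
To make the form above concrete, here is a minimal Java sketch that builds a "B" (base-64) encoded-word for a short value. The class and method names are made up for illustration and are not part of Tomcat or Apache. It assumes the result fits in the 75 characters that RFC 2047 allows per encoded-word; longer text would have to be split into several encoded-words, which this sketch does not do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only.
public class EncodedWordEncoder {

    // Returns "=?charset?B?base64-of-the-bytes?=" for the given text.
    // Assumption: the text is short enough for a single encoded-word.
    public static String encodeB(String text, Charset charset) {
        String b64 = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + b64 + "?=";
    }

    // Convenience wrappers for the two charsets used in the examples above.
    public static String encodeUtf8(String text) {
        return encodeB(text, StandardCharsets.UTF_8);
    }

    public static String encodeLatin1(String text) {
        return encodeB(text, StandardCharsets.ISO_8859_1);
    }
}
```

With this, EncodedWordEncoder.encodeUtf8("\u00e9") (a single "é") gives =?UTF-8?B?w6k=?= and encodeLatin1 of the same string gives =?ISO-8859-1?B?6Q==?=.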

Now, I am not sure that, if you pass an HTTP header encoded as above from Apache to Tomcat, Tomcat's getHeader() call will decode it properly using the indicated charset.

If not, you will have to do the decoding yourself if you want to pass non-ascii (or non-iso-8859-1) characters in those headers. Admittedly, it is a pain; but there are still quite a few grey areas like that in the WWW-related RFCs where character sets are concerned. If you have to do this kind of encoding/decoding, I suggest having a look at MIME (email) libraries. This kind of encoding/decoding is regularly used in email headers. For an example, save an email with a non-ascii subject line in its original text (.eml) format and look at the Subject header.
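
If you end up doing the decoding yourself, something along these lines might serve as a starting point. It is only a sketch under stated assumptions: the class name is made up, it handles only the "B" (base-64) encoding and not "Q", and it does not drop the whitespace between adjacent encoded-words as RFC 2047 asks. A MIME library (for example MimeUtility.decodeText() in JavaMail) covers those cases.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
public class EncodedWordDecoder {

    // Matches =?charset?B?base64?= ; the "Q" encoding is not handled here.
    private static final Pattern B_WORD =
        Pattern.compile("=\\?([^?\\s]+)\\?[bB]\\?([A-Za-z0-9+/=]*)\\?=");

    // Replaces every B-encoded word in the header value by its decoded text.
    // Text outside encoded-words is copied through unchanged.
    // Throws an unchecked exception on an unknown charset or invalid base-64.
    public static String decode(String headerValue) {
        Matcher m = B_WORD.matcher(headerValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset cs = Charset.forName(m.group(1));
            byte[] raw = Base64.getDecoder().decode(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(new String(raw, cs)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In a servlet this would be applied to whatever getHeader() returns, e.g. EncodedWordDecoder.decode(request.getHeader("X-Some-Header")), where the header name is just a placeholder.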


