Mirko Solic wrote:
On Thu, 2010-01-21 at 11:30 +0100, André Warnier wrote:
Mirko,
just for info: another related thread is running at the same time,
entitled "Basic Authentication Failed with multibyte username".
Basically, I am interested in those topics because I encounter them
myself often in our own web applications.
I don't know all the answers, but I know that it is confusing.
As far as I can tell:
According to RFC 2616 (HTTP 1.1), the *TEXT portions of HTTP header
fields MAY contain characters from character sets other than ISO-8859-1,
but only when such header field values are encoded according to the
rules of RFC 2047.
RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this
should be done using the form:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :
Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for
the charset.)
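To make the form above concrete, here is a minimal Java sketch that builds such a value with base-64 ("B") encoding. The class and method names are made up for illustration and are not part of Tomcat or the Servlet API. It assumes UTF-8 as the charset and does not handle the RFC 2047 limit of 75 characters per encoded-word, so long values would still need to be split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only.
final class EncodedWordEncoder {

    // Wraps the text as =?UTF-8?B?<base64 of the UTF-8 bytes>?=
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String base64 = Base64.getEncoder().encodeToString(bytes);
        // Note: no splitting is done here for values that would
        // exceed 75 characters per encoded-word.
        return "=?UTF-8?B?" + base64 + "?=";
    }
}
```

For example, EncodedWordEncoder.encode("André") should give =?UTF-8?B?QW5kcsOp?=.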
Now, I am not sure whether, if you pass an HTTP header encoded as above
from Apache to Tomcat, the Tomcat getHeader() call will properly decode
it using the indicated charset.
If not, you will have to do the decoding yourself if you want to pass
non-ASCII (or non-ISO-8859-1) characters in those headers.
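If it comes to decoding by hand, something along these lines could be a starting point. It is only a sketch: the class name and methods are hypothetical, it assumes the value you get back from getHeader() still contains the raw encoded-word, and it covers just the basic "B" and "Q" forms of RFC 2047 (no language suffix, no strict length checks). Anything it cannot decode is left unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tomcat or the Servlet API.
final class EncodedWordDecoder {

    // =?charset?encoding?encoded-text?=  (RFC 2047, section 2)
    private static final Pattern ENCODED_WORD =
        Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    // Replaces every encoded-word in the header value with its decoded text.
    static String decode(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean seenWord = false;
        while (m.find()) {
            String gap = headerValue.substring(last, m.start());
            // Whitespace between two adjacent encoded-words is dropped.
            if (!(seenWord && gap.trim().isEmpty())) {
                out.append(gap);
            }
            out.append(decodeWord(m.group(1), m.group(2), m.group(3), m.group()));
            last = m.end();
            seenWord = true;
        }
        out.append(headerValue.substring(last));
        return out.toString();
    }

    private static String decodeWord(String charset, String encoding,
                                     String text, String raw) {
        try {
            Charset cs = Charset.forName(charset);
            byte[] bytes = encoding.equalsIgnoreCase("B")
                ? Base64.getDecoder().decode(text)
                : decodeQ(text);
            return new String(bytes, cs);
        } catch (IllegalArgumentException e) {
            // Unknown charset or malformed text: keep the word as received.
            return raw;
        }
    }

    // "Q" encoding: '_' is a space, "=XX" is a byte given in hex.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(0x20);
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    throw new IllegalArgumentException("truncated =XX escape");
                }
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }
}
```

In a servlet this would be used roughly as EncodedWordDecoder.decode(request.getHeader("Header-name")). A real MIME library is likely to handle more corner cases than this does.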
Admittedly, it is a pain; but there are still quite a few grey areas
like that in the WWW-related RFCs where character sets are concerned.
If you have to do this kind of encoding/decoding, I suggest having a
look at MIME (email) libraries. This kind of encoding/decoding is
regularly used in email headers. For an example, save an email with a
non-ASCII subject line in its original text (.eml) format.