Hi everybody,

I am having an issue where Unicode characters (e.g. Ž and & #105;) are
passed by the Apache Webserver 2.4 to Tomcat as UTF-8 encoded bytes while
Tomcat seems to evaluate them as ISO-8859-15 encoded.

Having taken a network trace with tcpdump, I see the following bytes for my
header field (I truncated the output after byte ‘72’):

0200   0a 48 54 54 50 5f 56 6f 6f 72 6e 61 61 6d 3a 20   .HTTP_Voornaam:

0210   4d 61 c5 82 67 6f 72

Here the bytes C5 82 are the UTF-8 encoding of the Unicode character ł
(U+0142).

Now when inspecting the header value in Tomcat using:

                String headerValue = request.getHeader("HTTP_Voornaam");

I’m getting the value ‘MaÅ.gor’, which seems to be the ISO-8859-15
representation of the bytes C5 82. The byte string from the tcpdump trace
seems to match the result of
headerValue.getBytes(Charset.forName("ISO-8859-15")) and not the result of
headerValue.getBytes(Charset.forName("UTF-8")).
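
To illustrate the observation above, here is a minimal sketch that decodes the
traced bytes as a single-byte charset and then re-interprets them as UTF-8. It
assumes the container decoded the header byte-for-byte as ISO-8859-1 (which
maps C5 and 82 the same way as ISO-8859-15); the class and method names are
made up for this example and are not part of Tomcat.

```java
import java.nio.charset.StandardCharsets;

public class HeaderRedecode {

    // Re-interpret a header value that was decoded byte-for-byte as
    // ISO-8859-1 (assumption) but actually carried UTF-8 bytes.
    // Returns null when the header is missing.
    static String redecodeAsUtf8(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        // ISO-8859-1 maps every char 0x00-0xFF back to the same byte value.
        byte[] raw = headerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes from the trace: 4d 61 c5 82 67 6f 72
        byte[] wire = {0x4d, 0x61, (byte) 0xc5, (byte) 0x82, 0x67, 0x6f, 0x72};

        // What the application sees if the bytes are read as ISO-8859-1.
        String asSeen = new String(wire, StandardCharsets.ISO_8859_1);

        // Round-trip back to bytes and decode as UTF-8.
        String repaired = redecodeAsUtf8(asSeen);
        System.out.println(repaired.equals("Ma\u0142gor")); // \u0142 is ł
    }
}
```

This only works while the single-byte decoding is lossless, so treat it as a
diagnostic aid rather than a fix.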

The FAQ indicates
that ‘headers are always in US-ASCII encoding. Anything outside of that
needs to be encoded’; in this case it seems to be UTF-8 encoded.

The headers are evaluated by a servlet 2.5 web application which has defined
a ‘CharacterEncodingFilter’ as its first filter, performing the following:


              response.setContentType("text/html; charset=UTF-8");


              filterChain.doFilter(request, response);

Is there a way to tell Tomcat to decode the headers as being UTF-8 encoded?

This is not defined, so do not expect it to work properly. The best and most reliable thing you can do is to encode your values. This is the same approach used for the Content-Disposition filename parameter. You may want to evaluate mod_lua for that.

Everything else will make you suffer as you have seen.
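
As a rough sketch of the "encode your values" approach: the snippet below
assumes the header value is sent as percent-encoded UTF-8 bytes (similar in
spirit to the RFC 5987 encoding used for the Content-Disposition filename*
parameter) and decoded again in the web application. The class and method
names are hypothetical, and the encode side is shown in Java only to make the
round trip visible; in practice it would run on the Apache side. The
Charset overloads used here need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodedHeader {

    // Produce an ASCII-only value: non-ASCII characters become %XX escapes
    // of their UTF-8 bytes. URLEncoder writes spaces as '+', so those are
    // rewritten to %20 to keep the value plain percent-encoding.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8)
                .replace("+", "%20");
    }

    // Decode a percent-encoded header value back to a Java String.
    // A literal '+' is protected first, because URLDecoder would
    // otherwise turn it into a space.
    static String decode(String headerValue) {
        return URLDecoder.decode(headerValue.replace("+", "%2B"),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String onTheWire = encode("Ma\u0142gor"); // \u0142 is ł
        System.out.println(onTheWire);
        System.out.println(decode(onTheWire).equals("Ma\u0142gor"));
    }
}
```

With this scheme only US-ASCII bytes travel in the header, so the result no
longer depends on which charset the container uses for header decoding.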


