On Thu, 2010-01-21 at 11:30 +0100, André Warnier wrote:

That was quite a reply :). Thanks for your time and
knowledge.

> Mirko,
> I am from Belgium, Europe too. I live in Spain and work mostly for 
> German and other international customers (among which are some from 
> Poland too). This to say that I am well-aware of multi-lingual character 
> set issues, and confront them every day.
> So, just so as to give you some "context" for your issues :
> 
> Despite the fact that Unicode and UTF-8 are now being increasingly used 
> on the web, the fact is that HTTP, and HTML, and most of the other 
> WWW-relevant RFCs, are still US-ASCII and ISO-8859-1 (latin-1) based.
> 
> For example, HTTP header values are /supposed/ to contain only 
> single-byte character codes that are part of the (printable subset of) 
> US-ASCII character set.
> For example also, by default, all "content" exchanged between browsers 
> and web servers is iso-8859-1.
> And it is so because the relevant RFCs say that it should be.
> (So the developers of Apache and mod_jk and Tomcat have little choice in 
> the matter; they have to follow the RFCs).

I agree, the RFCs are there to be followed.

> 
> This does not mean that you cannot handle other character sets on the 
> web.  But it means that whenever you do, you have to be attentive to the 
> fact that it is /not/ the standard, and that you may have to do 
> character set translations and/or encoding.
> It may even mean that, in order to exchange non-US-ASCII or 
> non-ISO-8859-1 data, you may have to use "tricks".
> It also means that, in some cases, by using such "tricks", your 
> applications may become "non-standard", and will not necessarily work 
> with all servers and all clients.
> 
> So for example, to get back to your question above : mod_jk is not 
> responsible for translating anything, and will not translate anything. 
> That is because mod_jk follows the relevant WWW RFCs, which specify that 
> such and such data is ASCII or ISO-8859-1.
> 
> If the original HTTP request, as it is given by Apache to mod_jk, 
> contains HTTP headers, mod_jk will forward these headers "as is" to the 
> back-end Tomcat.  But, because the HTTP RFC specifies that HTTP headers 
> should contain only US-ASCII character data, mod_jk would be allowed, if 
> it finds non-US-ASCII data in a HTTP header, to strip this data or 
> ignore the header or something like that.  I don't know if mod_jk 
> actually does this, but if it did, it would be justified, because 
> according to the HTTP RFC this would be an invalid header.

That is what I'm afraid of. This code:
new String(request.getHeader(headerName).getBytes("ISO-8859-1"))
works for now, but it really shouldn't.
That is why I'm searching for a more legitimate way.
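
For reference, the workaround above can be written with both charsets named explicitly. The one-argument new String(byte[]) decodes with the platform default charset, so it only yields UTF-8 where that default happens to be UTF-8. This is a minimal sketch, not a recommended fix: it assumes the container decoded the raw header bytes as ISO-8859-1, that the bytes arrived unmodified, and Java 7 or later for StandardCharsets. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative, not part of Tomcat or mod_jk.
final class HeaderRecoder {
    private HeaderRecoder() {}

    /**
     * Re-interprets a header value that was decoded as ISO-8859-1
     * although its original bytes were UTF-8.
     * Assumption: the bytes reached the container unmodified.
     */
    static String latin1ToUtf8(String rawHeaderValue) {
        if (rawHeaderValue == null) {
            return null;
        }
        // ISO-8859-1 maps every byte to exactly one char, so this recovers the original bytes.
        byte[] originalBytes = rawHeaderValue.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with an explicit charset instead of the platform default.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u017Diga";
        // Simulate what the servlet sees: UTF-8 bytes misread as ISO-8859-1.
        String asSeenByServlet = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(latin1ToUtf8(asSeenByServlet).equals(original));
    }
}
```

In a servlet this would be called as HeaderRecoder.latin1ToUtf8(request.getHeader(headerName)). It still depends on non-US-ASCII bytes surviving in the header, which is exactly the part that is not guaranteed.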
> 
> So, to be practical :
> - the current HTTP 1.1 RFC specifies that HTTP headers can only contain 
> US-ASCII printable character data
> - some UTF-8 codes contain bytes that are not part of the US-ASCII 
> character set (e.g. : bytes with values above 0x7F)
> - so, if you want to forward such a header from Apache to Tomcat, in 
> principle the "right" way is to "encode" the value of this header on the 
> Apache side, in such a way that it contains only US-ASCII data (for 
> example, using Base64 encoding), then pass it to mod_jk.
> - at the other end, your application would have to decode this header 
> (using Base64 decoding) back into UTF-8, and then it would have to read 
> this header value as UTF-8/Unicode.
> 
> There is no guarantee that any standard module or class under Apache or 
> mod_jk or Tomcat would properly handle a header that contains 
> non-US-ASCII data.  That is because, in principle, they never have to.
> 
> I know it is a mess. It is possible that there are shortcuts.  It is 
> possible that mod_jk would transmit a HTTP header, even if it contains 
> non-US-ASCII data. But it is not sure, because "the bible" for mod_jk, 
> as for Apache and as for Tomcat, are the RFCs.
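
The Base64 round trip described in the quoted steps can be sketched on the Java side as follows. The encode method is included only to show what the Apache side would have to produce; how that encoding is actually done in Apache is not shown here. The sketch assumes Java 8 or later for java.util.Base64, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the names are illustrative, not an existing Tomcat or mod_jk API.
final class Base64HeaderCodec {
    private Base64HeaderCodec() {}

    /** Turns arbitrary Unicode text into a US-ASCII-only header value. */
    static String encode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8Bytes);
    }

    /** Reverses encode(): Base64 text back to bytes, then bytes read as UTF-8. */
    static String decode(String headerValue) {
        byte[] utf8Bytes = Base64.getDecoder().decode(headerValue);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lastName = "\u017Diga";
        String headerValue = encode(lastName);   // contains only US-ASCII characters
        System.out.println(decode(headerValue).equals(lastName));
    }
}
```

On the Tomcat side only decode would be needed, for example Base64HeaderCodec.decode(request.getHeader(headerName)). Base64.getDecoder().decode throws IllegalArgumentException if the value is not valid Base64, so a real application would need to handle headers that were not encoded.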

But where should this Base64 encoding go? (I do not use Apache often :(
I'm a Java programmer using Tomcat.)
From the IdP (AAI identity provider) I get user data, and the SP (AAI
service provider, which is a module in Apache) puts this data into Apache
environment variables with UTF-8 values. Then, as I understand it, mod_jk
takes these variables and packs them into HTTP headers. I would like to
keep the environment variables on Apache with UTF-8 values, so that other
applications on this Apache (e.g. PHP web pages) still work.
So my guess is that the Base64 encoding should happen before mod_jk takes
the values from the environment variables and puts them into the HTTP
headers. Is this possible (I mean, other than by changing the mod_jk
code)? Or is this a topic for some other mailing list :)?


> We non-English speakers worldwide desperately need a new version of the 
> HTTP protocol where the default would be Unicode/UTF-8, for everything.
> But I do not see much happening right now in that direction.

Oh, I do agree on that :)

> 
> 
> Maybe a tip for your authentication issues :
> If, in the AJP <Connector> on the Tomcat side, you set the attribute
> tomcatAuthentication="false"
> then Tomcat will accept the user-id authenticated by Apache, as the 
> user-id for Tomcat (mod_jk transmits it).
> So if your user authentication mechanism works fine at the Apache level, 
> and generates a user-id that is "acceptable" by Tomcat, this may be a 
> solution for your issue.
> I have no idea if this user-id, for Tomcat, can or cannot contain 
> non-US-ASCII characters.

AAI returns more than just a user-id. The idea behind AAI is that the
application stores as little data about the user as possible; all of it
is provided by AAI. This data includes, for example, first name, last
name, address, and so on. It would be perfect if we had this SP running
on Tomcat, so that we wouldn't need Apache, but at the moment there is
no such SP.

mirko



