Hi,

I have a JSP file (see attachment) which lets you submit text in UTF-8 to the same JSP file. For this to work, the JSP file contains code for converting the submitted text from Unicode to UTF-8.

I ran some tests submitting the Euro symbol. In Unicode this is code point 0x20ac, and in UTF-8 it is 0xE2 0x82 0xAC (3 bytes). This works in all servlet engines I know of, including Tomcat up to 3.2 beta 6, but not in Tomcat 4.0m4.

If you have a URL like

    http://host/post.jsp?text=%E2%82%AC

I expect the following output:

    text [as text] = â'¬
    text [as hex] = 0xe2 0x82 0xac
    text [corrected] = EUR

but I get:

    text [as text] = â'¬
    text [as hex] = 0xe2 0x201a 0xac
    text [corrected] =

Note the second hex code. Interestingly, 0x201a is the Unicode code point of a comma-like character (the single low-9 quotation mark), but I'm clueless how Tomcat got there ...

Bye
Christian

PS: I have attached a JSP file with more multibyte samples ...

--
Christian Mallwitz
INTERSHOP Communications Germany
Senior Software Engineer
phone: +49 3641 894 334
complete-charset-unicode-utf8.jsp
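
For readers without the attachment, here is a minimal sketch of the kind of correction step described above. It is not the code from the attached JSP; the class name `Utf8ParamSketch` and the methods `repair` and `toHex` are made up for illustration. It assumes the servlet engine hands back a string in which every char holds exactly one raw byte of the request (0x00 to 0xFF), so the bytes can be recovered via ISO-8859-1 and then decoded as UTF-8. It also shows why the step fails once a char such as 0x201a appears: that value does not fit in one byte, so the original byte 0x82 cannot be recovered. One possible explanation, which the message does not confirm, is that the engine decoded the byte as Windows-1252, where 0x82 maps to U+201A.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ParamSketch {

    // Re-interpret a string whose chars each carry one raw byte as UTF-8.
    // Assumption: the container decoded the request bytes one-to-one (ISO-8859-1).
    public static String repair(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Print each char as a hex code, in the same style as the output in the message.
    public static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("0x").append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the parameter is expected to look like: one char per byte.
        String expected = "\u00e2\u0082\u00ac";
        System.out.println(toHex(expected));          // 0xe2 0x82 0xac
        System.out.println(toHex(repair(expected)));  // 0x20ac

        // What Tomcat 4.0m4 reportedly returns: 0x82 has become 0x201a.
        String observed = "\u00e2\u201a\u00ac";
        System.out.println(toHex(observed));          // 0xe2 0x201a 0xac
        // 0x201a has no ISO-8859-1 byte, so the Euro symbol is not recovered.
        System.out.println(repair(observed).equals("\u20ac")); // false
    }
}
```

Because the second case loses information before the JSP ever sees the parameter, no correction of this kind inside the JSP can restore the Euro symbol; the decoding would have to be fixed or configured on the container side.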