I found it somewhere on the Internet (I can no longer remember where), but here it is. We've been using it for the past couple of months, and it appears to work well.
Good luck,
Evan

public static String convertURLEncodedUTF8Str(String s) {
    if (s == null) {
        return "";
    }
    StringBuffer sbuf = new StringBuffer();
    int l = s.length();
    int ch = -1;
    int b, sumb = 0;
    for (int i = 0, more = -1; i < l; i++) {
        /* Get next byte b from URL segment s */
        switch (ch = s.charAt(i)) {
        case '%':
            ch = s.charAt(++i);
            int hb = (Character.isDigit((char) ch) ? ch - '0'
                    : 10 + Character.toLowerCase((char) ch) - 'a') & 0xF;
            ch = s.charAt(++i);
            int lb = (Character.isDigit((char) ch) ? ch - '0'
                    : 10 + Character.toLowerCase((char) ch) - 'a') & 0xF;
            b = (hb << 4) | lb;
            break;
        case '+':
            b = ' ';
            break;
        default:
            b = ch;
        }
        /* Decode byte b as UTF-8, sumb collects incomplete chars */
        if ((b & 0xc0) == 0x80) {                      // 10xxxxxx (continuation byte)
            sumb = (sumb << 6) | (b & 0x3f);           // Add 6 bits to sumb
            if (--more == 0) sbuf.append((char) sumb); // Add char to sbuf
        } else if ((b & 0x80) == 0x00) {               // 0xxxxxxx (yields 7 bits)
            sbuf.append((char) b);                     // Store in sbuf
        } else if ((b & 0xe0) == 0xc0) {               // 110xxxxx (yields 5 bits)
            sumb = b & 0x1f;
            more = 1;                                  // Expect 1 more byte
        } else if ((b & 0xf0) == 0xe0) {               // 1110xxxx (yields 4 bits)
            sumb = b & 0x0f;
            more = 2;                                  // Expect 2 more bytes
        } else if ((b & 0xf8) == 0xf0) {               // 11110xxx (yields 3 bits)
            sumb = b & 0x07;
            more = 3;                                  // Expect 3 more bytes
        } else if ((b & 0xfc) == 0xf8) {               // 111110xx (yields 2 bits)
            sumb = b & 0x03;
            more = 4;                                  // Expect 4 more bytes
        } else /*if ((b & 0xfe) == 0xfc)*/ {           // 1111110x (yields 1 bit)
            sumb = b & 0x01;
            more = 5;                                  // Expect 5 more bytes
        }
        /* We don't test if the UTF-8 encoding is well-formed */
    }
    return sbuf.toString();
}

-----Original Message-----
From: Lee Chin Khiong [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 11, 2002 6:23 PM
To: 'Tomcat Users List'
Subject: RE: Foreing Character encoding from jsp form (Character Encoding doesn't work)

Yes, can I have it too. Thanks.
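For comparison with the hand-written decoder above, here is a minimal sketch that leans on the standard library instead. It assumes a JVM that provides java.net.URLDecoder.decode(String, String) (available since J2SE 1.4); the class name UrlDecodeSketch and the method name decodeUtf8 are made up for illustration and are not from the original post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeSketch {

    /**
     * Decodes a URL-encoded (application/x-www-form-urlencoded) string,
     * interpreting the %XX escapes as UTF-8 bytes. As in the posted method,
     * a null input yields an empty string.
     */
    public static String decodeUtf8(String s) {
        if (s == null) {
            return "";
        }
        try {
            // '+' becomes a space and %XX sequences are decoded as UTF-8.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every JVM is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // "%C3%A9" is the UTF-8 encoding of U+00E9 (e with acute accent).
        System.out.println(decodeUtf8("caf%C3%A9+au+lait"));
    }
}
```

One difference worth keeping in mind: the posted method reads past the end of the string when a '%' is not followed by two more characters, whereas URLDecoder may throw an IllegalArgumentException on such input. Callers that handle untrusted query strings would probably want to catch that.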
-----Original Message-----
From: Evan Child [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 12, 2002 8:21 AM
To: 'Tomcat Users List'
Subject: RE: Foreing Character encoding from jsp form (Character Encoding doesn't work)

What browser are you using to submit the form?

Before you start getting parameters, you need to do a request.setCharacterEncoding("UTF-8"); I couldn't tell from the message below whether you're already doing that. This assumes that you ultimately want the characters to end up in a UTF-8 encoding.

If the browser URL-encodes the parameters (for example, if this is an HTTP GET request), you'll need a decoder to decode that and convert it into regular UTF-8. I have a decoder in Java, if you want it.

Thanks,
Evan

-----Original Message-----
From: Steve Vanspall [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 11, 2002 6:27 PM
To: Tomcat Users List
Subject: Foreing Character encoding from jsp form (Character Encoding doesn't work)

Hi there,

I am having a problem with reading foreign characters from a form. I am trying to make it so that Chinese characters can be entered. When I enter them, they are received by the request in the form &#23495;&#34054; etc.

I have set the character encoding and filter for UTF-8 in web.xml. I know that the request goes through the filter, but the output is the same, presumably because it reads each character in as '&' '#' '2' '3' '4' '9' '5' ';'. Seeing these characters as regular ASCII characters, it doesn't try to change them.

All my pages are set to UTF-8 character encoding.

I have altered the filter code myself to intercept the parameters and recursively replace these codes, basically converting the integers one by one into chars. Two problems arise from this.

1. When I then string them together using a StringBuffer/String, I get a string of '??????'. This is also how it is entered into the database (which is set to UTF-8 encoding also).
2. Surely there is a better way to do this.
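The conversion described above (turning codes such as '&' '#' '2' '3' '4' '9' '5' ';' back into characters) can be sketched as follows. This is only an illustration under stated assumptions: it handles decimal references of the form &#NNNNN; and nothing else, it needs Java 5 or later for StringBuilder and appendCodePoint, and the names NcrDecodeSketch and decodeDecimalNcrs are hypothetical. It is not the filter code from the original message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrDecodeSketch {

    // Matches decimal numeric character references such as "&#23495;".
    private static final Pattern DECIMAL_NCR = Pattern.compile("&#(\\d{1,7});");

    /** Replaces each decimal reference with the character it names. */
    public static String decodeDecimalNcrs(String s) {
        if (s == null) {
            return "";
        }
        Matcher m = DECIMAL_NCR.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(s, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Out-of-range number: leave the reference untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeDecimalNcrs("&#23495;").equals("\u5bc7"));
    }
}
```

If a string built this way still shows up as '??????', the conversion itself may not be at fault. In Java, question marks commonly appear when a string is later turned into bytes with a charset that cannot represent the characters, so the output and database connection encodings are worth checking as well.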
Can anybody help me here?

Thanks in advance,
Steve Vanspall

--
To unsubscribe: <mailto:[EMAIL PROTECTED]>
For additional commands: <mailto:[EMAIL PROTECTED]>
Troubles with the list: <mailto:[EMAIL PROTECTED]>