I found it somewhere on the Internet, though I can no longer remember where,
but here it is. We've been using it for the past couple of months, and it
appears to work well.

Good luck,

Evan


        public static String convertURLEncodedUTF8Str(String s) {
                if (s == null) {
                        return "";
                }
                StringBuffer sbuf = new StringBuffer();
                int l = s.length();
                int ch = -1;
                int b, sumb = 0;
                for (int i = 0, more = -1; i < l; i++) {
                        /* Get next byte b from URL segment s */
                        switch (ch = s.charAt(i)) {
                        case '%':
                                /* Assumes two hex digits follow the '%' */
                                ch = s.charAt(++i);
                                int hb = (Character.isDigit((char) ch)
                                                ? ch - '0'
                                                : 10 + Character.toLowerCase((char) ch) - 'a') & 0xF;
                                ch = s.charAt(++i);
                                int lb = (Character.isDigit((char) ch)
                                                ? ch - '0'
                                                : 10 + Character.toLowerCase((char) ch) - 'a') & 0xF;
                                b = (hb << 4) | lb;
                                break;
                        case '+':
                                b = ' ';
                                break;
                        default:
                                b = ch;
                        }
                        /* Decode byte b as UTF-8, sumb collects incomplete chars */
                        if ((b & 0xc0) == 0x80) {               // 10xxxxxx (continuation byte)
                                sumb = (sumb << 6) | (b & 0x3f);    // Add 6 bits to sumb
                                if (--more == 0) {
                                        /* Add char to sbuf. appendCodePoint (Java 5+) emits a
                                           surrogate pair above U+FFFF, where a (char) cast
                                           would silently truncate. */
                                        if (Character.isValidCodePoint(sumb)) {
                                                sbuf.appendCodePoint(sumb);
                                        } else {
                                                sbuf.append('\uFFFD');
                                        }
                                }
                        } else if ((b & 0x80) == 0x00) {        // 0xxxxxxx (yields 7 bits)
                                sbuf.append((char) b);              // Store in sbuf
                        } else if ((b & 0xe0) == 0xc0) {        // 110xxxxx (yields 5 bits)
                                sumb = b & 0x1f;
                                more = 1;                           // Expect 1 more byte
                        } else if ((b & 0xf0) == 0xe0) {        // 1110xxxx (yields 4 bits)
                                sumb = b & 0x0f;
                                more = 2;                           // Expect 2 more bytes
                        } else if ((b & 0xf8) == 0xf0) {        // 11110xxx (yields 3 bits)
                                sumb = b & 0x07;
                                more = 3;                           // Expect 3 more bytes
                        } else if ((b & 0xfc) == 0xf8) {        // 111110xx (yields 2 bits)
                                sumb = b & 0x03;
                                more = 4;                           // Expect 4 more bytes
                        } else /*if ((b & 0xfe) == 0xfc)*/ {    // 1111110x (yields 1 bit)
                                sumb = b & 0x01;
                                more = 5;                           // Expect 5 more bytes
                        }
                        /* We don't test if the UTF-8 encoding is well-formed */
                }
                return sbuf.toString();
        }
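
As a worked illustration of the bit arithmetic the decoder above relies on,
here is a minimal sketch that reassembles a single code point from a
three-byte UTF-8 sequence. The class and method names are hypothetical and
are not part of the code posted above.

```java
// Hypothetical helper for illustration only; not part of the posted decoder.
final class Utf8BitsDemo {
    /** Reassembles one code point from the bytes 1110xxxx 10xxxxxx 10xxxxxx. */
    static int decodeThreeBytes(int b0, int b1, int b2) {
        int sum = b0 & 0x0f;            // lead byte yields 4 bits
        sum = (sum << 6) | (b1 & 0x3f); // first continuation byte adds 6 bits
        sum = (sum << 6) | (b2 & 0x3f); // second continuation byte adds 6 bits
        return sum;
    }
}
```

For the URL-encoded text %E4%B8%AD, calling
Utf8BitsDemo.decodeThreeBytes(0xE4, 0xB8, 0xAD) returns 0x4E2D, the code
point of the character 中.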



-----Original Message-----
From: Lee Chin Khiong [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 11, 2002 6:23 PM
To: 'Tomcat Users List'
Subject: RE: Foreing Character encoding from jsp form (Character
Encoding doesn't work)



Yes, can I have it too? Thanks.


-----Original Message-----
From: Evan Child [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 12, 2002 8:21 AM
To: 'Tomcat Users List'
Subject: RE: Foreing Character encoding from jsp form (Character
Encoding doesn't work)


What browser are you using to submit the form?

Before you start getting parameters, you need to do a 
request.setCharacterEncoding("UTF-8");

I couldn't tell from your message below whether you're already doing that.
I'm assuming that you ultimately want the characters to end up in UTF-8.

If the browser URL-encodes the parameters (for example, if this is an HTTP
GET request), you'll need a decoder to decode that and convert it into
regular UTF-8. I have a decoder in Java, if you want it.
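
For readers who only need the decoding step, the standard library's
java.net.URLDecoder (its two-argument decode method has been available since
Java 1.4) can do the same job. This is an editorial sketch, not the decoder
offered in this message, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical wrapper for illustration; not the decoder offered in this thread.
final class QueryDecodeSketch {
    /** Decodes %XX escapes and '+' in raw, interpreting the bytes as UTF-8. */
    static String decodeUtf8(String raw) {
        if (raw == null) {
            return "";
        }
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that URLDecoder.decode throws IllegalArgumentException on a malformed
escape such as a trailing "%", so callers handling untrusted input may want
to catch that.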

Thanks,

Evan

-----Original Message-----
From: Steve Vanspall [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 11, 2002 6:27 PM
To: Tomcat Users List
Subject: Foreing Character encoding from jsp form (Character Encoding
doesn't work)


Hi there,

I am having a problem reading foreign characters from a form.

I am trying to make it so that Chinese characters can be entered.

When I enter them, they are received by the request in the form &#23495;&#34054;

etc...

I have set the character encoding and filter for UTF-8 in web.xml. I know
that the request goes through the filter, but the output is the same,
presumably because it reads each character in as '&' '#' '2' '3' '4' '9' '5' ';'.
Seeing these characters as regular ASCII characters, it doesn't try to change
them.

All my pages are set to UTF-8 character encoding.

I have altered the filter code myself to intercept these codes and
recursively replace them,

basically converting the integers one by one into chars.
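
A minimal sketch of that conversion, assuming the input contains decimal
references of the form &#NNNNN; as in the example above. The class name,
method name, and regular expression are hypothetical and are not taken from
the filter described here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the filter code from this thread.
final class NumericRefSketch {
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Replaces each decimal reference such as &#23495; with its character. */
    static String decode(String s) {
        Matcher m = DECIMAL_REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());      // copy text before the reference
            int cp = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);         // handles code points above U+FFFF too
            } else {
                out.append(m.group());           // leave out-of-range references untouched
            }
            last = m.end();
        }
        out.append(s, last, s.length());         // copy the remaining tail
        return out.toString();
    }
}
```

With this sketch, NumericRefSketch.decode("&#23495;&#34054;") returns the two
characters U+5BC7 and U+8506.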

Two problems arise from this.

1. When I then string them together using a string buffer/string, I
get a string of '??????'. This is also how it is entered into the database
(which is also set to UTF-8 encoding).

2. Surely there is a better way to do this.
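
One common way to end up with a run of '?' characters, which may or may not
be the cause in this case, is that the string is encoded at some point with a
charset that cannot represent the characters. The sketch below shows the
effect using only the standard library; the class and method names are
hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demonstration class; it only illustrates one possible cause.
final class QuestionMarkSketch {
    /** Encodes text to bytes with cs, then decodes those bytes with the same cs. */
    static String roundTrip(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs); // unmappable characters become the charset's replacement byte
        return new String(bytes, cs);
    }
}
```

Round-tripping "\u5BC7\u8506" through ISO-8859-1 yields "??", while the same
call with UTF-8 returns the original two characters.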

Can anybody help me here?

Thanks in advance

Steve Vanspall



--
To unsubscribe:   <mailto:[EMAIL PROTECTED]>
For additional commands: <mailto:[EMAIL PROTECTED]>
Troubles with the list: <mailto:[EMAIL PROTECTED]>
