RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1

Marco Cimarosti Thu, 12 Sep 2002 05:17:17 -0700

Philippe de Rochambeau wrote:
> On the other hand, if I store the previous "go" character plus an unusual
> CJK ideogram whose Unicode equivalent is \u5439 (E5 90 B9 in UTF-8) in the
> DB and retrieve the data, JRun 3.1 will only display the first character
> in my form's textarea, plus a few invisible characters, and the database
> will contain the following hex values:
> 
> E8 AA 9E E5 3F B9 20 20 20 20 20 20 0D 0A 0A
> 
> As you can see, "go" is still there, but the following character
> (E5 3F B9) is not \u5439 (E5 90 B9). I cannot figure out how to fix this
> problem.
> 
> Any help with this problem would be much appreciated.


I see what the problem is. As usual, it's all the fault of Bill Gate$. :-)

If you interpret <E5, 90, B9> according to Windows-1252, you see that E5 is
"å", B9 is "¹", but 90 is an unassigned slot! Unassigned bytes are
normally turned into question marks, and the code for "?" is (guess what) 3F...

<E8, AA, 9E> survives only by chance, because all three bytes are valid
Windows-1252 characters: "è", "ª", and "ž", respectively.
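
To make the byte arithmetic above concrete, here is a minimal sketch that
pushes both UTF-8 sequences through a Windows-1252 decode and re-encode. The
class and method names (Cp1252RoundTrip, roundTrip, hex) are made up for
illustration, and the sketch uses Java's own windows-1252 codec, which is only
an assumption about what JRun, the driver, or SQL Server does internally.

```java
import java.nio.charset.Charset;

public class Cp1252RoundTrip {
    // Decode the bytes as Windows-1252, then encode the resulting string back.
    // Bytes with no Windows-1252 assignment cannot survive this round trip.
    static byte[] roundTrip(byte[] bytes) {
        Charset cp1252 = Charset.forName("windows-1252");
        String misread = new String(bytes, cp1252);
        return misread.getBytes(cp1252);
    }

    // Format bytes as space-separated upper-case hex, e.g. "E5 90 B9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] go = {(byte) 0xE8, (byte) 0xAA, (byte) 0x9E};     // U+8A9E in UTF-8
        byte[] second = {(byte) 0xE5, (byte) 0x90, (byte) 0xB9}; // U+5439 in UTF-8
        System.out.println(hex(roundTrip(go)));     // E8 AA 9E (all bytes assigned)
        System.out.println(hex(roundTrip(second))); // E5 3F B9 (90 became "?")
    }
}
```

The second line of output matches the damaged bytes found in the database,
which is consistent with a Windows-1252 conversion happening somewhere along
the way, though it does not show where.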

I guess that the problem starts when you try to fool the system into
thinking that the text is ISO 8859-1:

        byte[] byt = (newQfLibelleArray[i]).getBytes( "ISO8859_1" );
        String tempUtf16 = new String( byt );

But, sorry, I can't help with a fix, because I don't know the Java APIs well
enough.

Can't you do something like <.getBytes("UTF-8")>? Or, even better, doesn't
(newQfLibelleArray[i]) have a method to return a <String> object directly?
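
For what it is worth, one thing that stands out in the snippet quoted above is
that new String(byt) names no charset, so Java falls back to the platform
default, which on a Western Windows 2000 box would typically be Cp1252. A
hedged sketch of the same step with the charset spelled out follows. It
assumes the incoming string really holds UTF-8 bytes that were decoded one
byte per character as ISO 8859-1, which I cannot confirm from the post; the
class and method names (Latin1ToUtf8, reinterpretAsUtf8) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    // Recover text whose UTF-8 bytes were decoded as ISO 8859-1.
    // ISO 8859-1 maps every byte 00..FF to one character, so getBytes()
    // returns the original bytes unchanged; they are then decoded as UTF-8
    // explicitly instead of relying on the platform default charset.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated input (assumption): UTF-8 bytes of U+8A9E U+5439,
        // already mis-decoded as ISO 8859-1 by some earlier layer.
        byte[] utf8 = {(byte) 0xE8, (byte) 0xAA, (byte) 0x9E,
                       (byte) 0xE5, (byte) 0x90, (byte) 0xB9};
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);

        String fixed = reinterpretAsUtf8(misdecoded);
        System.out.println(fixed.equals("\u8A9E\u5439")); // true
    }
}
```

If the container can be told the request encoding up front, so that the
parameter arrives as a correct String in the first place, that would be
cleaner than repairing it afterwards.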

_ Marco
