Thanks for the response Andy.
I'm writing an application which requires a UTF-8 value.  I think this value
can be supplied in two ways:

1)

<abc>[EMAIL PROTECTED]@^$#</abc>
   (here the element value is raw UTF-8)

or

2)

<abc>&#xe5;&#x9e;&#xbe;</abc>
   (here the element value is UTF-8 written using the hex notation)


In the first example, the parser decodes the UTF-8 input and returns to me
a Java string containing UTF-16.  In the second example, the Java string I
get back holds exactly the values I entered (because the parser makes no
assumption about the binary data).

So how can my application tell whether it's looking at UTF-8 or UTF-16,
given that it can't really know how the parser handled the input?
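One way to see what the parser actually hands back is to inspect the code points of the returned String and then re-encode it. A minimal JAXP sketch (the element name and character references are just the examples above; the class and method names are mine):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Utf8Check {
    // Parse a small document and return the text of the root element.
    static String parseRootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = parseRootText("<abc>&#xe5;&#x9e;&#xbe;</abc>");
        // Each char of the String is a UTF-16 code unit holding the
        // resolved code point, not a raw UTF-8 byte.
        for (int i = 0; i < text.length(); i++) {
            System.out.printf("U+%04X%n", (int) text.charAt(i));
        }
        // UTF-8 only reappears when the application re-encodes:
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " UTF-8 bytes");
    }
}
```

Run this way, the three character references come back as the code points U+00E5, U+009E, and U+00BE (three chars, six bytes once re-encoded as UTF-8), which is one way to confirm the String is always UTF-16 internally regardless of how the input was written.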

Any help is appreciated.

thanks,
-- John

-----Original Message-----
From: Andy Clark [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2001 12:32 AM
To: [EMAIL PROTECTED]
Subject: Re: Utf8 question


"Colosi, John" wrote:
>         It looks like the Xerces parser is converting incoming UTF-8 to
> UTF-16 automatically during the parse.

Since Java uses UTF16 internally, wouldn't this be what
it's supposed to do? Or maybe I'm not understanding what
you mean. Please provide some more detailed information.

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
