There are no smarts about surrogate pairs in DOMString AFAIK. It's not an
issue internally because all of the chars that are special to XML are
non-surrogate values. But if you mess with any output from the parser or the
DOM, it's up to you to do the right thing wrt surrogates.
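
To make "the right thing" a bit more concrete, here is a minimal sketch of the
kind of check calling code could do before cutting or inserting at an offset.
It assumes the text has already been copied into a std::u16string, and the
helper names (isHighSurrogate, isLowSurrogate, countCodePoints,
snapToCodePointBoundary) are made up for illustration. None of this is part of
the Xerces-C API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not Xerces-C functions.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Count characters (code points) rather than 16-bit units.
// A well-formed surrogate pair counts once; a lone surrogate counts as one.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++n;
    }
    return n;
}

// If a 16-bit offset would land between the two halves of a surrogate pair,
// move it back by one unit so the pair stays intact. Otherwise return it as-is.
std::size_t snapToCodePointBoundary(const std::u16string& s, std::size_t offset) {
    if (offset > 0 && offset < s.size() &&
        isHighSurrogate(s[offset - 1]) && isLowSurrogate(s[offset])) {
        return offset - 1;
    }
    return offset;
}
```

For example, in u"a\U0001D11Eb" the character U+1D11E occupies units 1 and 2,
so the string is 4 units long but holds 3 characters, and an insert at offset 2
would split the pair unless the offset is first snapped back to 1.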

--------------
Dean Roddey
Software Geek Extraordinaire
Portal, Inc
[EMAIL PROTECTED]



-----Original Message-----
From: Perry A. Caro [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 03, 2001 4:18 PM
To: [EMAIL PROTECTED]
Subject: Re: What's the internal encoding used by Xerces-C?


Dean Roddey wrote:
> 
> It is UTF-16, in the native endianness, and no it cannot be changed. It used
> to be changeable between UTF-16 and UCS-4, but now it's fixed at UTF-16.

That's great!  I thought Xerces-C was UCS-2 only.  So this means that Unicode
characters beyond the Basic Multilingual Plane are encoded as Surrogate Pairs.
What does this mean for DOMString and DOM_CharacterData, when offsets are
used to insert characters?  Is the Surrogate Pair treated as a single
character? Or are all lengths and offsets in terms of 16-bit storage units,
not characters?

Thanks,

Perry

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
