Dean Roddey wrote:
> 
> It is UTF-16, in the native endianness, and no, it cannot be changed. It used
> to be changeable between UTF-16 and UCS-4, but now it's fixed at UTF-16.

That's great!  I thought Xerces-C was UCS-2 only.  So this means that
Unicode characters beyond the Basic Multilingual Plane (above U+FFFF) are
encoded as surrogate pairs.  What does this mean for DOMString and
DOM_CharacterData, when offsets are used to insert characters?  Is a
surrogate pair treated as a single character?  Or are all lengths and
offsets in terms of 16-bit storage units, not characters?
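
To make the distinction concrete, here is a minimal sketch in standard C++.
It does not use Xerces-C at all and makes no claim about how DOMString
behaves; the helper names (appendCodePoint, countCodePoints,
splitsSurrogatePair) are hypothetical and only illustrate how a length in
16-bit units differs from a length in characters, and how an offset in
16-bit units can land in the middle of a pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-16 string.
// Code points above U+FFFF are written as a high/low surrogate pair.
void appendCodePoint(std::u16string& out, char32_t cp) {
    // Reject values outside Unicode and lone surrogate code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
}

// Count characters (code points), treating a well-formed surrogate pair as one.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i; // skip the low surrogate that completes the pair
        }
    }
    return n;
}

// True if the given offset (in 16-bit units) falls between the two halves
// of a surrogate pair, so that inserting there would split the character.
bool splitsSurrogatePair(const std::u16string& s, std::size_t offset) {
    return offset > 0 && offset < s.size()
        && s[offset - 1] >= 0xD800 && s[offset - 1] <= 0xDBFF
        && s[offset] >= 0xDC00 && s[offset] <= 0xDFFF;
}
```

For the string "a", U+1D11E, "b", size() reports 4 storage units while
countCodePoints reports 3 characters, and offset 2 falls inside the pair.
Which of the two counts the DOM offsets use is exactly what I am asking.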

Thanks,

Perry
