Dean Roddey wrote:
>
> It is UTF-16, in the native endianness, and no, it cannot be changed. It used
> to be changeable between UTF-16 and UCS-4, but now it's fixed at UTF-16.
That's great! I thought Xerces-C was UCS-2 only. So this means that
Unicode characters beyond the Basic Multilingual Plane (above U+FFFF) are
encoded as surrogate pairs.
What does this mean for DOMString and DOM_CharacterData, when offsets are
used to insert characters? Is a surrogate pair treated as a single
character? Or are all lengths and offsets in terms of 16-bit storage units,
not characters?
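
To make the question concrete, here is a minimal sketch of the two ways of
counting. It uses std::u16string as a stand-in for a UTF-16 buffer, not
DOMString itself, and the helper names (appendCodePoint, countCodePoints,
codeUnitOffset) are my own and not part of Xerces-C. It assumes well-formed
input for the encoding step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-16 string (hypothetical helper).
// Code points above U+FFFF become a high/low surrogate pair.
inline void appendCodePoint(std::u16string& out, char32_t cp) {
    // Precondition: a valid scalar value (in range, not itself a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Count characters (code points), treating each well-formed surrogate
// pair as one. An unpaired surrogate is counted as one on its own.
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++n;
    }
    return n;
}

// Convert a character (code point) index into a 16-bit code unit offset.
// Returns s.size() if the index is past the end.
inline std::size_t codeUnitOffset(const std::u16string& s, std::size_t cpIndex) {
    std::size_t i = 0;
    while (i < s.size() && cpIndex > 0) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            ++i;
        }
        ++i;
        --cpIndex;
    }
    return i;
}
```

For the string "a", U+1D11E, "b", size() reports 4 storage units while
countCodePoints reports 3 characters, and the "b" sits at code unit offset 3
but character index 2. If offsets are in 16-bit units, an insert at offset 2
would land between the two halves of the pair.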
Thanks,
Perry