> The character itself, as bytes that is, is not wrong and users should be able 
> to create these.
> But preferably through macros that ensure that they come correctly paired.

Placing two character tokens that represent a surrogate pair should
not, though, magically turn them into a single character. The UTF-8 or
^^^^ encoding should refer to the Unicode code point, not to the
UTF-16 encoding.

In the current versions ^^^^d835^^^^dc00 is two characters in luatex
and one character in xetex, because the implementation detail that
xetex's underlying storage is mostly UTF-16 is exposed. If it is not
possible to prevent ^^^^ or UTF-8 encoded surrogate pairs from
combining, then it is better to prevent them being formed.
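
To make the arithmetic concrete, here is a small Python sketch (not how
either engine is implemented; the function name is only for
illustration). It shows which code point the pair d835/dc00 denotes in
UTF-16, and that strict UTF-8 encodes that code point but refuses a
lone surrogate:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 high/low surrogate pair to the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD835, 0xDC00)
print(f"U+{cp:X}")  # U+1D400

# UTF-8 is defined on code points, so the real character encodes directly.
utf8 = chr(cp).encode("utf-8")
print(utf8)  # b'\xf0\x9d\x90\x80'

# A surrogate on its own is not a valid scalar value, so strict UTF-8
# encoding rejects it instead of silently pairing it up later.
try:
    "\ud835".encode("utf-8")
    lone_surrogate_ok = True
except UnicodeEncodeError:
    lone_surrogate_ok = False
print(lone_surrogate_ok)  # False
```

The combining step is purely a property of UTF-16, which is why it
should not leak into what ^^^^ or UTF-8 input means.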

This is no different to XML, where &#xd835;&#xdc00; always refers to
two (invalid) characters, not to &#x1d400;
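
As a quick check of the XML side, a sketch using Python's standard
library parser (one parser's behaviour, and the helper name is
hypothetical): the two surrogate references are rejected as invalid
characters rather than being joined, while the single reference to the
real code point parses.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the string is accepted as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Two character references that together look like a surrogate pair.
pair_ok = parses("<a>&#xd835;&#xdc00;</a>")
# One character reference to the code point itself.
single_ok = parses("<a>&#x1d400;</a>")

print(pair_ok, single_ok)  # False True
```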

David

