> The character itself, as bytes that is, is not wrong and users should be able
> to create these.
> But preferably through macros that ensure that they come correctly paired.

Placing two character tokens that represent a surrogate pair should not, though, magically turn into a single character. The UTF-8 or ^^^^ encoding should refer to the Unicode code point, not to the UTF-16 encoding. In the current versions ^^^^d835^^^^dc00 is two characters in LuaTeX and one character in XeTeX, because the implementation detail that XeTeX's underlying storage is mostly UTF-16 is exposed.

If it is not possible to prevent ^^^^ or UTF-8 encoded surrogate pairs from combining, then it is better to prevent them from being formed. This is no different from XML, where &#xd835;&#xdc00; always refers to two (invalid) characters, not to &#x1d400;.

David

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
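
For readers who want to check the arithmetic behind the ^^^^d835^^^^dc00 example, here is a minimal Python sketch. It is only an illustration of the code point versus UTF-16 distinction, not a description of how either engine is implemented; the helper name `to_surrogates` is hypothetical, and Python strings stand in for a code-point-based model.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF)
    into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogates(0x1D400)

# Two separate (invalid) code points, as in the XML case.
pair = chr(high) + chr(low)
# One code point, U+1D400.
single = chr(0x1D400)

# In a code-point model these are different strings.
pair_is_distinct = (len(pair) == 2 and len(single) == 1 and pair != single)

# Only once both are written out as UTF-16 code units do they
# become indistinguishable, which is where the merging comes from.
pair_utf16 = pair.encode("utf-16-be", "surrogatepass")
single_utf16 = single.encode("utf-16-be")
same_in_utf16 = (pair_utf16 == single_utf16)
```

Note that a strict UTF-8 encoder refuses the pair outright (`pair.encode("utf-8")` raises `UnicodeEncodeError`), which is the "prevent them from being formed" option in miniature.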