On 23/4/15 20:59, David Carlisle wrote:
I can confirm that in xetex \string converts a character token above
U+FFFF to two tokens giving its UTF-16 representation.

With the attached file luatex produces

90,33
34,33
233,33
233,33
65530,33
65537,33
65537,33


which is in each case the Unicode value of the character followed by
that of !

xetex produces

90,33
34,33
233,33
233,33
65530,33
55296,56321
55296,56321


where the last two lines show that \string has generated U+D800 U+DC01,
which does correspond to the UTF-16 encoding of U+10001, confirming
that \string on a character token has produced two tokens that have been
picked up separately as #1 and #2 of the \test macro.
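
For anyone who wants to check the surrogate arithmetic behind those last two lines, here is a minimal sketch in Python. The helper name utf16_code_units is purely illustrative and is not part of XeTeX or of the test file; it only applies the standard UTF-16 encoding rule to a code point.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for a Unicode code point as a list of ints.

    Hypothetical helper for illustration only; not taken from the XeTeX sources.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point < 0x10000:
        # Values in the Basic Multilingual Plane fit in a single code unit.
        return [code_point]
    # Above U+FFFF: subtract 0x10000, then split the 20-bit offset
    # into a high (lead) and a low (trail) surrogate of 10 bits each.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


print(utf16_code_units(0x10001))  # [55296, 56321], i.e. U+D800 U+DC01
print(utf16_code_units(0xFFFA))   # [65530], a single code unit
```

The first result matches the 55296,56321 pair in the xetex output above, and the second matches the 65530 line, where no splitting occurs.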

A fix for this bug, so that \string generates single Unicode characters even for values above U+FFFF, is currently on the utf16-issues branch in the XeTeX repository on sourceforge.[1]

A bug with characters above U+FFFF within \scantokens[2] is also fixed on this branch.


There are also a couple of new primitives available in this branch:

(1) \Uchar <number>

    where <number> is a number in the range 0.."10FFFF

is an expandable command that produces a character token with the given Unicode value, and catcode=12 (other character). This is different from TeX's \char primitive from a macro-programming point of view, in that it expands to a character token rather than being a typesetting command.

(I believe this is similar to the \Uchar command available in luatex.)


(2) \Ucharcat <number1> <number2>

    where <number1> is a number in the range 0.."10FFFF
    and <number2> is a number in the ranges 1..4, 6..8, 10..12

is an expandable command that produces a character token with Unicode value <number1> and catcode <number2>. This allows macro programmers to create character tokens with various catcode assignments much more easily than is otherwise possible.


Feedback and testing are invited, but note that this currently requires pulling the code from sourceforge and building the new xetex, as binary packages are not available.

If testing in the next day or two doesn't uncover any alarming problems, these fixes/features will be merged to the master branch and to TeXLive, in preparation for the TL2015 release.

JK


[1] https://sourceforge.net/p/xetex/code/ci/utf16-issues/tree/
[2] https://sourceforge.net/p/xetex/bugs/80/


