Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-07 Thread David Carlisle
On 7 May 2015 at 02:07, Ross Moore ross.mo...@mq.edu.au wrote: Hi David, .. No disagreement to this. OK:-) In the current versions d835dc00 is two characters in luatex and one character in xetex as the implementation detail that xetex's underlying storage is mostly

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Arthur Reutenauer
While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code points above 65535 (that is, U+10000 and up): a pair of them makes up one Unicode character; however they're not meant to be
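
As a hedged illustration of the pairing described above (not code from the thread), the following Python sketch shows the standard UTF-16 arithmetic for splitting a code point in the range U+10000..U+10FFFF into a high and a low surrogate, and for joining them again. The function names `to_surrogates` and `from_surrogates` are made up for this example.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as a surrogate pair")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits, lands in U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits, lands in U+DC00..U+DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Join a well-formed (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(hex(from_surrogates(0xD835, 0xDC00)))
    print([hex(u) for u in to_surrogates(0x1D400)])
```

Under this arithmetic the pair D835 DC00 that comes up elsewhere in the thread corresponds to the single code point U+1D400.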

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread David Carlisle
On 6 May 2015 at 23:04, Arthur Reutenauer arthur.reutena...@normalesup.org wrote: While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code points above

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread David Carlisle
The character itself, as bytes that is, is not wrong and users should be able to create these. But preferably through macros that ensure that they come correctly paired. Placing two character tokens that represent a surrogate pair should not, though, magically turn them into a single character.

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Ross Moore
Hi Arthur, On 07/05/2015, at 8:04, Arthur Reutenauer arthur.reutena...@normalesup.org wrote: While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Ross Moore
Hi David, On 07/05/2015, at 9:26 AM, David Carlisle wrote: The character itself, as bytes that is, is not wrong and users should be able to create these. But preferably through macros that ensure that they come correctly paired. Placing two character tokens representing a surrogate pair

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-05 Thread David Carlisle
On 4 May 2015 at 16:27, Jonathan Kew jfkth...@gmail.com wrote: ... A fix for this bug, so that \string generates single Unicode characters even for values above U+, is currently on the utf16-issues branch in the XeTeX repository on sourceforge.[1] A bug with characters above U+

[XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-04 Thread Jonathan Kew
On 23/4/15 20:59, David Carlisle wrote: I can confirm that \string does convert character tokens to two tokens giving the UTF-16 representation. With the attached file luatex produces 90,33 34,33 233,33 233,33 65530,33 65537,33 65537,33 which is in each case the unicode value of the
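
A rough way to see why a value such as 65537 behaves differently from the others when text is held as UTF-16: the Python sketch below counts how many 16-bit code units each code point needs, using Python's built-in `utf-16-be` codec. This is only an illustration, not the test file from the message; the sample values are the first number of each pair in the luatex output quoted above, assumed here to be code points, and `utf16_unit_count` is a name invented for the example.

```python
def utf16_unit_count(cp: int) -> int:
    """Return the number of 16-bit code units UTF-16 uses for one code point."""
    # Each UTF-16 code unit is 2 bytes; the big-endian codec adds no BOM.
    return len(chr(cp).encode("utf-16-be")) // 2


if __name__ == "__main__":
    for cp in (90, 34, 233, 65530, 65537):
        print(cp, utf16_unit_count(cp))
```

Only code points at or above 65536 (U+10000) take two code units, which is the case where an engine that works on UTF-16 units can end up producing two tokens instead of one.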