Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

Jonathan Kew Wed, 06 May 2015 07:13:30 -0700

On 6/5/15 14:14, Joseph Wright wrote:

> Based on the current files, we have a block to set \XeTeXcharclass,
> which only applies to XeTeX. The logic followed in that code is that
> characters in the file LineBreak.txt which have class "ID" (ideographs)
> not only set the \XeTeXcharclass class to 1 but also set the \catcode of
> the code point to 11. That leads to a difference between the two Unicode
> engines. My current feeling is that the data file should split this
> process such that the category code change applies to both XeTeX and
> LuaTeX, with the XeTeX-specific code separate. Does this make sense and
> indeed does the current assignment make sense?

ISTM that the most appropriate (default) \catcode for characters with
class ID is clearly letter (11), and would suggest that LuaTeX should
follow XeTeX in this.

So yes, splitting out the XeTeX-specific code and having LuaTeX share
the catcode assignments makes sense.
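
As a rough sketch of what such a split might look like for a single
code point (U+4F60 is used purely as an illustration, and the
\ifdefined guard is an assumption about how the engine test could be
written, not what the data file currently does):

  % Shared by XeTeX and LuaTeX: treat the ideograph as a letter.
  \catcode"4F60=11
  % XeTeX only: assign the character class used for inter-character
  % tokens. Skipped entirely when the primitive does not exist.
  \ifdefined\XeTeXcharclass
    \XeTeXcharclass"4F60=1
  \fi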


After all, if users can write control sequences such as

  \hello
  \halló
  \Здравствуйте
  \ሰላም
  \सलाम

they should equally well be able to write

  \你好
  \こんにちわ

and have each of these treated as single control sequences, too. This
will not work if category ID characters are given catcode 12.
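
To illustrate (a minimal sketch, assuming a plain-TeX-style format on a
Unicode engine; the explicit assignments are only needed where the
format has not already made these characters letters):

  \catcode`\你=11
  \catcode`\好=11
  \def\你好{hello}  % defines one control word whose name is 你好
  \你好 world       % the space is skipped, just as it is after \hello

With catcode 12, \你 would instead be read as a one-character control
symbol followed by a separate character token 好.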

If you're making improvements to unicode-letters.def, I would suggest
also adding a section that assigns catcode 15 (invalid) to the code
values "D800 - "DFFF (i.e. the UTF-16 surrogates, which should never be
used in isolation as characters).
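
Something along these lines might do it (a sketch only, assuming plain
TeX's \loop ... \repeat, that " still has its usual catcode 12, and
that \count255 is free as scratch; this is not taken from the existing
file):

  \begingroup
    \count255="D800
    \loop
      % \global so the assignment survives the enclosing group
      \global\catcode\count255=15
      \ifnum\count255<"DFFF
        \advance\count255 by 1
    \repeat
  \endgroup

The group keeps the change to the scratch register local, while the
catcode assignments themselves are made globally.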


JK


