On 27/11/2019 23:20, Doug McKenna wrote:
Another question about Unicode-aware TeX engine (e.g., XeTeX) initialization 
files.

The Unicode Consortium provides a file, MathClass.txt, e.g.,

./texmf-dist/tex/generic/unicode-data/MathClass.txt

It contains a list of lines (and comments).  Field 0 of an entry line is a 
Unicode code point or a range of code points, and field 1 is a single ASCII 
character that declares the Unicode math class to which the code point or range 
of code points belongs.
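
To make the entry format concrete, here is a minimal sketch of reading one line of such a file. It assumes that fields are separated by ';', that a range is written as two hex code points joined by '..', and that '#' starts a comment; none of that is stated above, and parse_entry is a made-up helper name, not anything from unicode-data.

```python
def parse_entry(line):
    """Return (first, last, math_class), or None for a comment or blank line.

    Assumed layout (not confirmed above): "XXXX;C" or "XXXX..YYYY;C",
    with '#' starting a comment.
    """
    line = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not line:
        return None
    fields = [f.strip() for f in line.split(";")]
    if len(fields) < 2:
        return None  # not an entry line in the assumed layout
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code point: start == end
    return start, end, fields[1]


# Illustrative input lines, written in the assumed layout.
for sample in ("# a comment", "00AC;U", "0030..0039;N"):
    print(repr(sample), "->", parse_entry(sample))
```

A single code point comes back as a range whose two ends are equal, so a caller can treat both kinds of entry the same way.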

Comments in that file say that there are (currently) 15 different Unicode math 
class codes:

#   N - Normal - includes all digits and symbols requiring only one form
#   A - Alphabetic
#   B - Binary
#   C - Closing - usually paired with opening delimiter
#   D - Diacritic
#   F - Fence - unpaired delimiter (often used as opening or closing)
#   G - Glyph_Part - piece of large operator
#   L - Large - n-ary or large operator, often takes limits
#   O - Opening - usually paired with closing delimiter
#   P - Punctuation
#   R - Relation - includes arrows
#   S - Space
#   U - Unary - operators that are only unary
#   V - Vary - operators that can be unary or binary depending on context
#   X - Special - characters not covered by other classes

During XeTeX format initialization, the file load-unicode-math-classes.tex in 
that same directory is executed, in order to declare to the engine which 
Unicode code points belong to which TeX math classes.  The comments in that 
file say that the classes it pays attention to are those with the following 
Unicode math codes:

% This file parses MathClass.txt, provided by the Unicode Consortium, and sets
% up the following mapping between Unicode classes and TeX math types
% - "L" (large)       \mathop
% - "B" (binary)      \mathbin
% - "V" (vary)        \mathbin
% - "R" (relation)    \mathrel
% - "O" (opening)     \mathopen
% - "C" (closing)     \mathclose
% - "P" (punctuation) \mathpunct
% - "A" (alphabetic)  \mathalpha

That means that there are 7 other Unicode math classes that are unaccounted for.
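
The count can be checked mechanically. The sketch below copies the 15 class codes and the eight documented mappings from the two comment blocks above into Python; the variable names are only illustrative, and it says nothing about what load-unicode-math-classes.tex actually does with the leftover codes.

```python
# The 15 class codes listed in the MathClass.txt comments.
UNICODE_MATH_CLASSES = set("NABCDFGLOPRSUVX")

# The eight mappings documented in load-unicode-math-classes.tex.
TEX_MATH_TYPE = {
    "L": r"\mathop",
    "B": r"\mathbin",
    "V": r"\mathbin",
    "R": r"\mathrel",
    "O": r"\mathopen",
    "C": r"\mathclose",
    "P": r"\mathpunct",
    "A": r"\mathalpha",
}

# Codes with no documented TeX math type.
unmapped = sorted(UNICODE_MATH_CLASSES - set(TEX_MATH_TYPE))
print(len(unmapped), " ".join(unmapped))
```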

Unfortunately, the documentation/comments don't say what happens to entries 
having these other Unicode math codes (N, D, F, G, S, U, and X).  Are they 
completely ignored, or are they mapped to one of the eight codes that TeX is 
interested in or capable of handling?

I can imagine that the space character, given Unicode math class 'S' in 
MathClass.txt, is ignored during this parse.  But what happens to the '¬' 
character (U+00AC, "NOT SIGN"), which is assigned 'U' (Unary)?  Surely the 
logical not sign is not being ignored during initialization of a Unicode-aware 
engine, yet the comments in load-unicode-math-classes.tex don't say one way or 
the other, and it appears to me that the parsing code is ignoring it.

The ReadMe.md file

<https://ctan.org/tex-archive/macros/generic/unicode-data>

is also deficient in answering this question.

TIA,

Er, I thought the README was reasonably clear, ah well!

The other Unicode math classes don't really map directly to TeX ones, so they are currently ignored. Suggestions for improvements here are of course welcome.

Joseph
