Hi Joseph.

On 28 Nov 2019, at 6:29 pm, Joseph Wright <joseph.wri...@morningstar2.co.uk> wrote:

On 28/11/2019 00:16, Ross Moore wrote:
If by ignoring you mean removing the character entirely, then that is surely 
not best at all.
Most N Class (Normal) characters would simply be of the default \mathord 
class.

That is already the case: it's where IniTeX starts off, chars are mathord. So 
'nothing to do here'. Also note that some of this information is already set 
from the main Unicode file: it tells us which chars are letters.

OK. That’s what I’d expect.
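
To make the loader's scope concrete, here is a minimal sketch of the kind of 
table assignments being discussed, assuming a Unicode engine (XeTeX or 
LuaTeX). The code points are chosen only for illustration; this is not an 
excerpt from the actual loader files.

```tex
% Illustrative only: primitive table assignments, no macros defined.
\catcode"03B1=11      % U+03B1 GREEK SMALL LETTER ALPHA: category 'letter'
\catcode"0391=11      % U+0391 GREEK CAPITAL LETTER ALPHA: category 'letter'
\lccode"0391="03B1    % capital alpha lowercases to small alpha
\uccode"0391="0391    % capital alpha is its own uppercase
\lccode"03B1="03B1    % small alpha is its own lowercase
\uccode"03B1="0391    % small alpha uppercases to capital alpha
% No math class assignment is made for an ordinary symbol such as
% U+2203 THERE EXISTS: as noted above, chars already start off as mathord.
```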

I’d expect others to be mapped instead into a macro that corresponds to 
something that TeX does support, 
e.g. the space characters in U+2000 – U+200A (thin space, 2-em space, etc.) 
can expand into things like:   \, \; \> \quad \qquad  etc.  (or even into 
constructions like  \mskip1mu )

That's not a generic IniTeX thing, I'm afraid.

Yeah, well there are so many of these extra space characters.
I really don’t know where they are all used in practice by other (non-TeX) apps.

The Unicode data loaders are explicitly about setting up the basic data in 
Unicode TeX engines that's held in (primitive) tables.

Creating macros is the job of the 'rest' of the format. Here, presumably you 
are thinking of making chars math-active: that's well out-of-scope for the 
loader.

Fair enough; especially if this is all happening before processing any textual 
input intended for the typeset page.
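
For what it's worth, a rough sketch of what "math-active" would mean for a 
couple of those space characters, as something a format or package might do 
after the loader has run. It assumes a Unicode engine (XeTeX or LuaTeX), a 
format in which ~ is active and \, and \quad are defined, and a helper name 
\mathactivechar that is purely hypothetical.

```tex
% Hypothetical helper: #1 = code point, #2 = replacement in math mode.
\def\mathactivechar#1#2{%
  \begingroup
    \lccode`\~=#1\relax               % map the active ~ onto the target char
  \lowercase{\endgroup\def~}{#2}%     % define the active version of that char
  \mathcode#1="8000\relax             % "8000: treat the char as active in math
}
\mathactivechar{"2009}{\,}      % U+2009 THIN SPACE -> thin math space
\mathactivechar{"2003}{\quad}   % U+2003 EM SPACE   -> 1em space
```

The catcode of each character is left alone, so outside math mode it still 
behaves as whatever the loader and the font make of it.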


After all, this is essentially what happens when pdfTeX reads raw Unicode input.

pdfTeX reads bytes, so there's not really much comparison. In IniTeX mode, 
not much happens with UTF-8 in pdfTeX: perhaps you are thinking of LaTeX?

Yes, sure I’m thinking of LaTeX; at least now that UTF-8 input has become the 
default.
Previously this needed the inputenc package and the loading of its  .def  files.
But, as you say above, this comes later.

One has to wonder, then, how much of the Unicode range needs to be (or can be) 
handled earlier; e.g., when there is only one sensible interpretation for the 
use of specific characters?
Conversely, how much can, or should, be left until later, when there may be a 
better idea of which (classes of) characters are present within the input 
source?

I suppose that is the kind of question you are dealing with; so I’ll now butt 
out of this conversation, but I’ll keep watching in case it continues.


Joseph



Cheers,

Ross


Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.mo...@mq.edu.au
http://www.maths.mq.edu.au