Hi Joseph.

On 28 Nov 2019, at 6:29 pm, Joseph Wright <joseph.wri...@morningstar2.co.uk> wrote:
> On 28/11/2019 00:16, Ross Moore wrote:
>> If by ignoring you mean removing the character entirely, then that is surely not best at all. Most N Class (Normal) characters would be simply of the default \mathord class.
>
> That is already the case: it's where IniTeX starts off, chars are mathord. So 'nothing to do here'. Also note that some of this information is already set from the main Unicode file: it tells us which chars are letters.

OK. That’s what I’d expect.

>> I’d expect others to be mapped instead into a macro that corresponds to something that TeX does support. e.g. space characters for thinspace, 2-em space, etc. in U+2000 – U+200A can expand into things like: \, \; \> \quad \qquad etc. ( even to constructions like \mskip1mu )
>
> That's not a generic IniTeX thing, I'm afraid.

Yeah, well there are so many of these extra space characters. I really don’t know where they are all used in practice by other (non-TeX) apps.

> The Unicode data loaders are explicitly about setting up the basic data in Unicode TeX engines that's held in (primitive) tables. Creating macros is the job of the 'rest' of the format. Here, presumably you are thinking of making chars math-active: that's well out-of-scope for the loader.

Fair enough; especially if this is all happening before processing any textual input intended for the typeset page.

>> After all, this is essentially what happens when pdfTeX reads raw Unicode input.
>
> pdfTeX reads bytes, there's not really much comparison. In IniTeX mode, there is not much happening with UTF-8 and pdfTeX: perhaps you are thinking of with LaTeX?

Yes, sure I’m thinking of LaTeX; at least now that UTF-8 input has become the default. Previously there would be (inputenc) package and .def file loading. But, as you say above, this comes later.

One has to wonder then, how much of the Unicode range needs to be (or can be) handled earlier; e.g., when there is only one sensible interpretation for the use of specific characters?
Conversely, how much can, or should, be left to later when there may be a better idea of which (classes of) characters are present within the input source?

I suppose that is the kind of question you are dealing with; so I’ll now butt out of this conversation, but still watch it if there’s further continuation.

> Joseph

Cheers,

    Ross

Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955 | F: +61 2 9850 8114
M: +61 407 288 255 | E: ross.mo...@mq.edu.au
http://www.maths.mq.edu.au