Hans wrote:
>On 12 Jul 2012, at 15:54, Julian Bradfield wrote:
..
>> Not to mention the symbols I've used from time to time, because
>
>You tell me, because I posted a request for missing characters in different
>forums. Perhaps you invented it after the standardization was made?
Why on earth would I care about whether my pet symbol (a mu-nu ligature, which I started using to stand for "mu or nu as appropriate" when I ran out of other plausible letters for it) is in Unicode? It would be crazy to put it there, and of precious little benefit to me, since I don't wish to write web pages about this stuff.

>>> them. In math, you can always invent your own characters and styles,
>> people do.
>You and others knowing about those characters must make proposals if you want
>to see them as a part of Unicode.
But wanting to do so would be crazy. My mu-nu ligature is, as far as I know, used only by me (and co-authors who let me do the typesetting), and so if Unicode has any sanity left, it would not encode it. My colleagues in the Edinburgh PEPA group did try to get their pet symbol encoded (a bowtie where the two triangles overlap somewhat rather than just touching), but were refused, although that symbol now appears in hundreds of papers by dozens of authors from all over the world. (I think they wanted it so they could put it on web pages, of which they have lots.)

Putting a symbol into Unicode imposes a huge burden on thousands of people. Everybody who thinks it important to be able to display all Unicode characters (or even all non-Han characters) has to make sure that their font has it, or that the distribution they package has it, or that all the software in the world knows how to find a font that has it. Such effort is entirely inappropriate for symbols used ad hoc by a small community, who in any case communicate either via fully typeset documents or via TeX pseudocode - or, on occasion, with real TeX and a suitable font definition.

>> You mean "private use". Crazy thing to do, because then you have to
>> worry about whether your PUA code point clashes with some other
>> author's PUA code point.
>
>There is some system for avoiding that. Perhaps someone else here can inform.
There are many such systems - I don't need help or advice on this matter. But none of them is appropriate for a symbol that you perhaps want only for a few papers.

>>> UTF-8 only is simplest for the programmer that has to implement it.
>> Some of us are more concerned with users than programmers.
>Well, if the programmers don't implement, you are left out in the cold.
I'm not - if I care enough, I'll do it myself. In fact, most of my work has been implementing UTF-8 - as I said, the legacy encodings are usually already done.

>> Neither working mathematicians nor publishers nor
>> typesetters like dealing with constantly changing extensions and
>> variations on TeX - one of the biggest selling points of TeX is
>> stability. (Defeated somewhat by the instability of LaTeX and its
>> thousands of packages, but that's another story.)
>> If I need to write complex - or even bidi - scripts routinely, I'd
>> probably be forced into one of them; but the typical mathematician
>> doesn't.
>
>I do not see your point here.
The point is that you don't use unstable, rapidly changing systems for anything that has an expected life of more than a year or two; and if you're planning for somebody else to use it, you try to give them something that runs on systems at least ten years older than yours.

>No. TeX cannot handle UTF-8, and I recall LaTeX's capability to emulate that
>was limited.
Somewhat limited, but good enough for every purpose I've so far needed (maths, phonetics; and European, Indic, Chinese, Hebrew languages in small snippets rather than entire documents). The main annoyance is that combining character support is clunky, and that TeX really doesn't support bidi properly - as I said - though it's remarkable what hacking can be done.

>>>> you need to encode also letters that are semantically distinctively
>>>> roman upright.
>>>
>>> It has already been encoded as mathematical style, see the "Mathematical
>>> Alphanumeric Symbols" here:
>>> http://www.unicode.org/charts/
>>
>> *You* look. The plain upright style is unified with the BMP characters.
>
>Yes, that is why the Unicode paradigm departs from the TeX one.
This is as bad as Naena Guru...

Unicode characters are fontless. They are plain text. The Unicode Standard even has a nice little picture (Figure 2-2) showing how roman A, squashed A, bold italic A, script A, fancy A, sans-serif A, brush-stroke A, fancy script A, and versal capital A are all just LATIN CAPITAL LETTER A.

Now, in response to the desire of some mathematicians (maybe) to write web pages without having to use clunky HTML markup (which is even worse to use than TeX's), Unicode saw fit to encode characters such as MATHEMATICAL BOLD ITALIC CAPITAL A. This is not a logical problem: that character is distinguished from LATIN CAPITAL LETTER A by the fact that its acceptable glyph variants cover a much narrower range than those of A.

However, if you now say that MATHEMATICAL ROMAN CAPITAL A, which by definition must be a seriffed upright non-bold roman letter, is the same character as LATIN CAPITAL LETTER A, you must vanish in a puff of logic, for the same character cannot be both a fontless A and an A that must be displayed in a very restricted range of glyphs. Unless, that is, you have higher-level markup that tells you when A means A, and when it means \mathrm{A}. But if you have such higher-level markup, you don't need all the other variants anyway.

TeX provides such markup, by means of math mode. So TeX users can choose to treat A as \mathrm{A} without inconsistency. However, they can also choose to interpret the higher-level markup as saying "treat A as itself", in which case TeX can do what it likes (in particular, set it in italic), also without inconsistency. Thus there is no incompatibility between Unicode and TeX. Similarly in MathML. However, in plain text, you are screwed.
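To make the plain-text point concrete, here is a small check using Python's standard unicodedata module (an illustrative choice of tool, not something from this thread). It shows that MATHEMATICAL BOLD ITALIC CAPITAL A (U+1D468) has its own code point, but carries a <font> compatibility decomposition to U+0041, so NFKC normalization erases the style distinction. It also shows that there is no separately encoded upright roman mathematical A: MATHEMATICAL ROMAN CAPITAL A is only the hypothetical name used in the argument above, and a name lookup for it fails. The helper char_exists is likewise a hypothetical name introduced just for this sketch.

```python
import unicodedata


def char_exists(name: str) -> bool:
    """Hypothetical helper: True if Python's Unicode database has a character with this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


# U+0041: the ordinary, fontless LATIN CAPITAL LETTER A.
plain_a = "A"
# U+1D468: a styled letter from the Mathematical Alphanumeric Symbols block.
bold_italic_a = "\U0001D468"

print(unicodedata.name(plain_a))        # LATIN CAPITAL LETTER A
print(unicodedata.name(bold_italic_a))  # MATHEMATICAL BOLD ITALIC CAPITAL A

# The styled letter decomposes (compatibility, <font> tag) to plain A,
# so NFKC normalization throws the style away.
print(unicodedata.decomposition(bold_italic_a))                 # <font> 0041
print(unicodedata.normalize("NFKC", bold_italic_a) == plain_a)  # True

# No "roman" mathematical capital A is encoded: that style is unified with U+0041.
print(char_exists("MATHEMATICAL ROMAN CAPITAL A"))  # False
```

In plain text, then, an A meant as \mathrm{A} and a generic A are the same code point, U+0041.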
There is no way to distinguish between the generic A and the A that must be roman, except by human intelligence.

>You have yourself noted that the BMP characters must be used for upright for
>consistent Unicode use, incompatible with TeX which sets them as italic.
Which shows that Unicode is inconsistent, not that TeX is flawed.

>It is because there are currently no convenient input methods, also mentioned
>before in this thread.
There will never be convenient input methods for thousands of symbols. (I've spent some time designing convenient input methods for the range of characters I use frequently, and I still can't always remember them.)

-- 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.