Peter Constable wrote:
> There is a potential concern in Uniscribe/OpenType: substitution and positioning rules in OT are organised hierarchically by script then by individual writing system / typographic groups (the label used is languages, but the intent is really groups of writing systems that share common typographic behaviours). Thus, a rule that handles positioning of a glyph for 0950 (or whatever) relative to some member of some class of glyphs must be entered somewhere under some particular script. Now, there is nothing that prohibits a font developer from creating multiple positioning rules for 0950 with different classes of base glyphs and to have a different one placed in the hierarchy under several different scripts.
Fully agreed so far.
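To make the hierarchy concrete, here is a minimal sketch of the idea, not of the actual OpenType binary tables. The `MarkRule` class, the `font_rules` dictionary and `find_rule` are hypothetical names invented for illustration, and U+0951 and the base letters are arbitrary picks standing in for "0950 (or whatever)". It only shows that the same mark can carry a separate rule, with a different base class, under each script tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkRule:
    """Hypothetical stand-in for one mark-to-base positioning rule."""
    mark: int               # code point of the combining mark
    base_class: frozenset   # code points of the bases it may attach to


# script tag -> language-system tag -> rules (illustrative data only)
font_rules = {
    "deva": {"dflt": [MarkRule(0x0951, frozenset({0x0915, 0x0916}))]},
    "telu": {"dflt": [MarkRule(0x0951, frozenset({0x0C15, 0x0C16}))]},
}


def find_rule(rules, script, langsys, mark, base):
    """Return the first rule under script/langsys that places mark on base."""
    for rule in rules.get(script, {}).get(langsys, []):
        if rule.mark == mark and base in rule.base_class:
            return rule
    return None  # no rule filed under this script applies
```

With this layout, looking up U+0951 on Telugu KA (U+0C15) succeeds under `"telu"` and fails under `"deva"`, so everything depends on which script tag the engine assigns to the run.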
> But there may yet be an issue on the Uniscribe side: given a string of characters, which it will begin by mapping into a string of initial glyphs, it has to decide which script tag(s) to apply to portions of the string. What I don't know is whether it generally assumes combining marks belong to a specific script, or whether it allows combining marks to inherit their script from the base characters with which they combine.
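The two possible behaviours can be illustrated with a small sketch. This is not how Uniscribe is implemented: `SCRIPT_RANGES`, `block_script` and `itemize` are hypothetical, and the script table is reduced to two block ranges purely for the example. The only real library call is `unicodedata.category`, used to detect combining marks.

```python
import unicodedata

# Hypothetical, drastically reduced script table keyed on block ranges.
SCRIPT_RANGES = [
    (0x0900, 0x097F, "deva"),
    (0x0C00, 0x0C7F, "telu"),
]


def block_script(ch):
    """Script tag assumed from the character's block, 'DFLT' if unknown."""
    cp = ord(ch)
    for lo, hi, tag in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return tag
    return "DFLT"


def itemize(text, marks_inherit):
    """Split text into (script, substring) runs.

    With marks_inherit=True, a combining mark takes the script of the
    run it follows instead of the script of its own block.
    """
    runs = []
    for ch in text:
        script = block_script(ch)
        is_mark = unicodedata.category(ch).startswith("M")
        if marks_inherit and is_mark and runs:
            script = runs[-1][0]
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [tuple(run) for run in runs]
```

For Telugu KA followed by U+0951, `itemize("\u0C15\u0951", False)` yields two runs (`telu`, then `deva`), while `itemize("\u0C15\u0951", True)` keeps a single `telu` run, so only in the second case would a rule filed under the Telugu script tag be consulted for the accent.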
Look: in current Uniscribe, a leading ZWJ or ZWNJ is discarded (i.e., with input U+200D U+093E you still get the dotted circle meaning "incorrect combining", even though this is perfectly correct Unicode as far as I understand). So clearly, they have a problem with "backtracking" when the script is not determined by the first character in the stream. I can understand that. OTOH, when ZWJ or ZWNJ comes second or later in a conjunct, it is handled properly, in every script where it is relevant.
What I would like to see is the Indic accents handled in the same way. When I raised this with MS people (and not only I, but also Pothana's designer), MS answered that the Unicode Standard seemed to imply that these accents apply to the Devanagari script only. It looks to me as though Scripts.txt just confirms the MS point of view. If this is as intended, that is fine, but it means that a bunch of new characters (with little or no added value) will have to be added in some future revision of Unicode.
By the way, the situation is similar with the dandas (U+0964 and U+0965): they only appear in the Devanagari and Myanmar blocks, but are used for many other (all?) South Asian scripts as well. Worse, they are used often, so there is already a lot of material encoded with these code points. Luckily, dandas do not need special handling from complex-script engines, so it does not matter whether Uniscribe decides they are Devanagari or script-less (except perhaps for font selection).
Antoine