On Mon, 14 May 2018 04:12:56 -0800 James Kass via Unicode <[email protected]> wrote:
> In response to William Overington's post, it's easier to transcode
> data from a PUA scheme into Unicode than it is to enter the data from
> scratch. (The same could be said for a customized ASCII font.) Some
> users may not wish to wait even the handful of years it took for
> mainstream Indic complex scripts to be rendered properly.
>
> At this phase of Unicode's progress, however, we shouldn't encourage
> the interchange of such PUA data. Since it's simple to transcode, any
> such data should be transcoded prior to interchange or permanent
> storage.

> Recipients lacking systems supporting proper Unicode
> rendering for complex scripts such as Tai Tham could then transcode it
> to the PUA scheme for display/printing purposes.

The PUA scheme would be roughly equivalent to the glyph sequence
produced by the shaper. (The ccmp feature is in general not available
for the PUA, though CSS allows its use to be forced.) However, there
would be no extra channels, such as the component-mark association
often needed for some cursive scripts. For example, in ᨣᩩ᩠ᨿ <LOW KA,
SIGN U, SAKOT, LOW YA> 'to direct', SIGN U may be realised as a mark
below left, a mark below <SAKOT, LOW YA>, or a spacing mark on the
right of <SAKOT, YA>. One could argue that the three positions require
different glyphs for SIGN U. Each font would need its own PUA.

> An OpenType font, a keyboard driver, and a text conversion utility
> might go a long way towards supporting complex scripts for users whose
> systems cannot otherwise currently support them.

This is where Apple had the right idea, albeit one that is difficult
to implement, and where the OTL paradigm is deficient. There are
several places in Tai Tham layout where I want to swap glyphs round,
but for the layout engine to do so for me would cause grief for other
Tai Tham fonts. This rearrangement cannot be delegated to the
rendering engine.
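As an illustration of the transcoding step discussed above, here is a
minimal Python sketch. The PUA assignments (U+E000 to U+E002) and the
function names are purely hypothetical; a real scheme would be specific
to one font and would need one PUA code point per glyph variant, which
this toy table does not attempt.

```python
# Hypothetical per-font PUA scheme: one PUA code point per glyph.
# These assignments are invented for illustration only.
PUA_TO_UNICODE = {
    "\uE000": "\u1A23",        # glyph for LOW KA
    "\uE001": "\u1A69",        # glyph for SIGN U
    "\uE002": "\u1A60\u1A3F",  # subscript glyph for <SAKOT, LOW YA>
}


def from_pua(text, table=PUA_TO_UNICODE):
    """Replace each PUA code point by its Unicode sequence."""
    return "".join(table.get(ch, ch) for ch in text)


def to_pua(text, table=PUA_TO_UNICODE):
    """Replace Unicode sequences by PUA glyph codes, longest match first."""
    reverse = {seq: pua for pua, seq in table.items()}
    longest = max(len(seq) for seq in reverse)
    out = []
    i = 0
    while i < len(text):
        for n in range(longest, 0, -1):
            chunk = text[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += len(chunk)
                break
        else:
            # No mapping: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


word = "\u1A23\u1A69\u1A60\u1A3F"  # <LOW KA, SIGN U, SAKOT, LOW YA>
assert from_pua(to_pua(word)) == word
```

A context-free table like this cannot choose between the three
placements of SIGN U; that choice is exactly the part that needs
shaping logic or per-position glyph codes.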
There are Tai Tham fonts which handle Indic rearrangement in the ccmp
feature, but they are then totally defeated either by ccmp not being
enabled or by the USE doing basic Indic shaping.

There are now two approaches for Tai Tham: (1) fix USE or
restore/create a separate shaper for scripts with CVC... aksharas, and
(2) overcome the USE in the font.

For the latter I need to make the work-arounds in Da Lekh easier to
copy. I have transferred them to Ed Trager's Hariphunchai font,
yielding Lamphun, but Lamphun still needs some further revision to the
positioning logic. It wasn't as complete as I'd hoped. I've done a
quick fix for the vowels below, but I suspect much more work is needed
to conform to the spirit of the Hariphunchai font. I could do with
someone artistic to help with the combinations of NYA and subscript
consonant such as NY.CA, and Pali LL.HA is currently a disaster.

On Track 1, there's also more tinkering to do, such as making MEDIAL LA
and MEDIAL RA 'consonant subscript' rather than 'consonant medial'.
/lw/ is an allowed onset in the Tai languages using the Tai Tham
script, so we get orthographic onset <hlw-> with MEDIAL LA in the West.
The main problem is that we do not have characters *MEDIAL WA and
*MEDIAL YA - the general subscript WA and YA are used instead, and
these can function as matres lectionis. (In Unicode Khmer, the matres
lectionis have been reanalysed as vowels.)

I think it would also help to make SIGN AA and SIGN TALL AA into
letters as far as the USE is concerned. The default grapheme
segmentation rules already treat them as consonants. The possible
downside is that so doing might mess up some fonts.
> A good keyboard driver should be able to remove some of the burden off
> of the OpenType tables, enabling multiple
> fonts covering the same script to be used without having bloated and
> redundant OpenType tables, by offering some degree of control over the
> actual character strings which are being stored (and presented to the
> font for rendering).

It won't work. The text input delivered by X still needs to be
supported, and without modifying the application, X can only input one
character at a time. Not everyone uses an 'input method'.

> (Many font developers might consider that any kind of normalization
> should be handled at input rather than left up to the font. Keyboard
> developers might have a different idea, though.)

Apparently, Hangul input should not be canonically normalised in South
Korea.

I've seen an implementation of the USE render canonically equivalent
strings differently. It wouldn't be HarfBuzz - it normalises, as we saw
when it briefly messed up Tai Tham rendering when it swapped <tone,
SAKOT> to <SAKOT, tone>. That was rapidly fixed to normalise the other
way round. I'd completely forgotten that Thai, Lao and Tai Tham tone
marks had different combining classes.

However, in Northern Thai, <TONE-1...TONE-2> and <TONE-2...TONE-1> seem
to render the same, so normalisation might not be relevant.
Unsurprisingly, that's the only pair of tone-marks I've seen in the
same akshara, so I don't know how the other pairs of distinct tone
marks combine. A pair arises when two chained syllables have different
tone marks. If they have the same tone mark, one is suppressed.

Richard.
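For anyone wanting to check the combining-class point above, a minimal
Python sketch using only the standard unicodedata module follows. The
constant names are my own, and the results depend on the Unicode data
bundled with the Python build; the sketch assumes that data agrees with
the current UCD for these characters.

```python
import unicodedata

# Characters compared; the constant names are illustrative only.
SAKOT = "\u1A60"        # TAI THAM SIGN SAKOT
TONE_1 = "\u1A75"       # TAI THAM SIGN TONE-1
TONE_2 = "\u1A76"       # TAI THAM SIGN TONE-2
THAI_MAI_EK = "\u0E48"  # THAI CHARACTER MAI EK
LAO_MAI_EK = "\u0EC8"   # LAO TONE MAI EK


def ccc(ch):
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


# Print the combining class of each mark for inspection.
for ch in (SAKOT, TONE_1, TONE_2, THAI_MAI_EK, LAO_MAI_EK):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={ccc(ch)}")

# Canonical ordering sorts adjacent marks by ascending combining class,
# so a tone mark typed before SAKOT is moved after it.
tone_then_sakot = TONE_1 + SAKOT
reordered = unicodedata.normalize("NFC", tone_then_sakot)

# Marks with equal combining class are never reordered, so the two
# orders of a TONE-1/TONE-2 pair stay canonically distinct.
pair_a = unicodedata.normalize("NFC", TONE_1 + TONE_2)
pair_b = unicodedata.normalize("NFC", TONE_2 + TONE_1)
```

If reordered equals SAKOT + TONE_1 while pair_a and pair_b differ, then
normalisation accounts for the <tone, SAKOT> swap but leaves the order
of two Tai Tham tone marks untouched; any identical rendering of the
two orders must come from the font.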

