On Sun, 21 Jul 2019 20:53:19 -0700 Asmus Freytag via Unicode <unicode@unicode.org> wrote:
> There's really no inherent need for many spacing combining marks to
> have a base character. At least the ones that do not reorder and that
> don't overhang the base character's glyph.

We are in agreement here.

> As far as I can tell, it's largely a convention that originally
> helped identify clusters and other lack of break opportunities. But
> now that we have separate properties for segmentation, it's not
> strictly necessary to overload the combining property for that
> purpose.

Which relates to the separate question I asked about breaking at
grapheme boundaries. Interestingly, I'm not seeing breaks next to an
invisible stacker, but that may be because Pali subscript consonants
only slightly increase the width of the cluster.

The need for a base makes sense for reordering spacing marks, but its
purpose should be to detect editing errors, not to block deliberate
effects. An unreordered reordering mark plus consonant is visually
ambiguous with consonant plus reordering mark.

> In your example, why do you need the ZWJ and dotted circle?

The user- and application-supplied text would be
<NBSP, ZWJ, spacing_mark>.

> Originally, just applying a combining mark to a NBSP should normally
> show the mark by itself. If a font insists on inserting a dotted
> circle glyph, that's not required from a conformance perspective -
> just something that's seen as helpful (to most users).

It's not the font that inserts the dotted circle, it's the rendering
engine. That's why the USE set Tai Tham rendering back several years.
Now, there is at least one renderer (HarfBuzz) for which a cunning font
can work out whether the renderer has introduced the dotted circle
glyph rather than it being in the text to be rendered. I am looking for
a general font-level solution to the problem that would even work on
Windows 10. The ZWJ seems a reasonable hint that the space should be
rendered with zero width.
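
To make the sequence concrete, here is a minimal Python sketch. It is an illustration only: `describe` and the variable names are made up for this example, and U+1A6E is used as the sample spacing mark. It lists the code points of <NBSP, ZWJ, spacing_mark> with the General_Category values that Python's `unicodedata` reports; it says nothing about how any particular renderer will shape them.

```python
import unicodedata

NBSP = "\u00A0"    # NO-BREAK SPACE, the carrier for the isolated mark
ZWJ = "\u200D"     # ZERO WIDTH JOINER, the proposed zero-width hint
SIGN_E = "\u1A6E"  # TAI THAM VOWEL SIGN E, a spacing mark

def describe(text):
    """Return (code point, General_Category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

# Hypothetical backing-store sequence for an isolated SIGN E.
isolated_sign_e = NBSP + ZWJ + SIGN_E

for code_point, category, name in describe(isolated_sign_e):
    print(code_point, category, name)
```

The point of listing the categories is that the mark still has a base in the backing store (the NBSP); whether that base contributes any width is left to the font and the renderer.
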
Do you think it is reasonable for <NBSP, spacing_mark> to have zero
width contribution from the NBSP when the spacing mark has a
non-overhanging glyph? It seems to be an unstandardised area, but zero
width might be considered to violate the character identity of NBSP.

I also have the problem of visually line-final U+1A6E TAI THAM VOWEL
SIGN E, which needs to be separated from a preceding consonant in the
backing store. It seems to be particularly common before the holes
(two per page) for the string that holds the pages together. Perhaps
the scribe tried to avoid line-final U+1A6E.

There are examples of these issues in Figure 9b of
http://www.unicode.org/L2/L2007/07007r-n3207r-lanna.pdf . The last
syllable of _cattāro_ 'four' straddles lines 2 and 3, with its first
glyph (corresponding to SIGN E) ending line 2, and <RA, SIGN AA>
starting line 3. The antepenultimate syllable of _sammodamānehi_
(misspelt _samoddamānehi_) 'pleasing' is split between lines 7 and 8,
with line 7 ending in MA and line 8 starting in SIGN AA.

I am looking for advice on what is the least bad readily achievable
solution. I can then adapt that to cope with the messier issue of the
non-spacing character U+1A58 TAI THAM SIGN MAI KANG LAI, which acts
like Burmese kinzi in the Pali text I am working on. (If one does not
know the font well, one should not put a line break next to it unless
all other options are exhausted.) Figure 9b also has an example of this
issue. The initial consonant of _saṅkhepaṃ_ (misspelt _saṅkheppaṃ_)
'collection, summary' is on line 9, while the rest of the word,
starting <MAI KANG LAI, HIGH KHA, SIGN E>, is on line 10.

There is a weird hack that currently helps with LibreOffice - inserting
CGJ turns off some parts of Indic shaping in the rest of the run. Or
have I missed some new specification of Indic encoding? The hack helps
with visually line-final SIGN E.

Richard.