On Wed, 16 Sep 2015 22:34:17 +0100 Daniel Bünzli <daniel.buen...@erratique.ch> wrote:
> On Wednesday, 16 September 2015 at 20:33, Richard Wordingham wrote:
> > Have you addressed the issue of Indic scripts? There are
> > discontiguous grapheme clusters composed of indecomposable code
> > points (e.g. U+17C4 KHMER VOWEL SIGN OO) and of decomposable code
> > points (e.g. U+0BCB TAMIL VOWEL SIGN OO),
>
> Not sure I understand what you mean here.

In Khmer, a sequence <KA, sign OO> is rendered with glyphs in the order
/sign E, KA, sign AA/, and in Tamil a sequence <KA, sign OO> is rendered
with the glyphs in the order /sign EE, KA, sign AA/. All the glyphs have
non-zero advance width.

In both cases <KA, sign OO> splits into two legacy grapheme clusters
<KA>, <sign OO> but is a single extended grapheme cluster. In Tamil,
<KA, sign OO> is in NFC but not in NFD, and splits into

> > and whether consonant + virama + consonant is one cell or two may
> > even depend on the font (e.g. Devanagari).
>
> Well anything that is related to font metrics is out of scope from
> the point of view of a tty as I can't get the information.

You asked, "Is there any guidance on how to combine the information
given by grapheme clusters and the east asian width property to do
fixed-width layouts in terminal emulators ?". From this, I deduced that
you are trying to write a terminal emulator. Are you actually trying to
work out how a terminal emulator someone else wrote will position
characters?

Whether consonant + virama + consonant is one cell or two isn't a
question of font metrics. For example, consider the sequence <U+0921
DEVANAGARI LETTER DDA, U+094D DEVANAGARI SIGN VIRAMA, U+0921>. This is
composed of two legacy and extended grapheme clusters, <U+0921, U+094D>
and <U+0921>. In the 'Lohit Hindi' font, the two consonants are arranged
vertically with no other representation of VIRAMA; horizontally, this is
a single cell. In the 'gargi' font, one gets two instances of DDA side
by side, with VIRAMA visible below the first. Both fonts are fully
compliant with Unicode.
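
The normalization behaviour of the two vowel signs mentioned earlier can be checked with a short script. This is only a sketch in Python using the standard `unicodedata` module; the helper name `nfd_parts` is mine, and the characters are looked up by name so that the result does not hinge on a code point being typed correctly.

```python
import unicodedata

def nfd_parts(char_name: str) -> list[str]:
    """Return the names of the code points in the NFD form of the named character.

    Hypothetical helper for illustration; a one-element result means the
    character has no canonical decomposition.
    """
    ch = unicodedata.lookup(char_name)
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

for name in ("KHMER VOWEL SIGN OO", "TAMIL VOWEL SIGN OO"):
    ch = unicodedata.lookup(name)
    print(f"U+{ord(ch):04X} {name} -> {nfd_parts(name)}")
```

The Khmer sign comes back unchanged, while the Tamil sign splits into the EE and AA signs, which matches the glyph order described above.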
If the terminal you are working with emulates a VT100, I believe it
should be possible to ask it what the current cursor position is. At
http://www.ccs.neu.edu/research/gpc/VonaUtils/vona/terminal/VT100_Escape_Codes.html ,
the query and response are called getcursor DSR and cursor CPR.

> For
> example it seems that U+1F400 to U+1F579 have an east-asian width of
> N but will actually occupy two columns in the built-in osx terminal;
> of course these scalar values are not east asian text per se.

In so far as the property is useful, they probably should be ea=Wide.

> Of course the best way would be to be able to hand out a string to
> the tty for it to measure. But then it already seems impossible to
> test whether a terminal is able to handle UTF-8 or not…

> Maybe trying to use that east asian width property, was not a good
> idea to start with.

If you're trying to work out what a particular emulator will do, the
starting point is its documentation. For many, the useful documentation
may turn out to be the source code, which is not always available.
However, a successful dialogue with the terminal would avoid these
problems. It may even offer a solution to the problems of terminal size
and text wrapping behaviour.

Richard.
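
The dialogue with the terminal suggested above might look roughly like the following. This is a sketch only, assuming a Unix-like system, Python 3.9 or later, and a terminal that answers the VT100 cursor query (ESC [ 6 n) with a report of the form ESC [ row ; col R. The function names are mine, and the code will block if the terminal never replies.

```python
import os
import re
import sys

DSR_CURSOR = b"\x1b[6n"                            # the "getcursor DSR" query
CPR_PATTERN = re.compile(rb"\x1b\[(\d+);(\d+)R")   # the "cursor CPR" reply

def parse_cpr(reply: bytes) -> tuple[int, int]:
    """Extract (row, column), both 1-based, from a cursor position report."""
    match = CPR_PATTERN.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))

def cursor_position(fd_in: int, fd_out: int) -> tuple[int, int]:
    """Ask the terminal where the cursor is (assumes VT100-style DSR support)."""
    import termios  # Unix-only modules, imported here so parse_cpr
    import tty      # remains usable on other platforms
    saved = termios.tcgetattr(fd_in)
    try:
        tty.setcbreak(fd_in)          # no echo, no line buffering
        os.write(fd_out, DSR_CURSOR)
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd_in, 1)
            if not chunk:
                break
            reply += chunk
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
    return parse_cpr(reply)

def measured_columns(text: str) -> int:
    """Write text at the start of the line and see how far the cursor moved.

    Assumes the text is short enough not to wrap onto the next line.
    """
    fd_in, fd_out = sys.stdin.fileno(), sys.stdout.fileno()
    os.write(fd_out, b"\r")
    _, before = cursor_position(fd_in, fd_out)
    os.write(fd_out, text.encode("utf-8"))
    _, after = cursor_position(fd_in, fd_out)
    os.write(fd_out, b"\r\x1b[K")     # return to column 1 and erase the line
    return after - before

if __name__ == "__main__" and sys.stdin.isatty() and sys.stdout.isatty():
    # <DDA, VIRAMA, DDA>: one cell or two, depending on the terminal and font.
    print(measured_columns("\u0921\u094d\u0921"))
```

Measuring in this way reports what the emulator actually did with the string, so it sidesteps any guess based on grapheme clusters or the East Asian Width property.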