On Wed, 16 Sep 2015 02:45:27 +0100 Daniel Bünzli <daniel.buen...@erratique.ch> wrote:
> This will delimit a single grapheme cluster, but if I try to add up
> their east asian widths (W, N, N), this would result in 4 columns.
> Does something naïve like looking up only the east asian width of the
> first scalar value in the grapheme cluster and use 2 columns for it
> if this is F or W and 1 column otherwise work or are there counter
> examples where this breaks ? Or is there anything more clever that
> can be done ?

The silence is a bit worrying, but I can't see why that wouldn't work
for normal text in CJK scripts. (Hangul LLLLLVVVVTTTT would probably
cause some problems!)

Have you addressed the issue of Indic scripts? There are discontiguous
grapheme clusters composed of indecomposable code points (e.g. U+17C4
KHMER VOWEL SIGN OO) and of decomposable code points (e.g. U+0BCA TAMIL
VOWEL SIGN OO), and whether consonant + virama + consonant is one cell
or two may even depend on the font (e.g. Devanagari).

How are you handling ligatures between grapheme clusters, e.g. English
<f, i>? There are Tamil and Tai Tham examples of compulsory ligatures,
shri and naa.

Looking further ahead, there are characters in the pipeline that should
be either Mc or Mn depending on what the base consonant is!

You have dealt with grapheme clusters with a width of one cell and a
depth of two, haven't you? Actually, there's a good argument for some
grapheme clusters occupying cells above and below the line!

Richard.
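
For concreteness, the first-scalar-value rule quoted above could be
sketched roughly as follows. This is only an illustration, not anyone's
actual implementation: the function names are made up, the text is
assumed to be already split into grapheme clusters by something else,
and the property lookup is Python's standard unicodedata module. The
Hangul jamo sequence is just one cluster whose scalar values happen to
have widths (W, N, N).

```python
import unicodedata


def scalar_columns(ch: str) -> int:
    """2 columns if the East Asian Width of ch is F or W, otherwise 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1


def summed_width(cluster: str) -> int:
    """Add up the per-scalar widths (the approach that overcounts)."""
    return sum(scalar_columns(ch) for ch in cluster)


def naive_cluster_width(cluster: str) -> int:
    """Hypothetical helper: judge a cluster by its first scalar value only."""
    return scalar_columns(cluster[0]) if cluster else 0


def naive_text_width(clusters: list[str]) -> int:
    """Total columns for text that is ALREADY segmented into clusters."""
    return sum(naive_cluster_width(c) for c in clusters)


# U+1100 (W), U+1161 (N), U+11A8 (N): one cluster of conjoining jamo.
jamo = "\u1100\u1161\u11a8"
print(summed_width(jamo), naive_cluster_width(jamo))
```

None of this touches the Indic, ligature or two-deep cases raised
above; it only covers the width lookup itself.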