Mark Davis said (in reference to a long set of comments by Philippe Verdy on this thread):
> The statements below are incorrect

And Philippe asked:

> Which "statements"? My message is mostly a read as a question, not as an
> affirmation...

And I will attempt the fact-finding...

> CGJ is a combining character that extends the grapheme cluster started
> before it,

True but misleading. CGJ is a combining character, and like *all* other
nonspacing combining characters it has the property Grapheme_Extend=True.
CGJ's *function* is not to extend the grapheme cluster before it; that
just happens automatically, as for any character with gc=Mn.

And that was a statement.

> but it does not imply any linking with the next grapheme cluster
> starting at a base character.

True. Another statement.

> So, even if one encodes, A+CGJ+E, there will still be two distinct grapheme
> clusters A+CGJ and E, and the exact role of the trailing CGJ in the A+CGJ is
> probably just a pollution, given that this CGJ has no influence on the
> collation order, so that the sequence A+CGJ+E will collate like A+E,

Misconstrued. Whether CGJ influences the collation order or not depends
on how it is weighted in a tailored collation table. And the main *point*
of having a CGJ is to provide a target for tailored collation, so that
it *can* make a difference.

Statements, by the way.

> and it
> does not influence the rendering as well.

True. Another statement.

> A "correct" ligaturing would be A+ZWJ+E,

A matter of opinion, neither obviously true nor false. And a statement.

> with the effect of creating three
> default grapheme clusters,

False. The correct value is 2.

> that can be rendered as a single ligature, or as
> separate A and E glyphs if the ZWJ is ignored.

True. And a statement.

> For example, a ligaturing opportunity can be encoded explicitly in the
> French word "efficace":
> "ef"+ZWJ+"f"+ZWJ+"icace".

True (although superfluous). And a statement.

> Note however that the ZWJ prohibits breaking,

False. ZWJ is lb=CM, which prevents a break *before*, but not a
break *after*.
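
As a rough check of the property values cited so far, here is a minimal
Python sketch. It is only an illustration under stated assumptions: the
helper names `extends_cluster` and `simple_clusters` are hypothetical, and
the segmentation rule is a deliberate simplification (no break before a
character with gc=Mn or gc=Me, or before ZWJ) rather than the full default
grapheme cluster algorithm. Python's standard library does not expose
Grapheme_Extend or the line breaking class, so lb=CM is not checked here.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def extends_cluster(ch):
    """Simplified stand-in for "no cluster break before ch".

    Assumption: gc=Mn / gc=Me approximates Grapheme_Extend, and ZWJ is
    treated the same way. Other segmentation rules are ignored.
    """
    return unicodedata.category(ch) in ("Mn", "Me") or ch == ZWJ


def simple_clusters(text):
    """Split text into clusters using only the simplified rule above."""
    clusters = []
    for ch in text:
        if clusters and extends_cluster(ch):
            clusters[-1] += ch  # attach to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


print(unicodedata.category(CGJ))             # Mn
print(unicodedata.category(ZWJ))             # Cf
print(len(simple_clusters("A" + CGJ + "E")))  # 2
print(len(simple_clusters("A" + ZWJ + "E")))  # 2
```

Under that simplification, A+CGJ+E splits into A+CGJ and E, and A+ZWJ+E
splits into A+ZWJ and E: two clusters in each case, not three.
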
> despite in French there's a
> possible hyphenation at the first occurence, where it is also a syllable
> break, but not for the second occurence that occurs in the middle of the
> second syllable.

True (I assume) statements about French.

> I don't know how one can encode an explicit ligaturing opportunity, while
> also encoding the possibility of an hyphenation (where the sequence above
> would be rendered as if the first ZWJ had been replaced by an hyphen
> followed a newline.)

True (I assume) statements about Philippe's state of knowledge.

> To encode the hyphenation opportunity, normally I would use the SHY format
> control (soft hyphen):
> "ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"

True (I assume) statements about Philippe's practice in text
representation.

>
> If I want to encode explicit ligatures for the "ffi" cluster, if it is not
> hyphenated, I need to add ZWJ:

False (at least existentially, although I cannot comment on your
personal wants and needs). And a statement.

> "ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)

And as Doug pointed out, this is an incredibly baroque (and obtuse)
way of attempting to represent the word "efficace" in plain text.

>
> The problem is whever ZWJ will have the expected role of enabling a ligature
> if it is inserted between a letter and a SHY, instead of the two ligated
> glyphs. In any case, the ligature should not be rendered if hyphenation does
> occur, else the SHY should be ignored. So two rendering are to be generated
> depending on the presence or absence of the conditional syllable break:
> - syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a
> ligature only for the "fi" pair, but not for the "ff" pair and not even for
> the generated "f"+hyphen...
> - syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e.
> with the 3-letter "ffi" ligature...

A whole series of statements.
Together somewhat of a muddle for the simple observation that "ffi" is
not rendered with a single ligature if there is a line break in the
middle of it.

>
> I am not sure if the string coded as (1) above has the expected behavior,
> including for collation where it should still collate like the unmarked word
> "efficace"...

True (I assume) statement about Philippe's state of knowledge.

Reading to the end, I find *only* statements here, and no question
actually posed.

In the future, if you want a message to be taken *as* a question,
it would be best to

1. Make it short, and

2. Actually pose a question in it, preferably terminating
   the sentence to be so interpreted with a "?"

--Ken