From: "Mark Davis" <[EMAIL PROTECTED]>
I want to correct some misperceptions about CGJ; it should not be used for
ligatures.

True. CGJ is a combining character that extends the grapheme cluster started before it, but it does not imply any linking with the next grapheme cluster starting at a base character.


So even if one encodes A+CGJ+E, there are still two distinct grapheme clusters, A+CGJ and E. The trailing CGJ in A+CGJ is probably just pollution: this CGJ has no influence on the collation order, so the sequence A+CGJ+E collates like A+E, and it does not influence the rendering either.
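
As a rough illustration of the point about CGJ, here is a small Python sketch that only looks at the character properties shipped in the standard unicodedata module. It does not segment grapheme clusters or run a real collator; the names seq, nfc_unchanged and without_cgj are mine, chosen for the example.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

seq = "A" + CGJ + "E"

# CGJ is a nonspacing mark (Mn) with canonical combining class 0,
# so it attaches to the preceding base character, not the following one.
print(unicodedata.name(CGJ))
print(unicodedata.category(CGJ))
print(unicodedata.combining(CGJ))

# Normalization neither removes the CGJ nor composes anything across it.
nfc_unchanged = unicodedata.normalize("NFC", seq) == seq

# Dropping the CGJ gives back the plain two-letter sequence.
without_cgj = seq.replace(CGJ, "")
print(nfc_unchanged, without_cgj)
```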

A "correct" ligaturing would be A+ZWJ+E, with the effect of creating three default grapheme clusters, that can be rendered as a single ligature, or as separate A and E glyphs if the ZWJ is ignored.

For example, a ligaturing opportunity can be encoded explicitly in the French word "efficace":
"ef"+ZWJ+"f"+ZWJ+"icace".


Note, however, that the ZWJ prohibits breaking, even though French allows a hyphenation at the first occurrence, which is also a syllable break. There is no such break at the second occurrence, which falls in the middle of the second syllable.

I don't know how one can encode an explicit ligaturing opportunity while also encoding the possibility of a hyphenation (where the sequence above would be rendered as if the first ZWJ had been replaced by a hyphen followed by a newline).

To encode the hyphenation opportunity, normally I would use the SHY format control (soft hyphen):
"ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"


If I also want to encode explicit ligatures for the "ffi" cluster when it is not hyphenated, I need to add ZWJ:
"ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)


The problem is whether ZWJ will have the expected role of enabling a ligature if it is inserted between a letter and a SHY, instead of directly between the two glyphs to be ligated. In any case, the ligature should not be rendered if hyphenation does occur; otherwise the SHY should be ignored. So two renderings are to be generated, depending on the presence or absence of the conditional syllable break:
- syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a ligature only for the "fi" pair, but not for the "ff" pair and not even for the generated "f"+hyphen...
- syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e. with the 3-letter "ffi" ligature...
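
The two renderings listed above can be sketched as a small Python function. This is only a model of the behavior I would expect from a renderer, not a description of what any real layout engine does: the function name render and its break_at parameter are hypothetical, and the rule that a ZWJ adjacent to a taken break is dropped is taken from the first rendering above.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN

def render(marked, break_at=None):
    """Return the text to display for a word marked with ZWJ and SHY.

    break_at is the 0-based index of the soft hyphen where the line is
    broken, or None if the word is not hyphenated (hypothetical API).
    """
    parts = marked.split(SHY)
    if break_at is None:
        # No break taken: every SHY is ignored, every ZWJ is kept.
        return "".join(parts)
    if not 0 <= break_at < len(parts) - 1:
        raise ValueError("no soft hyphen at that index")
    # Break taken: drop any ZWJ touching the break, show a hyphen and a
    # newline there, and ignore the remaining soft hyphens.
    head = "".join(parts[:break_at + 1]).rstrip(ZWJ)
    tail = "".join(parts[break_at + 1:]).lstrip(ZWJ)
    return head + "-\n" + tail

# The string coded as (1)
marked = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"

unbroken = render(marked)     # "ef"+ZWJ+"f"+ZWJ+"icace"
broken = render(marked, 0)    # "ef-"+NL+"f"+ZWJ+"icace"
```

Whether a real renderer treats a ZWJ followed by a SHY this way is exactly the open question; the sketch only makes the expected result explicit.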


I am not sure whether the string coded as (1) above has the expected behavior, including for collation, where it should still collate like the unmarked word "efficace"...
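
To make the collation expectation concrete, here is a naive Python sketch. It assumes a collator that treats format characters (general category Cf, which includes both ZWJ and SHY) as fully ignorable, and models that by simply deleting them before comparing. It is not an implementation of the Unicode Collation Algorithm, and naive_key is a hypothetical helper name.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN

def naive_key(text):
    """Drop format (Cf) characters: a stand-in for a collator that ignores them."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# The string coded as (1)
marked = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"

# Under this assumption the marked word compares equal to the plain one...
same_key = naive_key(marked) == "efficace"

# ...and sorts where "efficace" would sort among its neighbors.
ordered = sorted(["effort", marked, "effet"], key=naive_key)
print(same_key, [naive_key(w) for w in ordered])
```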




