Mark Davis said (in reference to a long set of comments by
Philippe Verdy on this thread):

> The statements below are incorrect

And Philippe asked:

> Which "statements"? My message is mostly a read as a question, not as an 
> affirmation...

And I will attempt the fact-finding...

> CGJ is a combining character that extends the grapheme cluster started 
> before it, 

True but misleading. CGJ is a combining character, and like *all*
other nonspacing combining characters it has the property
Grapheme_Extend=True. CGJ's *function* is not to extend the grapheme
cluster before it; that just happens automatically, as for any
character with gc=Mn.

And that was a statement.
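
As a quick check of the properties just mentioned, here is a minimal
sketch using Python's standard unicodedata module. That module exposes
the General_Category and the canonical combining class, but not
Grapheme_Extend itself, so this confirms only the gc=Mn part.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

name = unicodedata.name(CGJ)
category = unicodedata.category(CGJ)   # General_Category
ccc = unicodedata.combining(CGJ)       # canonical combining class

print(name)      # COMBINING GRAPHEME JOINER
print(category)  # Mn
print(ccc)       # 0
```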

> but it does not imply any linking with the next grapheme cluster 
> starting at a base character.

True. Another statement.

> So, even if one encodes, A+CGJ+E, there will still be two distinct grapheme 
> clusters A+CGJ and E, and the exact role of the trailing CGJ in the A+CGJ is 
> probably just a pollution, given that this CGJ has no influence on the 
> collation order, so that the sequence A+CGJ+E will collate like A+E, 

Misconstrued. Whether CGJ influences the collation order or not
depends on how it is weighted in a tailored collation table. And
the main *point* of having a CGJ is to provide a target for tailored
collation, so that it *can* make a difference. Statements, by the way.
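
To illustrate the difference between untailored and tailored behavior,
here is a toy sketch in Python. It is not the Unicode Collation
Algorithm: it uses a single level of weights, the function sort_key and
the weight value 1 are hypothetical, and it simply assumes that an
untailored CGJ is ignored.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def sort_key(text, tailoring=None):
    """Build a toy single-level sort key.

    `tailoring` is a hypothetical mapping from a character to a weight.
    Without an entry for it, CGJ contributes nothing to the key.
    """
    weights = tailoring or {}
    key = []
    for ch in text:
        if ch in weights:
            key.append(weights[ch])   # tailored weight wins
        elif ch == CGJ:
            continue                  # untailored CGJ is ignored
        else:
            key.append(ord(ch))       # stand-in for a real primary weight
    return tuple(key)


plain = sort_key("AE")
untailored = sort_key("A" + CGJ + "E")
tailored = sort_key("A" + CGJ + "E", {CGJ: 1})

print(untailored == plain)  # True
print(tailored == plain)    # False
```

With no tailoring the two strings get the same key; once the table
gives CGJ a weight of its own, they no longer do.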

> and it 
> does not influence the rendering as well.

True. Another statement.

> A "correct" ligaturing would be A+ZWJ+E, 

A matter of opinion, neither obviously true nor false. And a statement.

> with the effect of creating three 
> default grapheme clusters,

False. The correct value is 2 (A+ZWJ, and E): ZWJ, like CGJ, does not
start a new default grapheme cluster.
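
The count can be reproduced with a deliberately simplified segmenter,
sketched here in Python. It is only an approximation of the default
grapheme cluster rules: the helper names are hypothetical, it treats
gc=Mn, gc=Me, ZWJ and ZWNJ as the characters that never start a cluster,
and it ignores Hangul, CR LF, spacing marks and everything else the
full rules cover.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
CGJ = "\u034f"   # COMBINING GRAPHEME JOINER


def extends(ch):
    """Approximate test: does `ch` attach to the preceding cluster?"""
    return unicodedata.category(ch) in ("Mn", "Me") or ch in (ZWJ, ZWNJ)


def clusters(text):
    """Split `text` into approximate default grapheme clusters."""
    out = []
    for ch in text:
        if out and extends(ch):
            out[-1] += ch      # no break before an extending character
        else:
            out.append(ch)     # any other character starts a new cluster
    return out


print(len(clusters("A" + ZWJ + "E")))  # 2
print(len(clusters("A" + CGJ + "E")))  # 2
```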

> that can be rendered as a single ligature, or as 
> separate A and E glyphs if the ZWJ is ignored.

True. And a statement.

> For example, a ligaturing opportunity can be encoded explicitly in the 
> French word "efficace":
> "ef"+ZWJ+"f"+ZWJ+"icace".

True (although superfluous). And a statement.

> Note however that the ZWJ prohibits breaking, 

False. ZWJ is lb=CM, which prevents a break *before*, but not
a break *after*.

> despite in French there's a 
> possible hyphenation at the first occurence, where it is also a syllable 
> break, but not for the second occurence that occurs in the middle of the 
> second syllable.

True (I assume) statements about French.

> I don't know how one can encode an explicit ligaturing opportunity, while 
> also encoding the possibility of an hyphenation (where the sequence above 
> would be rendered as if the first ZWJ had been replaced by an hyphen 
> followed a newline.)

True (I assume) statements about Philippe's state of knowledge.

> To encode the hyphenation opportunity, normally I would use the SHY format 
> control (soft hyphen):
> "ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"

True (I assume) statements about Philippe's practice in text representation.

> 
> If I want to encode explicit ligatures for the "ffi" cluster, if it is not 
> hyphenated, I need to add ZWJ:

False (at least existentially, although I cannot comment on
your personal wants and needs). And a statement.

> "ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce"    (1)

And as Doug pointed out, this is an incredibly baroque (and obtuse)
way of attempting to represent the word "efficace" in plain text.

> 
> The problem is whever ZWJ will have the expected role of enabling a ligature 
> if it is inserted between a letter and a SHY, instead of the two ligated 
> glyphs. In any case, the ligature should not be rendered if hyphenation does 
> occur, else the SHY should be ignored. So two rendering are to be generated 
> depending on the presence or absence of the conditional syllable break:
> - syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a 
> ligature only for the "fi" pair, but not for the "ff" pair and not even for 
> the generated "f"+hyphen...
> - syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e. 
> with the 3-letter "ffi" ligature...

A whole series of statements. Together they make somewhat of a muddle
of the simple observation that "ffi" is not rendered with a single
ligature if there is a line break in the middle of it.
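
That observation can be sketched in a few lines of Python. This is a
toy model, not a rendering engine: the function render and its break_at
parameter are hypothetical, a surviving ZWJ merely stands for "ligature
still requested here", and the visible hyphen is hard-coded as "-".

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN


def render(text, break_at=None):
    """Return the output line(s) for `text`.

    `break_at` is the 0-based index of the SHY at which the line is
    broken, or None if no break is taken.
    """
    parts = text.split(SHY)
    if break_at is None:
        # No break: every SHY disappears, every ZWJ request survives.
        return ["".join(parts)]
    # Break taken: show a hyphen and drop any ZWJ that would have had
    # to join glyphs across the line end.
    head = "".join(parts[:break_at + 1]).rstrip(ZWJ) + "-"
    tail = "".join(parts[break_at + 1:]).lstrip(ZWJ)
    return [head, tail]


# String (1) from the quoted message.
word = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"

unbroken = render(word)
broken = render(word, break_at=0)

print(unbroken == ["ef" + ZWJ + "f" + ZWJ + "icace"])  # True
print(broken == ["ef-", "f" + ZWJ + "icace"])           # True
```

With no break both ZWJ requests remain, so "ffi" may ligate as a unit;
with a break at the first SHY only the "fi" request is left.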

> 
> I am not sure if the string coded as (1) above has the expected behavior, 
> including for collation where it should still collate like the unmarked word 
> "efficace"...

True (I assume) statement about Philippe's state of knowledge.

Reading to the end, I find *only* statements here, and no question
actually posed.

In the future, if you want a message to be taken *as* a question,
it would be best to (1) make it short, and (2) actually pose a
question in it, preferably terminating the sentence to be so
interpreted with a "?".

--Ken

