On Thursday, July 10, 2003 6:42 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:

> Anyway, I understood from the recent discussion of Hebrew that it is
> Unicode policy not to do anything which could theoretically invalidate
> existing text even if it could be proved that no such text existed.

How does saying that a Grapheme Disjoiner can be used in Turkish, to keep the f from 
absorbing the dot above a following lowercase i, invalidate any existing text?

This does not change anything: existing texts can still produce ligatures in a 
renderer, unless it is explicitly told not to with a Grapheme Disjoiner, or the 
renderer is specially tuned to support the Turkish/Azeri languages. Existing texts do 
not need to be re-encoded if they are already correctly labelled with their language.

The absence of such a language label will never prevent a renderer from choosing a fi 
ligature if one is available. What would prevent it is a conforming renderer that 
correctly interprets the Grapheme Disjoiner to mean "break the grapheme cluster here, 
and display the previous character(s)": the Grapheme Disjoiner itself can then be 
rendered as a non-spacing empty glyph, followed by the rest of the string...
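
As a rough sketch of the behaviour I have in mind (a toy model, not a real shaping 
engine): the Grapheme Disjoiner is only a proposal, so the code point used for GDJ 
below is a private-use stand-in chosen purely for illustration, and the function name, 
glyph names and language codes are likewise assumptions.

```python
# Hypothetical stand-in: GDJ is not an encoded character, so a
# private-use code point is borrowed here for illustration only.
GDJ = "\uE000"

# Languages for which this toy renderer never forms the fi ligature.
TURKIC = {"tr", "az"}


def shape(text, lang=None):
    """Return a list of glyph names for `text` (toy model of a renderer)."""
    glyphs = []
    k = 0
    while k < len(text):
        ch = text[k]
        nxt = text[k + 1] if k + 1 < len(text) else ""
        if ch == "f" and nxt == "i" and lang not in TURKIC:
            # Adjacent <f, i> with no Turkic language label: ligate.
            glyphs.append("fi_ligature")
            k += 2
        elif ch == GDJ:
            # The disjoiner breaks the cluster and shows as an empty glyph.
            glyphs.append("empty")
            k += 1
        else:
            glyphs.append(ch)
            k += 1
    return glyphs


print(shape("fi"))              # unlabelled text may still ligate
print(shape("fi", lang="tr"))   # language-aware renderer keeps f and i apart
print(shape("f" + GDJ + "i"))   # explicit disjoiner blocks the ligature
```

Unlabelled existing text keeps its current rendering; only text that opts in, through 
a language label or an explicit disjoiner, loses the ligature.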

I'm still convinced that a ligature is possible for a Turkish <f, dotted-i> 
sequence, using <f, i, dot-above>. The ligature would join the middle bar of the 
<f> with the top serif of the <i>, but the top-right loop of the f would simply 
be a small horizontal bar, kept apart from the dot still present on the i.

The same ligature could be used for the encoded sequence <f, dotless-i>, so an actual 
font would render the glyphs for <f, i, dot-above> as a base ligature glyph for <f, 
dotless-i> (with a top horizontal bar for the <f> part), and separately add the 
<dot-above> glyph, kerned onto the existing <f-dotless-i> ligature.

To forcibly disable this last ligature, we would use the encoded sequence <f, GDJ, 
dotless-i>.

According to Unicode, the sequence <i, dot-above> has always been valid, even though it 
apparently has the same dotted glyph in all languages. It differs only in the fact 
that the explicit <dot-above> suppresses the Soft_Dotted behaviour of the preceding <i>, 
making it dotless, and is then rendered as a forced diacritic.

So the encoded sequence <i, dot-above> is now made "equivalent" (for rendering 
purposes) to <dotless-i, dot-above> (although they are not canonically equivalent under 
the UAX #15 normalization forms NFC/NFD) and not "equivalent" to an isolated <i> (one 
not followed by any above diacritic)...
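
The lack of canonical equivalence can be checked with Python's standard unicodedata 
module. This is only a small sketch: the helper name canonically_equivalent is my own, 
and it simply compares the NFD forms of the two strings.

```python
import unicodedata

I_DOT_ABOVE = "i\u0307"             # <i, dot-above>
DOTLESS_DOT_ABOVE = "\u0131\u0307"  # <dotless-i, dot-above>
PLAIN_I = "i"                       # isolated <i>


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Normalization does not unify any of the three sequences.
print(canonically_equivalent(I_DOT_ABOVE, DOTLESS_DOT_ABOVE))
print(canonically_equivalent(I_DOT_ABOVE, PLAIN_I))

# Neither NFC nor NFD rewrites <i, dot-above>: there is no precomposed
# lowercase form for it to compose into.
print(unicodedata.normalize("NFC", I_DOT_ABOVE) == I_DOT_ABOVE)
```

Any "equivalence" between these sequences is therefore a matter for the renderer, not 
for normalization.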

-- Philippe.
