John H. Jenkins <jenkins at apple dot com> wrote:

> Remember, though, that the Unicode approach is that ZWJ is *not* the
> preferred Unicode way to support things like a discretionary ct
> ligature in Latin text.  The standard says that the preferred way to
> handle this is through higher-level protocols.
>
> I know that you and I disagree about the extent to which ligation
> control belongs in plain text, but the standard clearly allows both
> approaches.  The ZWJ mechanism is not *the* Unicode approach.

Once again, I have done a poor job of expressing myself on this topic.
Sometimes misunderstandings are the speaker's fault, sometimes the
listener's, and sometimes both.  In this case it is clearly my fault for
not communicating well.

I should not have implied that ZWJ was the only way to effect ligation
in Unicode Latin text, or that the user (or even the software) should
have to insert ZWJ everywhere ligatures are desired.  Rendering
subsystems can certainly use their own judgement to ligate or not.

I read ZWJ, with regard to ligation, as a request to the renderer to
override the default, in effect saying, "Look, dammit, I
want a ligature here."  The renderer (possibly influenced by the
capability of the font) still has the right to decline that request.

Let's consider our good old friend, the "ct" ligature.  Courier is a
good example of a font that had better darned well *not* have a ct
ligature; it would just look too weird.  Helvetica (≈ Arial) might or
might not have a "ct" ligature, but rendering systems using Helvetica
probably would not use it by default.  If Baskerville is used instead,
the chances of using the ligature by default might be somewhat higher.

(Note that I am deliberately avoiding the question of "default modes" of
fonts, or any mention of specific font technologies.  Also note that I
am steering way clear of the language-dependent "fi" ligature.)

So if the text contains the letters "ct", a Courier rendition definitely
would not ligate them by default, and a Helvetica rendition probably
would not, but a Baskerville rendition might.  This is all up to the
designers of the font and rendering engine, of course.  (Please, if you
are a font designer and know that one of these examples is wrong, be
gentle and just treat them as examples.)

Now, if the text contains c + ZWJ + t, that should tell the renderer
that the user would really, really like to see a ligature if possible.
In the case of Courier, it *isn't* possible, so you still get a "c" and
a "t".  In the case of Helvetica and Baskerville, assuming those fonts
have a "ct" ligature, the default (whatever it was) should be overridden
and the ligature should be displayed.

The same thing is true for ZWNJ.  That is, if the default behavior for
Baskerville is to ligate "ct", then c + ZWNJ + t should result in two
discrete letters.  Now, we know that fonts and renderers already do this
without being told, because ZWNJ breaks up the combination that would
otherwise be ligated, and that behavior (while accidental) is correct.
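
To make the intended outcomes concrete, here is a toy sketch of the decision
for the "ct" pair alone.  Everything in it is hypothetical: the FONTS table,
its "has_ct" and "default_on" flags, and the render_ct function are invented
for illustration and do not describe any real font or rendering API.  In
particular, it simply assumes that Helvetica has a "ct" ligature that is off
by default and that Baskerville has one that is on by default.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Hypothetical font data (assumptions, not facts about the real fonts):
# does the font contain a "ct" ligature, and does the renderer use it by default?
FONTS = {
    "Courier":     {"has_ct": False, "default_on": False},
    "Helvetica":   {"has_ct": True,  "default_on": False},
    "Baskerville": {"has_ct": True,  "default_on": True},
}

def render_ct(text, font_name):
    """Return glyph names for 'ct', 'c' + ZWJ + 't', or 'c' + ZWNJ + 't'."""
    font = FONTS[font_name]
    if text == "c" + ZWJ + "t":
        wanted = True                 # explicit request to ligate
    elif text == "c" + ZWNJ + "t":
        wanted = False                # explicit request not to ligate
    elif text == "ct":
        wanted = font["default_on"]   # no request: the default applies
    else:
        raise ValueError("this sketch only handles the 'ct' pair")
    # The font may still decline: no ligature glyph, no ligature.
    if wanted and font["has_ct"]:
        return ["c_t"]
    return ["c", "t"]
```

Under these assumptions, render_ct("c" + ZWJ + "t", "Helvetica") yields the
ligature even though the default is off, while the same text in Courier
still yields a separate "c" and "t".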

My point is that, if fonts and renderers are *also* breaking up
potential ligatures because of an intervening ZWJ, that is NOT correct
according to Unicode.  The accidental, naïve behavior that does the
right thing for ZWNJ does not do the right thing for ZWJ.

This is what I am proposing be changed: fonts and/or rendering engines
(wherever the intelligence lies, depending on the vendor technology)
should be updated to recognize "letter + ZWJ + letter" (and similar
combinations of 3 or more letters) as a request to ligate the characters
if possible.
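
As a rough illustration of what "recognizing" such a sequence could mean,
the sketch below scans text for a letter followed by one or more
(ZWJ + letter) pairs and looks the letters up, with the ZWJs removed, in a
ligature table.  The shape function and the ligature dictionary are
hypothetical stand-ins, not any vendor's mechanism.  The sketch handles only
explicit ZWJ requests; default ligation is deliberately left out, and a
request that cannot be met as a whole falls back to the separate letters.

```python
import re

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# One letter followed by one or more (ZWJ + letter) pairs.
# [^\W\d_] matches any Unicode letter.
REQUEST = re.compile(r"[^\W\d_](?:\u200D[^\W\d_])+")

def shape(text, ligatures):
    """Return a list of glyph names for text.

    ligatures is a hypothetical table mapping letter sequences
    (e.g. "ct") to ligature glyph names (e.g. "c_t").
    """
    glyphs = []
    pos = 0
    for match in REQUEST.finditer(text):
        # Characters before the request pass through one glyph each.
        glyphs.extend(text[pos:match.start()])
        letters = match.group().replace(ZWJ, "")
        if letters in ligatures:
            glyphs.append(ligatures[letters])   # request honored
        else:
            glyphs.extend(letters)              # request declined; ZWJ stays invisible
        pos = match.end()
    glyphs.extend(text[pos:])
    return glyphs
```

For example, shape("fa" + "c" + ZWJ + "t", {"ct": "c_t"}) gives the glyphs
f, a, c_t, while the same call with an empty table gives f, a, c, t.  The
important point is that the ZWJ never prevents the lookup from seeing the
"c" and "t" together.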

I am *not* suggesting that fonts and rendering engines and intelligent
text processing tools like InDesign be stripped of all power to control
ligation.  They are probably in an excellent position to do so.  (I
wish, oh how I wish, that Microsoft Word had some facility for
generating ligatures.)  And I am *not* suggesting that user overrides of
the default ligation behavior be limited to inserting ZWJ or ZWNJ.  If
programs like InDesign give the user a convenient option to turn
ligation on and off, globally or locally, more power to them.  What I am
suggesting is that the Unicode ZWJ and ZWNJ *also* be honored as a way
to control ligation.  That is how I read the Unicode Standard.

-Doug Ewell
 Fullerton, California

