About combining classes

Philippe Verdy Fri, 27 Jun 2003 04:02:57 -0700

Looking at the history of combining classes: they did not exist in the first Unicode 
standard, and they still don't exist in ISO 10646 either.
This was a technology developed by IBM and offered for free to the community to allow 
simplified management of encoded texts, and it was long only informative (as were the 
proposed normalization forms) before it was recognized as useful.


However, if this added character property can break the encoding of some languages 
(including languages that may be encoded in the future), I think that this creates an 
opportunity to standardize the use of a specific character that allows bypassing the 
constraints imposed by these now-standard combining classes where that is needed.

The case of Biblical Hebrew shows what will happen again in the future, because 
combining classes are here to stay for a long time, as they solve many problems with 
modern languages. Of course the CGJ character works, but there will be more pressure 
in the future to use such bypassing features wherever a newly encoded text really 
needs them.

Without this added character (CGJ for example), all future encoded scripts may simply 
abandon the idea of assigning non-zero combining classes, even though these would be 
useful in many cases to detect the *most common* obvious equivalences and to simplify 
the unification of text with the same semantics and graphical rendering.
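
To illustrate the kind of equivalence I mean, here is a minimal sketch in Python, 
assuming only the standard unicodedata module. The helper name 
canonically_equivalent is made up for this example, and the letter "q" is chosen 
only because it has no precomposed forms with these marks. Two marks with different 
non-zero combining classes can be typed in either order, and normalization puts them 
in one canonical order:

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, combining class 230


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Same base letter, same two marks, typed in opposite orders.
s1 = "q" + DOT_BELOW + DOT_ABOVE
s2 = "q" + DOT_ABOVE + DOT_BELOW

# The code point sequences differ...
print(s1 == s2)
# ...but canonical reordering sorts the marks by combining class,
# so both strings normalize to the same sequence.
print(canonically_equivalent(s1, s2))
```

The two marks do not interact graphically (one sits below the base, one above), which 
is exactly the situation where different combining classes are a useful simplification.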

We *must not* go back on the encoding of Hebrew. Traditional Hebrew is definitely a 
distinct language, in the same way as Old Greek, or Old Hungarian, or the various 
regional forms of languages written historically with many variants of diacritics on 
Latin letters. This problem will become more important when Cuneiform or Phoenician 
are encoded, and I'm quite certain that many old Brahmic scripts will suffer from 
the same difficulties when we try to adapt the model adopted for modern Brahmic 
scripts (which works in their domain).

If we want to keep Unicode unified, we must not break this unification of characters 
by assigning new characters when this is not justified (there's *no* clear historical 
boundary between old and new versions of a language, and scripts have always evolved 
gradually, sometimes in parallel with contradictory rules).

So if we need to be able to encode old historic text, we cannot avoid using some 
special combining mark in places where unification with the "modern" usage of the 
script causes problems. In addition, we can accept that old text will be more 
difficult to manage in software if, in exchange, the most common use of the script 
in modern languages keeps the benefit of useful simplifications (such as combining 
classes).

Let's keep the combining classes as they are defined now. They are useful but do not 
solve all the problems tied to the unification of encoded text. Working on old historic 
text is a matter for specialists and scholars, and all we need to do is offer them a 
framework in which the modern simplifications will not cause them too many problems.

That's why I think that using the CGJ combining character is not a "kludge" for Biblical 
Hebrew. It is an extension of the encoding of the modern script to allow encoding 
old texts, and the same need will probably appear later when studying manuscripts of 
Latin, Greek, or Glagolitic texts, where the combining marks have slightly evolved in 
their glyph position, meaning that the modern combining class may not be appropriate 
for the old uses.

So it is simpler to tell scholars who study old languages that Unicode can offer 
them a way to unify their script with the modern script, if we allow and document more 
clearly that some control characters or special marks can be used to bypass the 
constraints defined for the modern script. CGJ, if officially documented as a 
legal way to override the combining classes of the combining characters that follow it 
so that they won't be reordered during normalization, may prove to be useful in many 
future encoded old texts...
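
As a rough sketch of what this override looks like in practice (Python again, using 
only the standard unicodedata module; the variable names and the lamed + patah + hiriq 
sequence are my own illustration, not taken from any specification):

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

# Intended order: patah first, then hiriq.
plain = LAMED + PATAH + HIRIQ
# Same text with CGJ inserted between the two points.
guarded = LAMED + PATAH + CGJ + HIRIQ

plain_nfc = unicodedata.normalize("NFC", plain)
guarded_nfc = unicodedata.normalize("NFC", guarded)

# Without CGJ, canonical reordering swaps the points (14 sorts before 17).
print(plain_nfc == LAMED + HIRIQ + PATAH)
# With CGJ, the class-0 character separates the two marks,
# so the original order survives normalization.
print(guarded_nfc == guarded)
```

The point is only that CGJ has combining class 0, so the marks on either side of it are 
never compared with each other during canonical reordering; whether a given renderer 
displays the guarded sequence well is a separate question.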

-- Philippe.
