Re: Umlaut and Tréma, was: Variation selectors and vowel marks

Peter Kirk Tue, 13 Jul 2004 16:08:17 -0700

On 13/07/2004 20:02, Asmus Freytag wrote:

At 11:02 AM 7/13/2004, Peter Kirk wrote:
I was surprised to see that WG2 has accepted a proposal made by the US National Body to use CGJ to distinguish between Umlaut and Tréma in German bibliographic data.
You raise some interesting questions. However, note that the purpose of CGJ is intended for sorting related distinctions, which are at issue here. This is different from variation selectors which are intended to be used for displayed variations.

OK. But this is not a unique case. For example, in Hebrew Silluq and Meteg, Dagesh and Shuruq are pairs of different marks which share a glyph and so a Unicode character but may need to be distinguished for certain processes. Should similar encodings with CGJ be proposed to make these distinctions? For that matter, what if in a certain (hypothetical) language consonant Y and vowel Y should be collated differently? Would that justify an encoding of one of them with CGJ? But then these are not combining characters in the first place. So I must agree with Doug that "CGJ + COMBINING DIAERESIS is a hack".
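To make the mechanics of "CGJ + COMBINING DIAERESIS" concrete, here is a minimal Python sketch using the standard unicodedata module. The base letter "a" and the variable names are chosen only for illustration; the sketch shows that the sequence with CGJ survives normalisation as a distinct sequence, and it says nothing about how the accepted proposal assigns umlaut or tréma to either form.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Illustrative sequences: a base letter with and without an intervening CGJ.
plain = "a" + DIAERESIS
marked = "a" + CGJ + DIAERESIS

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_marked = unicodedata.normalize("NFC", marked)

# The plain sequence composes to the precomposed letter;
# the CGJ sequence is left as three separate code points.
print([f"U+{ord(c):04X}" for c in nfc_plain])
print([f"U+{ord(c):04X}" for c in nfc_marked])
```

Because CGJ sits between the base and the mark, the two forms remain different strings after NFC, which is what lets a process such as collation treat them differently.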


On 13/07/2004 19:35, Doug Ewell wrote:

...

The alternative proposed by DIN, creating a new COMBINING UMLAUT
character, would have caused *unprecedented and catastrophic*
equivalence and normalization problems.

Understood. But I can argue in the same way that creating a new RIGHT HOLAM character for Holam Male would cause *catastrophic* equivalence and normalisation problems, although no longer unprecedented because we have the umlaut/tréma precedent. The situation is really very similar: two combining marks which are not distinguished in most modern typography, but which are distinguished graphically in some typefaces (if I remember correctly, in Fraktur as well as in the typefaces mentioned in Victor Gaultney's paper); and which have distinct interpretations and are distinguished in some existing data in which the distinction is important; but which should not be split into separate characters because this would seriously destabilise the majority of existing data in the script which does not make the distinction.

What many people are telling me to do with Holam Male (e.g. Less Preferred Option 4 in http://www.qaya.org/academic/hebrew/Holam2.html) is equivalent to the following solution to the umlaut/tréma problem: define a new tréma character, or perhaps new umlaut and tréma characters, to be used only in the German bibliographic data, and ignore the problem that this makes the bibliographic data incompatible with all other German text, and unable to be displayed by existing fonts until they get round to adding the new characters - as well as ignoring the problem that the precomposed characters have the wrong decomposition. (The Hebrew equivalent to this is that U+FB4B should decompose to Holam Male not Vav Haluma.) If that solution was not acceptable for German, why should it be acceptable for Hebrew?
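The decomposition currently recorded for U+FB4B can be inspected with Python's standard unicodedata module. This is only a sketch for looking at the existing data; the variable names are illustrative, and the output reflects whatever Unicode version ships with the Python build in use.

```python
import unicodedata

vav_with_holam = "\uFB4B"  # HEBREW LETTER VAV WITH HOLAM

# decomposition() returns the mapping as space-separated hex code points.
mapping = unicodedata.decomposition(vav_with_holam)
parts = [chr(int(cp, 16)) for cp in mapping.split()]

for ch in parts:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The mapping is to a plain Vav followed by the ordinary Holam point, so the decomposed sequence by itself carries no distinction between the two readings discussed above.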

It seems to me that the UTC should bite the bullet and accept that there is a need for variation sequences for combining marks, and either adjust the definitions of existing variation selectors or encode new specialised variation selectors for them. The adjusted or new variation selectors can then be used for Hebrew as well as for German - see my posting on this subject to the Hebrew list.

"When 256 variation selectors just won't do, invent another." (with apologies to Ken Whistler)

256 variation selectors won't do if they have all been defined unchangeably with the wrong properties, e.g. combining class. On the other hand, if the UTC is prepared to ignore the combining class and normalisation problems involved in using one combining class zero character, CGJ, to modify a combining mark, it may as well ignore the identical problems involved in using variation selectors, also combining class zero, with combining marks.
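The combining classes in question can be checked directly. The following is a minimal sketch with Python's standard unicodedata module; VARIATION SELECTOR-1 (U+FE00) stands in here for the variation selectors in general, which is an assumption made for brevity.

```python
import unicodedata

# Code points to inspect: CGJ, one variation selector, and the diaeresis mark.
code_points = [0x034F, 0xFE00, 0x0308]

classes = {}
for cp in code_points:
    ch = chr(cp)
    classes[cp] = unicodedata.combining(ch)  # canonical combining class
    print(f"U+{cp:04X} {unicodedata.name(ch)}: ccc={classes[cp]}")
```

CGJ and the variation selector both report class zero, while the diaeresis reports a non-zero class, so any objection based on a class zero character intervening before a combining mark applies equally to both.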

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
