Re: Umlaut and Tréma, was: Variation selectors and vowel marks

Peter Kirk Thu, 15 Jul 2004 02:33:02 -0700

On 15/07/2004 05:00, Asmus Freytag wrote:

At 01:52 PM 7/14/2004, Doug Ewell wrote:
It's not German data (with umlauts) that will be affected by this
solution, but non-German data (with diaereses) in German bibliographic
systems.  That makes it a much smaller problem.
The use of diaeresis is perfectly valid for words in fields that have a language ID 'German'.

The DIN request and the USNB solution didn't address this, because the problem to be solved was disambiguating {a, o, u}-with-tréma from {a, o, u}-with-umlaut. If there are combinations of (for example) a-with-tréma-and-something-else AND ALSO a-with-umlaut-and-something-else, then those two will need to be disambiguated somehow. But I strongly doubt that the latter case exists in German bibliographic data, though of course one never knows.
First off, there have to be corresponding entries in the sorting tables used for such data, to make that distinction have the correct effect. Since the sorting tables would not support anything other than <BASE, CGJ, DIAERESIS> there's no reason to introduce other sequences into the data.
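
For readers who want to see how the <BASE, CGJ, DIAERESIS> sequence behaves under normalisation, here is a minimal Python sketch using only the standard unicodedata module. The variable names (plain, marked) are just for illustration, and which sequence a given database treats as umlaut and which as tréma is my reading of the thread, not something the code can show. CGJ is U+034F and has combining class 0, so it keeps the base letter and the diaeresis from composing.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230

def nfc(s):
    """Return the NFC form of s."""
    return unicodedata.normalize("NFC", s)

# Hypothetical sample data: one sequence without CGJ, one with it.
plain = "a" + DIAERESIS
marked = "a" + CGJ + DIAERESIS

plain_nfc = nfc(plain)
marked_nfc = nfc(marked)

# The plain sequence composes to the precomposed letter;
# the CGJ sequence is left as three separate code points.
print([f"U+{ord(c):04X}" for c in plain_nfc])
print([f"U+{ord(c):04X}" for c in marked_nfc])
```

So the two spellings stay distinct after NFC, which is what lets a sorting table treat them differently if it has an entry for the CGJ sequence.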

Secondly, the dieresis is used to indicate that two vowels are pronounced separately. I haven't seen a case where the vowels would already be accented.

There are such cases (although in most but not all of them technically the vowel is not "already" accented because the diaeresis is encoded closer to the base letter than the accent). This is certainly the case in Greek, where diaeresis (indicating separate pronunciation) and accents commonly occur on the same vowel; there are precomposed forms in the Greek and Coptic and Greek Extended blocks. There are also a number of precomposed forms in Latin Extended-B and Latin Extended Additional with both diaeresis and another accent. Presumably these are used for some language or other (well, some for Pinyin, some for Livonian, others unspecified). And so they may occur in German bibliographic data. And in that database each of them must have been encoded either with umlaut or with tréma (presumably because they are understood as marking either a vowel quality modification or a separation), and there is at least the possibility that some combinations may have been encoded differently in different places in the database. (And foreign words may be used within book titles marked as German.) Therefore Unicode does need to consider the issue, both as a theoretical one (which is essentially equivalent in terms of its effect on normalisation to the theoretical problem with using variation selectors with combining characters) and potentially as a practical one.
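
To make the "closer to the base letter" point concrete, here is a small Python sketch, again using only the standard unicodedata module, that decomposes a few of the precomposed characters to NFD and reports where the diaeresis lands. The five code points are my own picks from the blocks mentioned above, and the helper name diaeresis_position is hypothetical.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS

def diaeresis_position(ch):
    """Index of U+0308 in the NFD form of ch (index 0 is the base letter)."""
    return unicodedata.normalize("NFD", ch).index(DIAERESIS)

# Sample characters chosen for illustration only.
samples = [
    "\u0390",  # Greek and Coptic: iota with dialytika and tonos
    "\u1fd2",  # Greek Extended: iota with dialytika and varia
    "\u01d8",  # Latin Extended-B: u with diaeresis and acute (Pinyin)
    "\u022b",  # Latin Extended-B: o with diaeresis and macron (Livonian)
    "\u1e4f",  # Latin Extended Additional: o with tilde and diaeresis
]

for ch in samples:
    nfd = unicodedata.normalize("NFD", ch)
    points = " ".join(f"U+{ord(c):04X}" for c in nfd)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {points} "
          f"(diaeresis at index {diaeresis_position(ch)})")
```

In the first four the diaeresis comes directly after the base letter, with the other accent following it. U+1E4F is one of the exceptions: the tilde comes first and the diaeresis second.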

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
