RE: Unicode 11 Georgian uppercase vs. fonts

Peter Constable via Unicode Fri, 27 Jul 2018 18:58:37 -0700

Just an observation on these issues: When the Mtavruli proposal was first 
presented to UTC, several UTC members voiced strong reservations because of the 
kinds of issues mentioned for case mapping, in particular their effect on 
database indexing and querying. Several months later, various UTC members participated 
in a teleconference with representation from Georgian institutions, including 
IT people from Bank of Georgia and TBC Bank. During that meeting, the 
representatives of the Georgian enterprises (i) demonstrated an understanding 
of those issues and the implications, (ii) gave an indication of support from 
those enterprises and a commitment to update their applications as may be 
required, and (iii) gave indication of intent to develop a plan of action for 
preparing their institutions for this change as well as communicating that 
within Georgian industry and society. Only after that did UTC feel it 
was viable to proceed with encoding Mtavruli characters.

Peter

From: Unicode <[email protected]> On Behalf Of Asmus Freytag via 
Unicode
Sent: Friday, July 27, 2018 7:01 AM
To: [email protected]
Subject: Re: Unicode 11 Georgian uppercase vs. fonts

On 7/27/2018 3:42 AM, Michael Everson via Unicode wrote:

Yes and it explains clearly that “effectively caseless Georgian” is incorrect. 
Georgian has case. Georgian uses case differently from other scripts. This is 
an orthographic distinction, not a structural one. In fact as it is also stated 
in the proposal, there are 19th-century texts which do titlecase. It’s just 
that that orthography is no longer in use and that behaviour no longer 
desirable.

"Georgian uses case differently from other scripts"

That's one of the key issues here for developers (and users) of libraries. 
Because it means that any implicit assumptions about the applicability of a 
certain case transform are now broken.

This goes beyond whether fonts are actually installed now or at the end of some 
transition period, or ever: if functions like ToUpper, which used to have no 
effect on Georgian before, suddenly do - in ways that the users of the script 
do not expect, then your application is broken, from one day to the next.
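
As a rough illustration of that "from one day to the next" effect, here is a minimal Python sketch that checks whether the runtime's own uppercase function changes Mkhedruli text. It assumes that Python's str.upper() follows the Unicode Character Database version reported by unicodedata.unidata_version; the helper names are made up for this example and are not part of any library.

```python
import unicodedata

MKHEDRULI_AN = "\u10d0"  # GEORGIAN LETTER AN (Mkhedruli)


def unicode_version() -> tuple:
    """Return the runtime's Unicode data version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def upper_affects_georgian() -> bool:
    """Report whether str.upper() changes Mkhedruli text on this runtime."""
    return MKHEDRULI_AN.upper() != MKHEDRULI_AN


# On data older than Unicode 11 this is expected to print False; on
# Unicode 11 or later the same call maps Mkhedruli to Mtavruli.
print(unicodedata.unidata_version, upper_affects_georgian())
```

The application code is identical in both cases; only the character data underneath it has changed.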

The situation prior to the change is perhaps best characterized by 
saying that there was support for some locale differences in the way certain 
characters were mapped, but not in whether or not to do a given mapping at all.

If, as has been suggested, the use of case in Georgian is more similar to that 
of smallcaps in other scripts, then, instead of ToUpper doing a case 
transformation for Georgian, what would be needed is something like a 
"ToSmallCaps" function (a better name is needed here, because the Georgian 
letters aren't actually "small caps").

That way, the existing "ToUpper" could retain its implicit semantic of 
"uppercase transformation in those scripts where such transformations are used 
in a common way".

This would solve 1/2 of the problem, which is to prevent uppercasing where 
users of Georgian do not expect it. However, it does not work in plain text for 
the other scripts, because there, small caps are not encoded, so there's no 
plain-text solution.

To get back to Markus' original question on how to handle this for ICU: it 
seems more and more that Georgian should be exempted from standard library 
functions and that a new function needs to be added that just transforms 
Georgian and leaves all other scripts alone (or one that takes a language/locale 
parameter).
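
To make the split concrete, here is a minimal Python sketch of the two functions. It is not ICU's API and not a proposal for one; the function names are hypothetical. It relies only on the Unicode 11 code chart layout, in which Mtavruli U+1C90..U+1CBA and U+1CBD..U+1CBF sit at a fixed offset from Mkhedruli U+10D0..U+10FA and U+10FD..U+10FF.

```python
# Mkhedruli letters that have Mtavruli counterparts in Unicode 11.
# U+10FB (paragraph separator) and U+10FC (modifier letter nar) have none.
MKHEDRULI_RANGES = ((0x10D0, 0x10FA), (0x10FD, 0x10FF))
MTAVRULI_OFFSET = 0x1C90 - 0x10D0  # fixed distance between the two blocks


def is_mkhedruli(ch: str) -> bool:
    """True if ch is a Mkhedruli letter with a Mtavruli counterpart."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MKHEDRULI_RANGES)


def to_mtavruli(text: str) -> str:
    """Hypothetical Georgian-only transform: leaves all other scripts alone."""
    return "".join(
        chr(ord(ch) + MTAVRULI_OFFSET) if is_mkhedruli(ch) else ch
        for ch in text
    )


def upper_except_georgian(text: str) -> str:
    """Hypothetical ToUpper variant that exempts Georgian."""
    return "".join(ch if is_mkhedruli(ch) else ch.upper() for ch in text)
```

With this split, upper_except_georgian("abc \u10d0") uppercases only the Latin letters, and to_mtavruli has to be called explicitly wherever Mtavruli is actually wanted. The per-character loop ignores context-sensitive casing rules, which a real library function would have to handle.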

A./
