I see no easy way to convert ALL UPPERCASE text to consistent casing, as there is no rule for it other than dictionary lookups. Ideally, data should be entered using default casing (as in dictionary entries), independently of its position in sentences, paragraphs or titles, with the contextual conversion of some or all characters to uppercase done algorithmically. This is safe for conversion to ALL UPPERCASE, and quite reliable for conversion to Title Case, with just a few dictionary lookups for a small set of known words per language.
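As a rough illustration of that approach, here is a minimal Python sketch. It assumes the input is already stored in default (dictionary) casing; the `MINOR_WORDS` table, its contents and the function names are hypothetical placeholders for a curated per-language list, not part of any standard or library.

```python
# Hypothetical per-language list of words left uncapitalized in titles.
# A real list would be curated per language; this one is only illustrative.
MINOR_WORDS = {
    "en": {"a", "an", "and", "of", "the", "in", "on", "to", "for"},
}


def to_upper(text: str) -> str:
    """Conversion to ALL UPPERCASE is purely algorithmic."""
    return text.upper()


def to_title(text: str, lang: str = "en") -> str:
    """Capitalize the first letter of each selected word.

    The remaining letters are never lowercased, so casing already carried
    by the data (acronyms, proper names) is preserved.
    """
    minor = MINOR_WORDS.get(lang, set())
    out = []
    for i, word in enumerate(text.split(" ")):
        # The first word is always capitalized; later words only if they
        # are not in the small lookup set for this language.
        if word and (i == 0 or word.lower() not in minor):
            word = word[0].upper() + word[1:]
        out.append(word)
    return " ".join(out)


print(to_title("the history of NATO in Europe"))
print(to_upper("the history of NATO in Europe"))
```

Going in this direction never needs to guess which letters should be lowercase, which is exactly what makes the reverse conversion from ALL UPPERCASE unreliable.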
Note that title casing works differently in English (which most often overdoes it by capitalizing every word), while most other languages capitalize only selected words, or just the first selected word in French (in addition to the possible first letter of non-selected words, such as definite and indefinite articles, at the start of the sentence). Capitalizing the initial of every word is wrong in German, which uses capitalization even more strictly than French or Italian: when in doubt, do not perform any title casing, and let the data provide the actual capitalization of titles directly. (It is OK and even recommended in German to write section headings, or even book titles, as if they were in the middle of a sentence; you capitalize only titles and headings that are grammatically full sentences, not simple nominal groups.)

So title casing should not even be promoted by the UCD standard (where it in fact uses only very basic, simplistic rules); it is applicable only in some applications, for some languages, and in specific technical or rendering contexts.

On Tue, Oct 2, 2018 at 22:21, Markus Scherer via Unicode <[email protected]> wrote:

> On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode <[email protected]> wrote:
>
>> ... The only
>> operation that can cause problems is 'capitalize'.
>>
>> When I say "cause problems", I mean producing mixed-case output. I
>> originally thought that 'capitalize' would be fine. It is fine for
>> lowercase input: It stays lowercase because Unicode Data indicates that
>> titlecase for lowercase Georgian letters is the letter itself. But it
>> will produce the apparently undesirable Mixed Case for ALL UPPERCASE
>> input.
>>
>> My questions here are:
>> - Has this been considered when Georgian Mtavruli was discussed in the
>>   UTC?
>> - How have any other implementers (ICU,...) addressed this, in
>>   particular the operation that's called 'capitalize' in Ruby?
>>
>
> By default, ICU toTitle() functions titlecase at word boundaries (with
> adjustment) and lowercase all else.
> That is, we implement Unicode chapter 3.13 Default Case Conversions R3
> toTitlecase(x), except that we modified the default boundary adjustment.
>
> You can customize the boundaries (e.g., only the start of the string).
> We have options for whether and how to adjust the boundaries (e.g., adjust
> to the next cased letter) and for copying, not lowercasing, the other
> characters.
> See C++ and Java class CaseMap and the relevant options.
>
> markus
>
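
To make the options in the quoted message concrete, here is a simplified Python sketch of that behaviour. It is not ICU's API: the function and parameter names (`to_titlecase`, `whole_string`, `lowercase_rest`) are invented for illustration, and whitespace splitting stands in for real word-boundary detection.

```python
import re


def _is_cased(ch: str) -> bool:
    # Treat a character as cased if either case mapping changes it.
    return ch.lower() != ch or ch.upper() != ch


def _title_segment(segment: str, lowercase_rest: bool) -> str:
    # Adjust the boundary to the first cased letter, titlecase it, then
    # either lowercase or copy the remaining characters.
    for i, ch in enumerate(segment):
        if _is_cased(ch):
            rest = segment[i + 1:]
            if lowercase_rest:
                rest = rest.lower()
            return segment[:i] + ch.title() + rest
    return segment  # no cased letter: leave the segment unchanged


def to_titlecase(text: str, whole_string: bool = False,
                 lowercase_rest: bool = True) -> str:
    if whole_string:
        # Only the start of the string is treated as a boundary.
        return _title_segment(text, lowercase_rest)
    # Simplification: split on whitespace, keeping the separators.
    parts = re.split(r"(\s+)", text)
    return "".join(_title_segment(p, lowercase_rest) for p in parts)


print(to_titlecase("hello WORLD"))
print(to_titlecase("hello WORLD", whole_string=True))
print(to_titlecase("hello WORLD", lowercase_rest=False))
```

With the defaults, an all-Mtavruli Georgian word would be expected to come out with a Mtavruli first letter followed by lowercase Mkhedruli (on a Python build whose Unicode data includes Mtavruli), which is the mixed-case result described in the quoted question. Passing `lowercase_rest=False` copies the remaining letters unchanged and avoids that, at the cost of leaving ALL UPPERCASE input as it is.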

