On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode wrote:
> ... The only operation that can cause problems is 'capitalize'.
>
> When I say "cause problems", I mean producing mixed-case output. I originally thought that 'capitalize' would be fine. It is fine for lowercase input: it stays lowercase because Unicode Data indicates that titlecase for lowercase Georgian letters is the letter itself. But it will produce the apparently undesirable Mixed Case for ALL UPPERCASE input.
>
> My questions here are:
> - Has this been considered when Georgian Mtavruli was discussed in the UTC?
> - How have any other implementers (ICU,...) addressed this, in particular the operation that's called 'capitalize' in Ruby?

By default, ICU's toTitle() functions titlecase at word boundaries (with boundary adjustment) and lowercase everything else. That is, we implement section 3.13 of the Unicode Standard, Default Case Conversions, R3 toTitlecase(X), except that we modified the default boundary adjustment.

You can customize the boundaries (e.g., only the start of the string). We have options for whether and how to adjust the boundaries (e.g., adjust to the next cased letter) and for copying, rather than lowercasing, the other characters. See the C++ and Java class CaseMap and the relevant options.

markus
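To make the Georgian case concrete, here is a minimal pure-Python sketch of "titlecase the start of the string, then lowercase or copy the rest." It is not ICU code and not Ruby's implementation. The names capitalize_sketch and _is_cased, and the flags lowercase_rest and adjust_to_cased, are hypothetical. The flags loosely mirror the ICU options described above: adjusting to the next cased letter, and copying rather than lowercasing (roughly ICU4C's U_TITLECASE_NO_LOWERCASE). Unlike ICU's default toTitle(), the sketch titlecases only the start of the string, not every word. It assumes Python 3.7 or later, whose Unicode database (11.0+) includes Mtavruli and the Mkhedruli titlecase-to-self mappings.

```python
def _is_cased(ch: str) -> bool:
    # Rough approximation of the Unicode "Cased" property for one character
    # (hypothetical helper): upper-, lower-, or titlecase.
    return ch.isupper() or ch.islower() or ch.istitle()


def capitalize_sketch(s: str, lowercase_rest: bool = True,
                      adjust_to_cased: bool = True) -> str:
    """Titlecase only the start of s (hypothetical helper, not ICU's API).

    lowercase_rest=True  -> lowercase everything after the titlecased char.
    lowercase_rest=False -> copy the remaining characters unchanged.
    adjust_to_cased=True -> skip leading uncased characters (e.g. quotes).
    """
    i = 0
    if adjust_to_cased:
        while i < len(s) and not _is_cased(s[i]):
            i += 1
    if i >= len(s):
        return s  # nothing cased to titlecase
    rest = s[i + 1:]
    if lowercase_rest:
        rest = rest.lower()
    # str.title() on a single character applies its titlecase mapping.
    return s[:i] + s[i].title() + rest


# Georgian Mkhedruli U+10D0..U+10D2; .upper() maps them to Mtavruli U+1C90..U+1C92.
mkhedruli = "\u10D0\u10D1\u10D2"
mtavruli = mkhedruli.upper()
print(capitalize_sketch(mkhedruli) == mkhedruli)  # lowercase input unchanged
print(capitalize_sketch(mtavruli))                # Mtavruli + Mkhedruli: mixed case
print(capitalize_sketch(mtavruli, lowercase_rest=False) == mtavruli)
```

With lowercase_rest=True, all-Mtavruli input comes out as one Mtavruli letter followed by Mkhedruli, which is the mixed-case result described in the quoted message. With lowercase_rest=False (the "copy, not lowercase" style of option), the input passes through unchanged.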

