2015-12-09 22:45 GMT+01:00 Richard Wordingham <richard.wording...@ntlworld.com>:
> On Wed, 9 Dec 2015 19:55:24 +0000
> Hans Meiser <bril...@hotmail.com> wrote:
>
> > I see.
> >
> > Yet, the u+1E9E doesn't quite look like two capital "S". So any
> > program implementing a conversion conforming to Unicode will
> > currently display/print in a wrong result: "MAßE" instead of the
> > correctly converted result "MASSE".
>
> While the default simple uppercasing of "maße" will yield "MAßE", the
> default full uppercasing will yield "MASSE".

Full uppercasing rules are normally locale-sensitive, so there should be a specific rule for German that does not yield this result (see, for example, the rules for Turkish dotless i vs. dotted i). I don't think these locale-sensitive rules are irrevocably stable, as more locales can be added at any time for languages needing specific pairs. The stabilized properties are for locale-neutral mappings only, in generic contexts where the language is not known (including standard normalizations, or the locale-neutral "root" collation and the associated DUCET).

Even for the same language, these rules cannot be hardcoded in a stable way, because orthographies evolve over time. The exception is when you use a locale that identifies the orthographic rule precisely (with the associated rulesets checked and corrected to reach a stable consensus; if there is an evolution or there are variants, use another locale identifier) and that specific orthography is entirely known. This is difficult for historic orthographies, or when there is no recognized language academy or national institution fixing the rule to use for some country or region. Even these institutions work within their own time and limit their scope to some applications; they will not reform the past.

> I am not aware of a useful definition of 'conforming to Unicode' that
> applies to either transformation.
So if you are looking for an example, look at how this is done for Turkish. Basically, this is just a matter of tailoring for specific locales.
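
To make the distinction concrete, here is a minimal Python sketch, not a reference implementation. It relies on the fact that Python's built-in str.upper() applies the default, locale-independent full case mapping. The helper names simple_upper and full_upper, the TAILORINGS table, and the locale tag "de-x-capital-eszett" are all made up for illustration; simple_upper only approximates simple case mapping (it is right for ß but not for every character whose full mapping expands).

```python
def simple_upper(text: str) -> str:
    """Approximate simple uppercasing: one code point in, one code point out."""
    out = []
    for ch in text:
        up = ch.upper()  # Python applies the default full mapping here
        # Keep the original character when the full mapping would expand it.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


# Hypothetical per-locale overrides, consulted before the default mapping.
TAILORINGS = {
    "tr": {"i": "\u0130"},  # Turkish: i -> İ (capital I with dot above)
    "az": {"i": "\u0130"},  # Azerbaijani: same rule
    # Made-up private-use tag for an orthography that maps ß -> ẞ (U+1E9E).
    "de-x-capital-eszett": {"\u00df": "\u1e9e"},
}


def full_upper(text: str, locale: str = "und") -> str:
    """Default full uppercasing, with optional locale tailoring."""
    overrides = TAILORINGS.get(locale, {})
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


print(simple_upper("maße"))                       # MAßE
print(full_upper("maße"))                         # MASSE
print(full_upper("istanbul", "tr"))               # İSTANBUL
print(full_upper("maße", "de-x-capital-eszett"))  # MAẞE
```

The point of the table is that the default mappings stay fixed while tailorings are looked up by locale identifier, so a changed or variant orthography gets a new identifier instead of altering an existing rule.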