Re: German sharp S uppercase mapping

Asmus Freytag via Unicode Sun, 01 Dec 2024 22:16:51 -0800

On 12/1/2024 5:48 PM, Dominikus Dittes Scherkl via Unicode wrote:

Am 30.11.24 um 18:16 schrieb Asmus Freytag via Unicode:

On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
However, speaking of this as a "default" is confusing to readers who
think in terms of text processing or authoring environments where a
different set of requirements rules. Here, the proper "default" is the
best implementation of a culturally appropriate case transform.


NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to have an invertible casing, just like
compatibility codepoints were assigned to make preservation of old
formatting information available if a translation back to some obsolete
charset is necessary.

_This new letter was invented to allow for 1:1 roundtrip conversion._

The letter was not *invented*. It was discovered (= identified as occurring in actual writing) and encoded.

It was encoded to match a character with a unique shape and properties. One of those properties is *being* a capital letter; the other is having ß as its lowercase equivalent.


toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
toLower() producing back "ß" instead of a wrong spelling with "ss"
(which at the moment can only be avoided using a German dictionary - a
really heavy constraint on a small function like toLower - and for
family names simply not possible at all - the information is lost).

Your problem is that you assume an implementation of toUpper that takes no argument. For purposes like text design, publication etc. you want an implementation that selects which locale should set the rules. (Or one where that setting is done behind the scenes, which is logically equivalent.) Without specifying the locale, your beautiful toUpper() does not know that in Turkish, 'i' is not mapped to 'I' but to CAPITAL I WITH DOT ABOVE (U+0130).
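To make the Turkish case concrete, here is a minimal Python sketch. Python's built-in str.upper() takes no locale argument; the wrapper to_upper() and its locale parameter are hypothetical names introduced only for illustration, and the tailoring covers nothing beyond the dotted/dotless i rule.

```python
def to_upper(text: str, locale: str = "") -> str:
    """Hypothetical locale-aware uppercase; illustration only, not a library API."""
    language = locale.replace("_", "-").split("-")[0].lower()
    if language in ("tr", "az"):
        # Turkish and Azerbaijani: 'i' uppercases to U+0130 (capital I with dot above).
        # Dotless U+0131 already maps to 'I' under the untailored rules.
        text = text.replace("i", "\u0130")
    return text.upper()


print(to_upper("istanbul"))        # untailored: ISTANBUL
print(to_upper("istanbul", "tr"))  # Turkish tailoring: İSTANBUL
```

Without the locale argument the function has no way to tell which of the two results the caller wants.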

Because your beautiful toUpper() cannot handle at least one language, it should not need to handle any language. Instead, it should be stable.

What you are describing is a change to the toUpper() that is invoked with the German locale as parameter (or selected behind the scenes).

There's not the same requirement for that one to be stable, although sometimes transitions are implemented by creating a separate locale for "old" and "new" orthographies and the like.
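As a rough sketch of that separation in Python: the untailored str.upper() stays as it is, and a German-specific function offers the capital ẞ as an opt-in. The function name to_upper_de() and the flag capital_eszett are assumptions made for illustration; real libraries expose such choices differently.

```python
def to_upper_de(text: str, capital_eszett: bool = False) -> str:
    """Hypothetical German uppercase with an opt-in for U+1E9E."""
    if capital_eszett:
        # "New" variant: map U+00DF to U+1E9E before the untailored mapping runs.
        text = text.replace("\u00df", "\u1e9e")
    return text.upper()


old = to_upper_de("Maße")                       # MASSE
new = to_upper_de("Maße", capital_eszett=True)  # MAẞE
print(old.lower())  # masse: the ß cannot be recovered
print(new.lower())  # maße: the round trip is preserved
```

The untailored function is not touched, so anything that depends on its stability keeps working.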


When it comes to case conversion, purpose matters.

This doesn't detract from the need to have implementations that do the "right" thing (as currently defined) for a given language. And from the need to enable these by default for ordinary text manipulation.

But it's not the same thing as overriding an "identifier-safe" or "filesystem-safe" implementation, just because that's incorrectly viewed as a "default" that should be applicable to text manipulation.
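For contrast, a small sketch of the stable, locale-independent kind of operation, using Python's built-in str.casefold() as a stand-in. Treating case folding as the "identifier-safe" operation is an assumption made for illustration, and same_identifier() is a hypothetical helper.

```python
def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare names using untailored Unicode case folding."""
    return a.casefold() == b.casefold()


# Both ß (U+00DF) and ẞ (U+1E9E) fold to "ss", whatever the locale.
print(same_identifier("Straße", "STRASSE"))       # True
print(same_identifier("Stra\u1e9ee", "strasse"))  # True
```

Such a comparison is meant for matching, not for producing text a reader will see.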

A./


This is a really bad situation, which should be fixed as soon as
possible, not a matter of taste.
And it should be fixed explicitly in automatic text processing - because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter what form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be brought back (except by reintroducing
this knowledge manually).

And what is "best" can change over time.

No. Fixing this round-trip bug is in the best interest of Unicode and
that won't change over time. Using "SS" in all uppercase text was always
a bad workaround that became a source of spelling errors by automatic
text processing and for which a fix was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).
