Re: Combining characters

Asmus Freytag via Unicode Sun, 14 Dec 2025 21:47:03 -0800

On 12/14/2025 9:03 PM, Doug Ewell wrote:

Asmus Freytag wrote:

It would be correct for a sequence of a base character with a _single_
combining mark, but as soon as you have two or more combining marks,
their order is defined by NFC.
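
The point about mark order can be illustrated with a small sketch using Python's standard `unicodedata` module. The helper names `combining_classes` and `nfc` are hypothetical, chosen only for this example. It assumes the usual reading of the statement above: normalization applies canonical reordering, which sorts adjacent combining marks by canonical combining class (ccc) and leaves marks of the same class in the order they were entered.

```python
import unicodedata


def combining_classes(text):
    """Return (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


def nfc(text):
    """Normalize text to NFC (hypothetical convenience wrapper)."""
    return unicodedata.normalize("NFC", text)


# Two marks with different combining classes:
# U+0307 COMBINING DOT ABOVE (ccc 230) entered before
# U+0323 COMBINING DOT BELOW (ccc 220).
typed = "q\u0307\u0323"
normalized = nfc(typed)

# Canonical reordering puts the lower class first: dot below, then dot above.
print(combining_classes(typed))
print(combining_classes(normalized))

# Two marks with the same combining class (both ccc 230) are not reordered,
# so these two sequences stay distinct after normalization.
acute_then_dot = nfc("q\u0301\u0307")
dot_then_acute = nfc("q\u0307\u0301")
print(acute_then_dot == dot_then_acute)
```

With a single mark there is nothing to reorder. The question of order only arises once two or more marks follow the same base, which is where the single-mark assumption stops being sufficient.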

I had mistakenly assumed that Phil’s use case considered only sequences with a 
single combining mark, and consciously chose to limit my response to that 
scenario.

I know that you were aware of the general case. What I was trying to communicate (and expounded upon in the other reply) is the degree to which human writing in the general case is highly complex, usually even more complex than most native speakers (other than typesetters) are ever aware of, even for their own language.

And it is acknowledging this complexity, and how it is necessarily reflected in anything that aims to be a universal system of character encoding, that drives the understanding that such a system must be full of complexities of its own that cannot even be reconciled down to a minimally simplistic system.

For those of us, unlike the questioner, who have been around this effort for any length of time, this complexity can seem to be a given. But many people who have not worked in this space are genuinely surprised and challenged by it. And that includes people who have impressive credentials in technical work. Without realizing it, they apply their own native understanding of writing systems as if that was exhaustive or even typical. When they try to come up with solutions, such as protocols, that need to be robust in the face of the full variety of global text (even only the living subset) they may reach conclusions that fatefully fall well short of what is needed, or they try to "simplify" away complexities that to them feel ill motivated.

Commonly, I also observe that solutions are proposed that "micro-manage" some well-understood or familiar subset of characters, but leave a protocol without meaningful solutions or safeguards to the vast majority which contains all the other scripts and writing systems.

There's no quick fix, but it is my firm conviction that we always need to start from a point of correctly scoping the issues as those belonging to a "universal" system of character encoding, as opposed to one that is optimized for some subset.

A./
