On 12/14/2025 9:03 PM, Doug Ewell wrote:
Asmus Freytag wrote:

It would be correct for a sequence of a base character with _single_
combining mark, but as soon as you have two or more combining marks,
their order is defined by NFC.

I had mistakenly assumed that Phil’s use case considered only sequences with a
single combining mark, and consciously chose to limit my response to that
scenario.
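
For readers following along, the point about two or more combining marks can be illustrated with a minimal Python sketch. It assumes only the standard library's unicodedata module; the base letter "q" and the particular marks are arbitrary choices for illustration, picked because no precomposed forms exist for them. Marks with different canonical combining classes are reordered by normalization, while marks of the same class keep their relative order and remain distinct.

```python
import unicodedata

# Two marks with different canonical combining classes:
# U+0307 COMBINING DOT ABOVE (class 230), U+0323 COMBINING DOT BELOW (class 220).
above_first = "q\u0307\u0323"
below_first = "q\u0323\u0307"

nfc_a = unicodedata.normalize("NFC", above_first)
nfc_b = unicodedata.normalize("NFC", below_first)

# Both inputs normalize to the same sequence: lower class (220) first.
same_after_nfc = nfc_a == nfc_b

# Two marks with the same combining class (both 230):
# U+0301 COMBINING ACUTE ACCENT and U+0308 COMBINING DIAERESIS.
acute_first = unicodedata.normalize("NFC", "q\u0301\u0308")
diaeresis_first = unicodedata.normalize("NFC", "q\u0308\u0301")

# Their relative order is significant, so the two sequences stay distinct.
distinct_after_nfc = acute_first != diaeresis_first

print(same_after_nfc, distinct_after_nfc)
```

With a single combining mark neither case arises, which is why the simpler assumption holds only for that scenario.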

I know that you were aware of the general case. What I was trying to communicate (and expounded upon in the other reply) is the degree to which human writing in the general case is highly complex, usually even more complex than most native speakers (other than typesetters) are ever aware of, even for their own language.

And it is the acknowledgment of this complexity, and of how it is necessarily reflected in anything that aims to be a universal system of character encoding, that drives the understanding that such a system must be full of complexities of its own, complexities that cannot be reduced to some minimal, simple system.

For those of us who, unlike the questioner, have been around this effort for any length of time, this complexity can seem to be a given. But many people who have not worked in this space are genuinely surprised and challenged by it. And that includes people who have impressive credentials in technical work. Without realizing it, they apply their own native understanding of writing systems as if that were exhaustive or even typical. When they try to come up with solutions, such as protocols, that need to be robust in the face of the full variety of global text (even if only the living subset), they may reach conclusions that fall fatefully short of what is needed, or they try to "simplify" away complexities that to them feel ill-motivated.

I also commonly observe that solutions are proposed that "micro-manage" some well-understood or familiar subset of characters, but leave a protocol without meaningful solutions or safeguards for the vast majority, which comprises all the other scripts and writing systems.

There's no quick fix, but it is my firm conviction that we always need to start from a point of correctly scoping the issues as those belonging to a "universal" system of character encoding, as opposed to one that is optimized for some subset.

A./
