On 12/14/2025 9:57 AM, Doug Ewell via Unicode wrote:
Normalization (NFC or NFD, not NFK*) for characters like this comes into play 
only when the character exists as both a precomposed unitary character and a 
combining sequence. When there is only one or the other, normalization to NFC 
or NFD yields the same result, and is thus a no-op, and not particularly 
adventurous.

This is actually incorrect. (And Doug actually knows better :) ).

It would be correct for a sequence of a base character with a */single/* combining mark, but as soon as you have two or more combining marks, their order is defined by normalization (NFC and NFD alike). The idea is that if two combining marks don't interact (such as by stacking), different orders could result in the same display, and normalization enforces a preferred ordering.
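
To make that concrete, here is a minimal sketch in Python using the standard unicodedata module. The choice of "q" with a dot below and an acute is mine, picked as an illustration because neither combination has a precomposed character; the variable names are hypothetical.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, drawn above the base
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, drawn below the base

# The same base letter and the same two marks, typed in either order.
# The marks sit on opposite sides of the base, so they don't interact.
acute_first = "q" + ACUTE + DOT_BELOW
dot_first = "q" + DOT_BELOW + ACUTE

# Canonical combining classes: the lower class sorts first.
ccc_acute = unicodedata.combining(ACUTE)
ccc_dot = unicodedata.combining(DOT_BELOW)

nfc_acute_first = unicodedata.normalize("NFC", acute_first)
nfd_acute_first = unicodedata.normalize("NFD", acute_first)
nfc_dot_first = unicodedata.normalize("NFC", dot_first)

# A single mark with no precomposed form really is left alone.
single = "q" + ACUTE
nfc_single = unicodedata.normalize("NFC", single)

print(ccc_acute, ccc_dot)
print(nfc_acute_first == dot_first, nfc_single == single)
```

Both input orders come out as the dot-below-first sequence under NFC and NFD, even though nothing is composed or decomposed, so normalization is not a no-op here.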

To make matters more complex, some combining marks are defined to not reorder. Those can be in any order defined by the author and could lead to duplicate encoding for the same display. The reasons behind supporting that are a bit complex, but generally it's done for scripts other than Latin.
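
A small Python sketch of the non-reordering case, again with the standard unicodedata module. The Devanagari characters are my own pick for illustration: both marks have canonical combining class 0, which is what exempts them from reordering. Whether the two orders actually render alike depends on the script and the font; the code only shows that normalization keeps them distinct.

```python
import unicodedata

KA = "\u0915"        # DEVANAGARI LETTER KA (base)
ANUSVARA = "\u0902"  # DEVANAGARI SIGN ANUSVARA
VOWEL_I = "\u093F"   # DEVANAGARI VOWEL SIGN I

# Both marks have combining class 0, so canonical reordering skips them.
ccc_anusvara = unicodedata.combining(ANUSVARA)
ccc_vowel_i = unicodedata.combining(VOWEL_I)

order_a = KA + ANUSVARA + VOWEL_I
order_b = KA + VOWEL_I + ANUSVARA

# Normalization returns each sequence exactly as the author typed it.
nfc_a = unicodedata.normalize("NFC", order_a)
nfc_b = unicodedata.normalize("NFC", order_b)
nfd_a = unicodedata.normalize("NFD", order_a)
nfd_b = unicodedata.normalize("NFD", order_b)

print(ccc_anusvara, ccc_vowel_i)
print(nfc_a == order_a, nfc_b == order_b, nfc_a == nfc_b)
```

The two sequences stay different after normalization, so a binary comparison of normalized strings will not treat them as the same text.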

But in general, */canonical reordering/* is a thing and is part of normalization.

A./
