Re: Combining characters

Asmus Freytag via Unicode Sun, 14 Dec 2025 14:07:58 -0800

On 12/14/2025 9:57 AM, Doug Ewell via Unicode wrote:

Normalization (NFC or NFD, not NFK*) for characters like this comes into play 
only when the character exists as both a precomposed unitary character and a 
combining sequence. When there is only one or the other, normalization to NFC 
or NFD yields the same result, and is thus a no-op, and not particularly 
adventurous.


This is actually incorrect. (And Doug actually knows better :) ).

It would be correct for a sequence of a base character with a */single/* combining mark, but as soon as you have two or more combining marks, their order is defined by NFC. The idea is that if two combining marks don't interact (such as by stacking), different orders could result in the same display, and normalization enforces a preferred ordering.
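As a rough illustration of that point (not part of the original message), here is a minimal Python sketch using the standard `unicodedata` module. The choice of "q" with a dot above and a dot below is an assumption made for the example: it is picked because, as far as I know, neither combination has a precomposed form, so any change under normalization comes from reordering alone. The helper name `nfc` is hypothetical.

```python
import unicodedata


def nfc(s: str) -> str:
    """Hypothetical helper: normalize a string to NFC."""
    return unicodedata.normalize("NFC", s)


DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220

# Same base letter, same two marks, two different input orders.
above_first = "q" + DOT_ABOVE + DOT_BELOW
below_first = "q" + DOT_BELOW + DOT_ABOVE

# With a single mark and no precomposed character, NFC changes nothing.
single_is_noop = nfc("q" + DOT_ABOVE) == "q" + DOT_ABOVE

# With two marks of different combining classes, NFC puts the mark with
# the lower class first, so both inputs end up as the same sequence.
same_after_nfc = nfc(above_first) == nfc(below_first)
input_was_changed = nfc(above_first) != above_first

print(single_is_noop, same_after_nfc, input_was_changed)
```

The two marks sit on opposite sides of the base letter and so do not interact, which is the case where a preferred order is enforced.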

To make matters more complex, some combining marks are defined to not reorder. Those can be in any order defined by the author and could lead to duplicate encoding for the same display. The reasons behind supporting that are a bit complex, but generally it's done for scripts other than Latin.
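A small sketch of the non-reordering case (again not from the original message), assuming Python's `unicodedata` module. The Devanagari characters are an assumption chosen for illustration: both marks have canonical combining class 0, so normalization leaves whatever order the author typed. The example only shows that the two sequences stay distinct; it makes no claim about how a given font renders either one.

```python
import unicodedata

KA = "\u0915"        # DEVANAGARI LETTER KA (base character)
VOWEL_AA = "\u093E"  # DEVANAGARI VOWEL SIGN AA, combining class 0
ANUSVARA = "\u0902"  # DEVANAGARI SIGN ANUSVARA, combining class 0

# Same base and same two marks, in the two possible orders.
one = KA + VOWEL_AA + ANUSVARA
two = KA + ANUSVARA + VOWEL_AA

# Look up the canonical combining class of each mark.
classes = [unicodedata.combining(c) for c in (VOWEL_AA, ANUSVARA)]

# Marks with class 0 are never moved by canonical reordering, so NFC
# returns each sequence unchanged and the two remain different strings.
nfc_one = unicodedata.normalize("NFC", one)
nfc_two = unicodedata.normalize("NFC", two)

print(classes, nfc_one == one, nfc_two == two, nfc_one == nfc_two)
```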

But in general, */canonical reordering/* is a thing and is part of normalization.

A./
