Re: Accented ij ligatures (was: Unicode Public Review Issues update)

Philippe Verdy Tue, 01 Jul 2003 09:23:17 -0700

On Tuesday, July 01, 2003 4:09 PM, Pim Blokland <[EMAIL PROTECTED]> wrote:
> Maybe it was a bad idea to include ĳ as a character in Unicode at
> all, but now it's there, there's no reason to ignore it when
> refining the rules, to deprecate it practically.


No, that was needed for correct Dutch support. Look at the case
conversion of <ij> into <IJ>, even with titlecase...

The character itself is not breakable in Dutch, where it is definitely
not a ligature but a single character with its own case conversion
rule, exactly like the <ae> and <AE> letters (considered as
ligatures or as unbreakable letters depending on the languages that
use them).
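
To make the case-conversion point concrete, here is a small sketch using
Python's built-in string methods. The sample word and the variable names
are my own choice for illustration, and this only shows the default
(untailored) case mappings, not any Dutch-specific tailoring:

```python
# U+0133 LATIN SMALL LIGATURE IJ and U+0132 LATIN CAPITAL LIGATURE IJ
ij_small = "\u0133"
ij_capital = "\u0132"

# The same (illustrative) word spelled two ways.
single = ij_small + "sselmeer"   # with the single ij character
pair = "ijsselmeer"              # with the two letters i + j

# Default titlecasing: the single character maps as a unit to U+0132,
# while the two-letter spelling only gets its first letter capitalized.
single_title = single.title()
pair_title = pair.title()

print(ij_small.upper() == ij_capital)
print(single_title, pair_title)
```

With the two-letter spelling, the default titlecase capitalizes only the
"i", which is not what Dutch expects for this digraph.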

That's why <ij> and <IJ> are not canonically decomposable as
<i, j> and <I, J> (this is just a compatibility decomposition).
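
The difference can be checked with Python's standard unicodedata module.
This is only a quick sketch (variable names are mine); the results
reflect whatever version of the Unicode Character Database ships with
the interpreter:

```python
import unicodedata

ij_small = "\u0133"    # LATIN SMALL LIGATURE IJ
ij_capital = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# The decomposition field carries a <compat> tag, i.e. it is a
# compatibility mapping and not a canonical one.
decomp_small = unicodedata.decomposition(ij_small)
decomp_capital = unicodedata.decomposition(ij_capital)

# Canonical normalization (NFD) leaves the character alone;
# compatibility normalization (NFKD) splits it into two letters.
nfd_small = unicodedata.normalize("NFD", ij_small)
nfkd_small = unicodedata.normalize("NFKD", ij_small)
nfkd_capital = unicodedata.normalize("NFKD", ij_capital)

print(decomp_small, decomp_capital)
print(nfd_small == ij_small, nfkd_small, nfkd_capital)
```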

If it had only been a shortcut character mapped for compatibility
reasons from some 8-bit encodings, it would have been normalized
with a canonical decomposition.

(The exception to this rule is the inclusion of Arabic ligatures, which
were clearly and always decomposable but could not be
canonically decomposed because that would have required more than
a character pair for the NFD equivalence. So they are only
given an NFKD decomposition, their usage is strongly
deprecated, and they are included just for an unnecessary roundtrip
conversion from legacy Arabic encodings.)
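
As one illustration of such a ligature, here is the same kind of check
for U+FDFA. The choice of this particular character is mine, picked as
an example of a ligature that expands to far more than a pair; again
this is a sketch using Python's unicodedata module:

```python
import unicodedata

# U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM (illustrative pick)
ligature = "\ufdfa"

# The decomposition field is tagged (compatibility), not canonical.
decomp = unicodedata.decomposition(ligature)

# NFD leaves the ligature intact; NFKD expands it to many characters.
nfd = unicodedata.normalize("NFD", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)

print(decomp.split()[0])
print(nfd == ligature, len(nfkd))
```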

-- Philippe.
