On Tuesday, July 01, 2003 4:09 PM, Pim Blokland <[EMAIL PROTECTED]> wrote:
> Maybe it was a bad idea to include ij as a character in Unicode at
> all, but now it's there, there's no reason to ignore it when
> refining the rules, to deprecate it practically.

No, that was needed for correct Dutch support. Look at the case
conversion of <ij> into <IJ>, even with titlecase...
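
To make the case-conversion point concrete, here is a minimal sketch, assuming Python 3 and its built-in string methods (the sample word "ijsselmeer" is only an illustration, not something from the original discussion). It compares the single character U+0133 with the two-letter sequence "ij":

```python
# U+0133 LATIN SMALL LIGATURE IJ and U+0132 LATIN CAPITAL LIGATURE IJ
small_ij = "\u0133"
capital_ij = "\u0132"

# Simple case mappings pair the two code points with each other.
upper = small_ij.upper()
lower = capital_ij.lower()

# Titlecasing a word that starts with the single character keeps
# the whole <IJ> capitalized, because it is one character.
titled = (small_ij + "sselmeer").title()

# Titlecasing the two-letter sequence only capitalizes the "i".
naive = "ijsselmeer".title()

print(upper == capital_ij, lower == small_ij)
print(titled == capital_ij + "sselmeer")
print(naive)
```

With the two separate letters, generic titlecasing yields "Ijsselmeer", which is not the capitalization Dutch expects.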

The character itself is not breakable in Dutch, where it is definitely
not a ligature but a single character with its own case conversion
rule, exactly like the <ae> and <AE> letters (considered as
ligatures or as unbreakable letters, depending on the language that
uses them).

That's why <ij> and <IJ> are not canonically decomposable as
<i, j> and <I, J> (this is just a compatibility decomposition).
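
The difference between the two kinds of decomposition can be checked with a short sketch, assuming Python 3 and its standard unicodedata module (the variable names are only illustrative):

```python
import unicodedata

small_ij = "\u0133"    # LATIN SMALL LIGATURE IJ
capital_ij = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# The "<compat>" tag marks the mapping as a compatibility decomposition.
mapping = unicodedata.decomposition(small_ij)

# Canonical normalization (NFD) leaves both characters untouched.
nfd_small = unicodedata.normalize("NFD", small_ij)
nfd_capital = unicodedata.normalize("NFD", capital_ij)

# Compatibility normalization (NFKD) splits them into two letters.
nfkd_small = unicodedata.normalize("NFKD", small_ij)
nfkd_capital = unicodedata.normalize("NFKD", capital_ij)

print(mapping)
print(nfd_small == small_ij, nfd_capital == capital_ij)
print(nfkd_small, nfkd_capital)
```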

If it had only been a shortcut character mapped for compatibility
reasons from some 8-bit encodings, it would have been normalized
with a canonical decomposition.

(The exception to this rule is the Arabic ligatures, which were clearly
and always decomposable but could not be canonically decomposed,
because that would have required more than a character pair for the
NFD equivalence. They are therefore given only an NFKD decomposition;
their use is strongly deprecated, and they are included just for an
unnecessary roundtrip conversion from legacy Arabic encodings.)
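
The same check can be run on an Arabic presentation-form ligature. This is a minimal sketch, assuming Python 3 and its standard unicodedata module; U+FDF2 is picked here only as one example of a ligature whose compatibility mapping is longer than a pair:

```python
import unicodedata

# U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM (example chosen for illustration)
ligature = "\ufdf2"

# NFD does not touch it: there is no canonical decomposition.
nfd = unicodedata.normalize("NFD", ligature)

# NFKD expands it into a sequence of ordinary Arabic letters.
nfkd = unicodedata.normalize("NFKD", ligature)

print(nfd == ligature)
print(len(nfkd), [f"U+{ord(ch):04X}" for ch in nfkd])
```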

-- Philippe.

