On Tuesday, July 01, 2003 4:09 PM, Pim Blokland <[EMAIL PROTECTED]> wrote:

> Maybe it was a bad idea to include ij as a character in Unicode at
> all, but now it's there, there's no reason to ignore it when
> refining the rules, to deprecate it practically.

No, that was needed for correct Dutch support. Look at the case conversion of <ij> into <IJ>, even with titlecase... The character itself is not breakable in Dutch, where it is definitely not a ligature but a single character with its own case conversion rule, exactly like the <ae> and <AE> letters (considered as ligatures or as unbreakable letters depending on the language that uses them).

That's why <ij> and <IJ> are not canonically decomposable as <i, j> and <I, J> (this is just a compatibility decomposition). If it had only been a shortcut character mapped for compatibility reasons from some 8-bit encodings, it would have been normalized with a canonical decomposition.

(The exception to this rule is the inclusion of the Arabic ligatures, which were clearly and always decomposable but could not be canonically decomposed, because that would have required more than a character pair for the NFD equivalence. So they are only given an NFKD decomposition, their usage is strongly deprecated, and they are just included for an unnecessary roundtrip conversion from legacy Arabic encodings.)

-- Philippe.
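
For anyone who wants to check the decomposition and case mapping claims above, here is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The `describe` helper and the constant names are only illustrative, and U+FDFA is just one example picked from the Arabic presentation forms; the exact data reported depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Illustrative constants (names are assumptions, not from the thread).
IJ_SMALL = "\u0133"    # the single-character <ij>
IJ_CAPITAL = "\u0132"  # the single-character <IJ>
AE_SMALL = "\u00e6"    # the <ae> letter
ARABIC_LIG = "\ufdfa"  # one legacy Arabic ligature, chosen as an example


def describe(ch):
    """Collect the normalization-related properties of one character."""
    return {
        "name": unicodedata.name(ch),
        # Raw decomposition field; a <tag> prefix marks a compatibility mapping.
        "decomposition": unicodedata.decomposition(ch),
        # True when canonical decomposition (NFD) leaves the character alone.
        "nfd_unchanged": unicodedata.normalize("NFD", ch) == ch,
        # Result of compatibility decomposition (NFKD).
        "nfkd": unicodedata.normalize("NFKD", ch),
    }


if __name__ == "__main__":
    for ch in (IJ_SMALL, IJ_CAPITAL, AE_SMALL, ARABIC_LIG):
        info = describe(ch)
        # Print code points rather than glyphs to stay console-safe.
        nfkd_points = " ".join(f"U+{ord(c):04X}" for c in info["nfkd"])
        print(f"U+{ord(ch):04X} {info['name']}")
        print(f"  decomposition field: {info['decomposition']!r}")
        print(f"  unchanged by NFD:    {info['nfd_unchanged']}")
        print(f"  NFKD result:         {nfkd_points}")

    # Case conversion keeps <ij> as one unit, including in titlecase.
    word = IJ_SMALL + "sselmeer"
    print([f"U+{ord(c):04X}" for c in word.title()[:2]])
    # With the two-letter spelling only the first letter is capitalized.
    print("ijsselmeer".title())
```

The point of the comparison is that <ij> survives NFD intact and only splits into <i, j> under NFKD, while its titlecase form stays a single <IJ>; the two-letter spelling has no way to get that result from a generic case conversion.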