From: "Peter Kirk" <[EMAIL PROTECTED]>
> As for line breaking (UAX14), WJ explicitly prohibits this; ZWJ and ZWNJ
> are not listed, and so as Cf characters are ignored in the line breaking
> algorithm. I note also that the combining mark CGJ is listed as GL and
> so is not CM. The descriptive text of rules LB7a-c implies that CM =
> combining mark whereas this is not in fact true; some combining marks
> are not CM and some CM are not combining marks. In rule LB7b the term
> "combining character sequence" is used, contrary to its regular defined
> meaning, for a sequence of CM characters and the preceding non-CM character.

This is further evidence that even Unicode's exact terminology must be used with extreme care, as there are many exceptions, even in _standard_ technical reports such as the UAXs. If it were possible, I would suggest an audit of the terminology and classification of all character categories, including in the UTSs. For now it is just too complicated to comply with each UTR (or even only with the UAXs and UTSs), as one needs to check simultaneously many sometimes "conflicting" properties used by the various technical reports. We need a comprehensive new technical report that lists all the exceptions to the general category system, as these line-breaking, word-breaking and grapheme-cluster-breaking properties are orthogonal to the basic GC system and to the combining class system.
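
To make the orthogonality concrete, here is a minimal sketch in Python that puts the general category (gc) and canonical combining class (ccc) of the four characters discussed above side by side. It relies only on the standard `unicodedata` module, which as far as I know does not expose the line-breaking class. The `LB_SAMPLE` table is therefore a hypothetical, hand-entered stand-in holding only the one value stated in the quoted message (CGJ = GL), and the helper names `describe` and `is_mark_but_not_cm` are mine. A real check would have to load the line-breaking data file from the Unicode Character Database.

```python
import unicodedata

# Assumption: this table is hand-entered from the quoted message only
# (CGJ is listed as GL). It is NOT read from the Unicode data files.
LB_SAMPLE = {0x034F: "GL"}

# The four format/joiner characters discussed in the thread.
CHARS = {
    "CGJ": 0x034F,   # COMBINING GRAPHEME JOINER
    "ZWNJ": 0x200C,  # ZERO WIDTH NON-JOINER
    "ZWJ": 0x200D,   # ZERO WIDTH JOINER
    "WJ": 0x2060,    # WORD JOINER
}

def describe(cp):
    """Return gc and ccc from unicodedata, plus the sample lb value if any."""
    ch = chr(cp)
    return {
        "gc": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "lb": LB_SAMPLE.get(cp),  # None when the sample table has no entry
    }

def is_mark_but_not_cm(cp):
    """True if gc is a mark (M*) while the known lb class is something other than CM."""
    info = describe(cp)
    return info["gc"].startswith("M") and info["lb"] not in (None, "CM")

report = {name: describe(cp) for name, cp in CHARS.items()}
for name, info in report.items():
    print(name, info)
```

With the sample table above, `is_mark_but_not_cm(0x034F)` flags CGJ as a combining mark by general category that is nevertheless not CM for line breaking, which is exactly the kind of exception an audit would have to enumerate across all the property files.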