Excellent research! Thanks a lot!

On Sun, 20 Oct 2024 at 16:14, Robin Leroy <[email protected]> wrote:

> On Sun, 20 Oct 2024 at 10:48, Charlotte Eiffel Lilith Buff via Unicode
> <[email protected]> wrote:
>
>> As I understand it (and I believe this was even the wording used in
>> previous versions of UAX #15), the script-specific exclusions exist because
>> for a handful of characters the fully decomposed form is the preferred
>> representation in regular usage. This makes sense to me for the precomposed
>> Hebrew letters because with so many combining marks with unique CCC values,
>> it just seems easier to deal exclusively with combining character sequences
>> and not have some random marks “glue” themselves to the base letter. The
>> two-part Tibetan subjoined letters are similar in this regard.
>>
>> However, the Indic nuktas seem entirely unproblematic and in fact not all
>> precomposed letters with nukta are composition-excluded: Devanagari has ऩ,
>> ऱ, and ऴ for example.
>>
>> Does anyone remember what led to these specific decisions or knows where
>> to find the relevant documents if they exist?
>
> I certainly wasn’t involved in Unicode when the relevant documents were
> discussed, as I was busy learning the letters in the Basic Latin block¹,
> but I looked at some of them a couple of years ago.
>
> - Revision 9 of then-DUTR² #15
>   https://www.unicode.org/reports/tr15/tr15-9.html, dated 1998-11-23,
>   and entered into the registry
>   <https://www.unicode.org/L2/L1998/Register-1998.html> as L2/98-404,
>   does not mention composition exclusions.
> - The first revision (10) that mentions characters *excluded from
>   being primary composites* is
>   https://www.unicode.org/reports/tr15/tr15-10.html#Definitions,
>   dated 1998-12-16. The rationale is indeed that *This would be to match
>   common practice for scripts that use fully decomposed forms.* The sole
>   example given is FB31.
> - The next revision (11) includes a list of composition exclusions:
>   https://www.unicode.org/reports/tr15/tr15-11.html#Primary%20Exclusion%20List%20Table,
>   dated 1999-02-25. This list includes 0958..095F.
>
> Between revisions 9 and 10, we have UTC #78, whose minutes are L2/98-419
> <https://www.unicode.org/L2/L1998/98419.pdf>. See the discussion in the
> section titled “Normalization [Document L2/98-404]”, and in particular the
> last comment from Ken Whistler.
> Between revisions 10 and 11, we have UTC #79, in whose minutes L2/99-054R
> <https://www.unicode.org/L2/L1999/99054r.htm#79-0>, in the section
> “Proposed Draft UTR #15, Unicode Normalization”, we get a similar comment
> from Ken towards the end.
> The minutes of UTC #80, L2/99-176
> <https://www.unicode.org/L2/L1999/99176.htm>, have some discussion of
> normalization, and motion 80-M25 letting the editorial committee change the
> composition exclusions table; but by that point 0958 is already in there,
> so digging there isn’t going to help.
>
> However, some later documents provide relevant context:
>
> - L2/01-304 <https://www.unicode.org/L2/L2001/01304-feedback.pdf> (p.
>   17, in the section on Devanagari).
> - L2/01-305 <https://www.unicode.org/L2/L2001/01305-india-resp.txt>
>   (section on Devanagari).
>
> So there was clear feedback from India that U+0958 क़ and friends should be
> discouraged; presumably the UTC must have been aware of that in 1999. On
> the distinction between क़ vs. ऴ, I guess this is related to ऴ being atomic
> in ISCII; in turn that is because while ऴ is decomposable, corresponding
> letters in other ISCII scripts (ழ, ఴ, ഴ) are not. See also point (viii) of
> L2/01-304 <https://www.unicode.org/L2/L2001/01304-feedback.pdf>; there
> still was a desire to make the encodings similar between the scripts.
>
> I am sure Ken can provide more details.
>
> Best regards,
>
> Robin Leroy
>
> ―
> ¹ As well as a few from the Latin-1 Supplement and Latin Extended-A blocks.
> ² This predates L2/00-118
> <https://www.unicode.org/L2/L2000/00118-parts.txt> and UTC decision 83-C6
> <https://www.unicode.org/L2/L2000/00115.htm#83-C6> which gave us the
> terms UAX and UTS.
>
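
For anyone who wants to see the effect of the composition exclusions discussed in the quoted message, here is a minimal sketch using Python's standard unicodedata module. The helper name is_recomposed_by_nfc and the sample list are illustrative choices only, and the results depend on the Unicode data shipped with the interpreter. It checks whether a precomposed character survives NFC (not excluded) or is left in decomposed form (composition-excluded), for U+0958 and U+FB31 on one side and ऩ, ऱ, ऴ on the other.

```python
import unicodedata


def is_recomposed_by_nfc(ch: str) -> bool:
    """Return True if NFC keeps the precomposed character `ch` as-is.

    Hypothetical helper for illustration: a canonically decomposable
    character that is composition-excluded will come back from NFC
    in its decomposed form instead.
    """
    return unicodedata.normalize("NFC", ch) == ch


# Sample characters taken from the thread:
#   U+0958 (first of the range 0958..095F) and U+FB31 are mentioned as excluded;
#   U+0929, U+0931, U+0934 are the Devanagari nukta letters that are not.
samples = ["\u0958", "\uFB31", "\u0929", "\u0931", "\u0934"]

for ch in samples:
    nfd = unicodedata.normalize("NFD", ch)
    # Show the canonical decomposition as a list of code points.
    parts = " ".join(f"U+{ord(c):04X}" for c in nfd)
    status = "kept precomposed" if is_recomposed_by_nfc(ch) else "left decomposed"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFD = {parts}; {status} under NFC")
```

With the Unicode data in current Python releases, I would expect the first two samples to be reported as left decomposed and the three nukta letters as kept precomposed, which matches the distinction raised in the original question.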
