2018-06-04 21:49 GMT+02:00 Manish Goregaokar via Unicode < [email protected]>:
> Hi,
>
> The Rust community is considering
> <https://github.com/rust-lang/rfcs/pull/2457> adding non-ascii
> identifiers, which follow UAX #31 <http://www.unicode.org/reports/tr31/>
> (XID_Start XID_Continue*, with tweaks). The proposal also asks for
> identifiers to be treated as equivalent under NFKC.
>
> Are there any cases where this will lead to inconsistencies? I.e. can the
> NFKC of a valid UAX 31 ident be invalid UAX 31?
>

Yes, such cases exist, for instance in the Latin alphabet as used for the Catalan language.

* Ŀ, LATIN CAPITAL LETTER L WITH MIDDLE DOT (U+013F), normalizes under NFKC to LATIN CAPITAL LETTER L (U+004C) followed by MIDDLE DOT (U+00B7): <L,·>
* ŀ, LATIN SMALL LETTER L WITH MIDDLE DOT (U+0140), normalizes under NFKC to LATIN SMALL LETTER L (U+006C) followed by MIDDLE DOT (U+00B7): <l,·>

Ŀ and ŀ are (were) used in Catalan to encode the geminate L [1] when it is (was) encoded using only 2 characters. The preferred (and commonly used) encoding is currently the 3-character one: <L,·,L>.

So some adjustments are needed if you want to support Catalan identifiers [2].

Yours,
Joan Montané

[1] https://en.wikipedia.org/wiki/Interpunct#Catalan
[2] http://www.unicode.org/reports/tr31/#Specific_Character_Adjustments
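
P.S. For anyone who wants to check the two NFKC results mentioned in this message, here is a minimal Python sketch using only the standard-library `unicodedata` module. The helper name `nfkc_codepoints` is just illustrative. It only shows what NFKC does to the two letters; whether the result is then accepted as an identifier depends on the UAX #31 profile and character adjustments in use.

```python
import unicodedata

# The two precomposed letters discussed in this message.
PRECOMPOSED = ["\u013F", "\u0140"]


def nfkc_codepoints(text):
    """Return the NFKC form of `text` as a list of 'U+XXXX' strings."""
    normalized = unicodedata.normalize("NFKC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


for ch in PRECOMPOSED:
    # Print the original code point, its Unicode name, and its NFKC form.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {' '.join(nfkc_codepoints(ch))}")
```

Each letter should come out as two code points, the base letter followed by U+00B7, so an implementation that compares identifiers under NFKC has to decide how to treat MIDDLE DOT.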

