On Fri, 18 Oct 2019 09:45:14 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Thu, 17 Oct 2019 21:58:50 +0100
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> >
> > > Sounds arbitrary to me. How do we know that all the users will
> > > want that?
> >
> > If the change from codepoint by codepoint matching is just canonical
> > equivalence, then there is no way that the ‘n’ of ‘na’ will be
> > matched by the ‘n’ within ‘ñ’.
>
> "Just canonical equivalence" is also quite arbitrary, for the user's
> POV. At least IME.

Here's a similar issue. If I do an incremental search in Welsh text,
entering bac (on the way to entering bach) will find words like "bach"
and "bachgen" even though their third letter is 'ch', not 'c'.

'Canonical equivalence' is 'DTRT', unless you're working with systems
too lazy or too primitive to DTRT. It involves treating character
sequences declared to be identical in signification identically.

The only pleasant justification for treating canonically equivalent
sequences inequivalently that I can think of is to treat the difference
as a way of recording how the text was typed. Quite a few editing
systems erase that information, and I doubt people care how someone
else typed the text.

Richard.
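
To make the 'n' versus 'ñ' point concrete, here is a minimal sketch in
Python using the standard unicodedata module. The name canonical_find is
made up for illustration, and the boundary check is a deliberate
simplification, not a full canonical-equivalence matcher: both strings
are normalized to NFD, and a hit is discarded if it would end between a
base letter and the combining marks that follow it.

```python
import unicodedata


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def canonical_find(haystack, needle):
    """Hypothetical helper: offsets (in the NFD form of haystack) where
    needle matches under a simplified notion of canonical equivalence."""
    h, n = nfd(haystack), nfd(needle)
    hits = []
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        # Reject a hit that would cut a base letter off from a following
        # combining mark (nonzero canonical combining class).
        if end == len(h) or unicodedata.combining(h[end]) == 0:
            hits.append(start)
        start = h.find(n, start + 1)
    return hits


composed = "ma\u00f1ana"      # 'ñ' as the single code point U+00F1
decomposed = "man\u0303ana"   # 'n' followed by U+0303 COMBINING TILDE

# Codepoint-by-codepoint search treats the two spellings differently:
# in the decomposed text the first 'n' found is the one inside 'ñ'.
print(composed.find("n"), decomposed.find("n"))

# The normalized search gives the same answer for both spellings and
# never reports the 'n' inside 'ñ'.
print(canonical_find(composed, "n"), canonical_find(decomposed, "n"))
print(canonical_find("\u00f1a", "na"))
```

The offsets refer to the decomposed text, which is good enough for
showing which occurrences match; mapping them back to the original
string is left out of the sketch.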