On Wed, 16 Oct 2019 09:33:38 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > These are complaints about primary-level searches, not canonical
> > equivalence.
>
> Not sure what you call primary-level searches, but if you deduced the
> complaints were only about searches for base characters, then that's
> not so.  They are long discussions with many sub-threads, so it might
> be hard to find the specific details you are looking for.

The nearest I've found to complaints about including canonical
equivalences are:

(a) an observation that very occasionally one would need to switch
canonical equivalence off.  In such cases, one is not concerned with
the text as such, but rather with how Unicode non-compliant processes
will handle it.  Compliant processes are often built out of
non-compliant processes.

(b) just possibly "What we have seen is that the behavior that comes
from that Unicode data does not please the users very much.  Users seem
to have many different ideas of what folding is useful, and disagree
with each other greatly." -
https://lists.gnu.org/archive/html/emacs-devel/2016-02/msg01359.html

I can't tell what (b) was talking about; it may well have been about
folding or asymmetric search, as opposed to supporting canonical
equivalence.

(c) A search for 'n' finding 'ñ'.

When it comes to canonical equivalence, one answer to (c) is that as
soon as one adds the next letter, e.g. 'na', the search will no longer
match 'ñ'.  (This doesn't apply to diacritic-ignoring folding.)  That
argument doesn't work with the Polish letter 'ń' though, as it can be
word-final.

In programming, one might be able to prevent the issue by using
'n\b{g}', but that is a requirement of RL2.2, which doesn't seem to be
high on the list of implementers' priorities, especially as it depends
on properties outwith the UCD, defined in a non-ASCII file to boot.  A
better supported solution is probably 'n\P{Mn}'.

In many cases, the answer might be a search by collation graphemes, but
that has other issues besides language sensitivity.

Richard.
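
For illustration only, the 'n\P{Mn}' idea above can be sketched in
Python.  The assumptions are these: the text is first normalized to
NFD, so that 'ń' becomes 'n' followed by U+0301; Python's standard re
module has no \p{...} property classes, so General_Category is checked
through unicodedata instead; and the function name find_unmarked is
made up for this example.  Unlike a literal 'n\P{Mn}', the sketch also
accepts an 'n' at the very end of the text, since it only looks ahead
and does not consume a following character.

```python
import unicodedata


def find_unmarked(text, letter):
    """Return offsets, in the NFD form of `text`, where `letter` occurs
    and is not immediately followed by a nonspacing mark (gc=Mn)."""
    nfd = unicodedata.normalize("NFD", text)
    hits = []
    for i, ch in enumerate(nfd):
        if ch != letter:
            continue
        nxt = nfd[i + 1] if i + 1 < len(nfd) else ""
        if nxt and unicodedata.category(nxt) == "Mn":
            # e.g. 'n' + U+0301, canonically equivalent to 'ń': skip it
            continue
        hits.append(i)
    return hits


# A plain substring search on NFD text finds the 'n' inside 'ń' ...
assert "n" in unicodedata.normalize("NFD", "ko\u0144")
# ... whereas the mark-aware search does not, even word-finally.
assert find_unmarked("ko\u0144", "n") == []
assert find_unmarked("kon", "n") == [2]
```

This only looks at a single following mark of category Mn; it is not a
substitute for grapheme-cluster boundaries as in 'n\b{g}'.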