On 2/21/2016 8:22 AM, Eli Zaretskii wrote:
> From: "Asmus Freytag (t)" <asmus-...@ix.netcom.com>
> Date: Sat, 20 Feb 2016 14:10:04 -0800
>
>>> What about language-independent character-folding: where in the Unicode database is the data for that?
>>
>> Unicode, even CLDR, doesn't nearly have enough data for the purpose.
>
> This seems to contradict what others said: they said CLDR includes the necessary data. What is missing from CLDR, and how bad will the omissions affect searching?

Depends what you are searching for.

>> (and as a corollary of what Elias points out, it's likely to annoy users of every language, in that it would fold essential and non-essential distinctions indiscriminately).
>
> Users can easily turn the folding off if they don't like it or if it gets in the way.

Depends. If a language has a set of important distinctions but text (for users working in that language) also contains noncritical distinctions, the inability to ignore just the latter would be annoying. There are scenarios where the approximation may not matter.

>> Also, the sorting order for some languages is radically distinct from the "generic" one. So a language-independent folding based on generic sorting order isn't going to be ideal.
>
> The important question is: will Emacs with this feature be more or less useful than without it? Another important question is whether character folding in searches should be turned on or off by default. IOW, should we expect more users wanting to turn it off than on?

For languages like English, folding accents by default works really well, unless someone tries to find foreign words in English text... but that would be taken care of by making the default overridable. However, for other languages, it gives very strange (annoying) results, at least for *some* words, though it might be useful in some cases. Users might want to disable that default (or invert it) permanently.

> AFAIU, the very least that should be provided is being able to find decomposed characters when a composed one is searched for. The data for this, AFAIU, is in UnicodeData.txt in the form of the canonical decompositions. Is this correct?

That's generally useful, because these cases represent two encodings that are intentionally equivalent.

>> none has seen folding of diacritics as useful
>
> Really? So you are saying that, based on your experience, being able to ignore diacritics in searches is not a useful feature?

No, just that there are areas of application where folding all diacritics isn't useful (remember, this was in the context of a specific use case).

A./