Just so story: Why isn't o-slash decomposed? (was: Re: Character folding in text editors)

2016-02-22 Thread Ken Whistler
On 2/21/2016 9:53 AM, Doug Ewell wrote: But that still doesn't work for a character like ø, which doesn't decompose to o + anything Why doesn't it, btw? Same question about ł. I've heard an opinion that UnicodeData.txt only included decompositions when the combining mark's glyphs don't
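
The observation that ø and ł carry no decomposition can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `has_decomposition` is made up for illustration and does not come from the thread.

```python
import unicodedata

def has_decomposition(ch: str) -> bool:
    """Return True if UnicodeData.txt lists a decomposition mapping for ch."""
    # unicodedata.decomposition() returns an empty string when the
    # character has no decomposition mapping.
    return unicodedata.decomposition(ch) != ""

# å (U+00E5) decomposes to a + COMBINING RING ABOVE.
print(unicodedata.decomposition("\u00e5"))  # 0061 030A
# ø (U+00F8) and ł (U+0142) have no decomposition mapping.
print(has_decomposition("\u00f8"))  # False
print(has_decomposition("\u0142"))  # False
```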

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: "Doug Ewell" > Cc: > Date: Sun, 21 Feb 2016 10:53:23 -0700 > > > Given that the feature can be turned off easily, do you think that it > > will nonetheless be useful, even though language-dependent parts are > > not available? > > It's probably a lot better than no folding. Just be prep

Re: Character folding in text editors

2016-02-21 Thread Asmus Freytag (t)
On 2/21/2016 8:22 AM, Eli Zaretskii wrote: From: "Asmus Freytag (t)" Date: Sat, 20 Feb 2016 14:10:04 -0800 What about language-independent character-folding: where in the Unicode database is the data for that? Unico

Re: Character folding in text editors

2016-02-21 Thread Doug Ewell
Eli Zaretskii wrote: About the closest approximation you can get using Unicode data alone (not CLDR) is to normalize to NFD, then ignore the combining diacritics. This is what Emacs currently does, IIUC what you say. The NFD normalization uses the decomposition data included with UnicodeData.
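
The approach described above (normalize to NFD, then ignore the combining diacritics) can be sketched as follows. This is an illustration using Python's standard `unicodedata` module, not the Emacs implementation; the function name `fold_diacritics` is hypothetical, and treating "combining diacritic" as "non-zero canonical combining class" is an assumption of this sketch.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Language-independent approximation: NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class,
    # which is 0 for base characters and non-zero for the marks dropped here.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

print(fold_diacritics("M\u00e5rtenson"))  # Martenson
# ø has no canonical decomposition, so it passes through unchanged.
print(fold_diacritics("\u00f8") == "\u00f8")  # True
```

Because the folding relies only on canonical decomposition data, characters such as ø and ł are left alone, and no language-specific distinctions are taken into account.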

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
Btw, are there any editors out there which support similar features? If so, can someone please point to them, and perhaps provide a short summary of the features they provide and how they are implemented? Thanks.

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: Mark Davis ☕️ > Date: Sun, 21 Feb 2016 11:47:28 +0100 > Cc: Unicode Public > > If you don't use ICU, you can also use the CLDR data directly, but you'll > have to parse it yourself. You'd start with the root locale, then add in > the mappings from the children (eg de.xml). The parsing is

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: Philippe Verdy > Date: Sun, 21 Feb 2016 00:19:19 +0100 > Cc: unicode Unicode Discussion > > Unless we have case folding tailored by language, you cannot do that based > on the Unicode database alone. > > However CLDR provides tailored data about collation. > > From my point of view, i

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: Philippe Verdy > Date: Sun, 21 Feb 2016 00:19:19 +0100 > Cc: unicode Unicode Discussion > > It should also be noted that some kind of "folding" described/desired by > Elias will likely fail his expectations, even when using collation data in > CLDR tailored per language. I don't think t

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: "Asmus Freytag (t)" > Date: Sat, 20 Feb 2016 14:10:04 -0800 > > > What about language-independent character-folding: where in the > > Unicode database is the data for that? > > > > > Unicode, even CLDR, doesn't nearly have enough data for the purpose. This seems to contradict what others

Re: Character folding in text editors

2016-02-21 Thread Eli Zaretskii
> From: "Doug Ewell" > Date: Sat, 20 Feb 2016 14:43:15 -0700 > > > What about language-independent character-folding: where in the > > Unicode database is the data for that? > > The OP kind of alluded to that: there is no such thing really as > language-independent character folding. Emacs is c

Re: Character folding in text editors

2016-02-21 Thread Mark Davis ☕️
On Sat, Feb 20, 2016 at 11:10 PM, Asmus Freytag (t) wrote: > Unicode, even CLDR, doesn't nearly have enough data for the purpose. > (and as a corollary of what Elias points out, it's likely to annoy users > of every language, in that it would fold essential and non-essential > distinctions indisc

Re: Character folding in text editors

2016-02-20 Thread Elias Mårtenson
On 21 February 2016 at 06:10, Asmus Freytag (t) wrote: Unicode, even CLDR, doesn't nearly have enough data for the purpose. > (and as a corollary of what Elias points out, it's likely to annoy users > of every language, in that it would fold essential and non-essential > distinctions indiscrimina

Re: Character folding in text editors

2016-02-20 Thread Philippe Verdy
It should also be noted that some kind of "folding" described/desired by Elias will likely fail his expectations, even when using collation data in CLDR tailored per language. Notably, this data, even if it is used at its weakest strength (the primary collation level only, discarding other differen

Re: Character folding in text editors

2016-02-20 Thread Asmus Freytag (t)
On 2/20/2016 9:56 AM, Eli Zaretskii wrote: From: Philippe Verdy Date: Sat, 20 Feb 2016 18:27:41 +0100 Cc: unicode Unicode Discussion Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone.

Re: Character folding in text editors

2016-02-20 Thread Doug Ewell
Eli Zaretskii wrote: What about language-independent character-folding: where in the Unicode database is the data for that? The OP kind of alluded to that: there is no such thing really as language-independent character folding. About the closest approximation you can get using Unicode data

Re: Character folding in text editors

2016-02-20 Thread Mark Davis ☕️
Yes, that can be used. Easiest is using ICU. Create a collator, using the "search" keyword. That can be used to search for text, using settings you want for the strength (primary differences, secondary, etc). You can also access the collation keys from the ICU API, and build a mapping yourself of

Re: Character folding in text editors

2016-02-20 Thread Janusz S. Bien
Quote/Cytat - Philippe Verdy (Sat 20 Feb 2016 06:27:41 PM CET): Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone. However CLDR provides tailored data about collation. From my point of view, it is just a matter of selecting the collatio

Re: Character folding in text editors

2016-02-20 Thread Eli Zaretskii
> From: Philippe Verdy > Date: Sat, 20 Feb 2016 18:27:41 +0100 > Cc: unicode Unicode Discussion > > Unless we have case folding tailored by language, you cannot do that based on > the Unicode database alone. What about language-independent character-folding: where in the Unicode database is th

Re: Character folding in text editors

2016-02-20 Thread Philippe Verdy
Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone. However CLDR provides tailored data about collation. From my point of view, it is just a matter of selecting the collation strength to use for searches using collation. All collations in CLD

Re: Character folding in text editors

2016-02-20 Thread Janusz S. Bien
Quote/Cytat - Elias Mårtenson (Sat 20 Feb 2016 11:23:13 AM CET): Hello Unicode, I have been involved in a rather long discussion on the Emacs-devel mailing list[1] concerning the right way to do character folding and we've reached a point where input from Unicode experts would be welcome. T

Character folding in text editors

2016-02-20 Thread Elias Mårtenson
Hello Unicode, I have been involved in a rather long discussion on the Emacs-devel mailing list[1] concerning the right way to do character folding and we've reached a point where input from Unicode experts would be welcome. The problem is the implementation of equivalence when searching for char