On 2/21/2016 9:53 AM, Doug Ewell wrote:
But that still doesn't work for a character like ø, which doesn't
decompose to o + anything
Why doesn't it, btw? Same question about ł.
I've heard an opinion that UnicodeData.txt only included
decompositions when the combining mark's glyphs don't
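The point about ø and ł can be checked against the decomposition field of UnicodeData.txt, which Python's standard `unicodedata` module exposes. This is a minimal sketch. The helper name `canonical_decomposition` is ours, not from the thread. ö has a canonical decomposition to o + U+0308, while ø and ł have none, so "NFD, then drop the marks" never reaches a plain o or l for them.

```python
import unicodedata
from typing import Optional


def canonical_decomposition(ch: str) -> Optional[str]:
    """Return the canonical decomposition field from UnicodeData.txt, or None.

    Compatibility decompositions are tagged (e.g. "<compat> 0066 0069") and
    are not applied by NFD, so they count as "no canonical decomposition".
    """
    field = unicodedata.decomposition(ch)
    if not field or field.startswith("<"):
        return None
    return field


for ch in "öøł":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {canonical_decomposition(ch)}")
    # NFD leaves ø and ł as single code points, so stripping marks cannot fold them.
    print("  NFD:", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)])
```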
> From: "Doug Ewell"
> Cc:
> Date: Sun, 21 Feb 2016 10:53:23 -0700
>
> > Given that the feature can be turned off easily, do you think that it
> > will nonetheless be useful, even though language-dependent parts are
> > not available?
>
> It's probably a lot better than no folding. Just be prep
On 2/21/2016 8:22 AM, Eli Zaretskii wrote:
From: "Asmus Freytag (t)"
Date: Sat, 20 Feb 2016 14:10:04 -0800
What about language-independent character-folding: where in the
Unicode database is the data for that?
Unico
Eli Zaretskii wrote:
About the closest approximation you can get using Unicode data alone
(not CLDR) is to normalize to NFD, then ignore the combining
diacritics.
This is what Emacs currently does, IIUC what you say. The NFD
normalization uses the decomposition data included with
UnicodeData.
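The "normalize to NFD, then ignore the combining diacritics" approximation is short to express with Python's `unicodedata` module. This is a sketch, not Emacs's implementation. Here, "combining diacritic" is assumed to mean general category Mn (nonspacing mark). The final NFC step is our addition; it recomposes text that NFD split without involving marks, such as Hangul syllables.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate language-independent folding: NFD, then drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Recompose what remains, e.g. Hangul syllables that NFD split into jamo
    # (jamo are letters, not marks, so they survive the filter above).
    return unicodedata.normalize("NFC", stripped)


for word in ["résumé", "Ångström", "Łódź", "søster"]:
    print(word, "->", fold_diacritics(word))
```

Note that Ł and ø pass through unchanged, which is exactly the limitation raised about characters that have no decomposition.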
Btw, are there any editors out there which support similar features?
If so, can someone please point to them, and perhaps provide a short
summary of the features they provide and how they are implemented?
Thanks.
> From: Mark Davis ☕️
> Date: Sun, 21 Feb 2016 11:47:28 +0100
> Cc: Unicode Public
>
> If you don't use ICU, you can also use the CLDR data directly, but you'll
> have to parse it yourself. You'd start with the root locale, then add in
> the mappings from the children (e.g., de.xml). The parsing is
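The layering described above can be shown with Python's `collections.ChainMap`: root data first, with a locale's own mappings placed on top. This sketch only illustrates the lookup order. The tables are hypothetical toy data, not parsed from CLDR XML. In the example, root folds ä and ö to a and o, and a Swedish layer keeps them distinct, because å, ä and ö are separate letters of the Swedish alphabet.

```python
from collections import ChainMap

# Hypothetical toy tables, NOT real CLDR content: char -> folded form.
ROOT_FOLDS = {"å": "a", "ä": "a", "ö": "o", "é": "e"}
# Hypothetical Swedish layer: å, ä and ö are letters in their own right,
# so the child overrides root and keeps them distinct.
SV_FOLDS = {"å": "å", "ä": "ä", "ö": "ö"}


def fold(text: str, table) -> str:
    """Replace each character by its folded form; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


root = ChainMap(ROOT_FOLDS)
swedish = ChainMap(SV_FOLDS, ROOT_FOLDS)  # first mapping wins on lookup

print(fold("människa café", root))     # root folding
print(fold("människa café", swedish))  # Swedish layer, root fallback for é
```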
> From: Philippe Verdy
> Date: Sun, 21 Feb 2016 00:19:19 +0100
> Cc: unicode Unicode Discussion
>
> Unless we have case folding tailored by language, you cannot do that based
> on the Unicode database alone.
>
> However CLDR provides tailored data about collation.
>
> From my point of view, i
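Python's `str.casefold()` shows why case folding needs tailoring by language. It applies the language-independent full folding from CaseFolding.txt and has no notion of locale, so it cannot produce the Turkish pairing of I with dotless ı. In this sketch, `turkish_casefold` is a hypothetical helper that handles only the dotted/dotless i pair. It does not attempt a complete Turkish tailoring.

```python
def turkish_casefold(text: str) -> str:
    """Hypothetical tailoring for Turkish/Azeri dotted and dotless i only."""
    # Map the two uppercase i's the Turkish way before generic folding.
    return text.replace("I", "ı").replace("İ", "i").casefold()


word = "DİYARBAKIR"
print(repr(word.casefold()))         # generic folding: İ -> i + U+0307, I -> i
print(repr(turkish_casefold(word)))  # tailored: İ -> i, I -> ı
```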
> From: Philippe Verdy
> Date: Sun, 21 Feb 2016 00:19:19 +0100
> Cc: unicode Unicode Discussion
>
> It should also be noted that some kind of "folding" described/desired by
> Elias will likely fail his expectations, even when using collation data in
> CLDR tailored per language.
I don't think t
> From: "Asmus Freytag (t)"
> Date: Sat, 20 Feb 2016 14:10:04 -0800
>
> > What about language-independent character-folding: where in the
> > Unicode database is the data for that?
> >
> >
> Unicode, even CLDR, doesn't have nearly enough data for the purpose.
This seems to contradict what others
> From: "Doug Ewell"
> Date: Sat, 20 Feb 2016 14:43:15 -0700
>
> > What about language-independent character-folding: where in the
> > Unicode database is the data for that?
>
> The OP kind of alluded to that: there is no such thing really as
> language-independent character folding.
Emacs is c
On Sat, Feb 20, 2016 at 11:10 PM, Asmus Freytag (t) wrote:
> Unicode, even CLDR, doesn't have nearly enough data for the purpose.
> (and as a corollary of what Elias points out, it's likely to annoy users
> of every language, in that it would fold essential and non-essential
> distinctions indisc
On 21 February 2016 at 06:10, Asmus Freytag (t) wrote:
> Unicode, even CLDR, doesn't have nearly enough data for the purpose.
> (and as a corollary of what Elias points out, it's likely to annoy users
> of every language, in that it would fold essential and non-essential
> distinctions indiscrimina
It should also be noted that some kind of "folding" described/desired by
Elias will likely fail his expectations, even when using collation data in
CLDR tailored per language.
Notably, this data, even if it is used at its weakest strength (the primary
collation level only, discarding other differen
On 2/20/2016 9:56 AM, Eli Zaretskii wrote:
From: Philippe Verdy
Date: Sat, 20 Feb 2016 18:27:41 +0100
Cc: unicode Unicode Discussion
Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone.
Eli Zaretskii wrote:
What about language-independent character-folding: where in the
Unicode database is the data for that?
The OP kind of alluded to that: there is no such thing really as
language-independent character folding.
About the closest approximation you can get using Unicode data
Yes, that can be used.
Easiest is using ICU. Create a collator, using the "search" keyword. That
can be used to search for text, using settings you want for the strength
(primary differences, secondary, etc.). You can also access the collation
keys from the ICU API, and build a mapping yourself of
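ICU's collation-based search is the route recommended above, and the following sketch is not that. It is a stdlib-only illustration of the plumbing any folding search needs:

- Fold each character of the text.
- Keep a map from folded offsets back to original offsets.
- Report matches as spans of the original string.

The folding used here (casefold, then NFD, then dropping Mn marks) is an assumption that stands in for a real primary-strength collator. The names `fold_char` and `folded_find_all` are hypothetical.

```python
import unicodedata
from typing import Iterator, Tuple


def fold_char(ch: str) -> str:
    """Assumed folding (not ICU): case-fold, decompose, drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", ch.casefold())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def folded_find_all(text: str, needle: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans of `text` that match `needle` after folding."""
    folded_parts = []
    origin = []  # origin[k] = index in `text` of the char that produced folded char k
    for i, ch in enumerate(text):
        f = fold_char(ch)
        folded_parts.append(f)
        origin.extend([i] * len(f))
    folded = "".join(folded_parts)
    target = "".join(fold_char(c) for c in needle)
    if not target:
        return
    start = folded.find(target)
    while start != -1:
        stop = origin[start + len(target) - 1] + 1
        # Extend over trailing characters that fold to nothing (e.g. combining marks).
        while stop < len(text) and not fold_char(text[stop]):
            stop += 1
        yield origin[start], stop
        start = folded.find(target, start + 1)


text = "Résumé and resume"
for start, end in folded_find_all(text, "RESUME"):
    print(start, end, text[start:end])
```

Reporting spans in the original string lets an editor highlight a match even when folding changed the length (ß folds to ss) or dropped combining marks.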
Quote - Philippe Verdy (Sat 20 Feb 2016 06:27:41 PM CET):
Unless we have case folding tailored by language, you cannot do that based
on the Unicode database alone.
However CLDR provides tailored data about collation.
From my point of view, it is just a matter of selecting the collatio
> From: Philippe Verdy
> Date: Sat, 20 Feb 2016 18:27:41 +0100
> Cc: unicode Unicode Discussion
>
> Unless we have case folding tailored by language, you cannot do that based on
> the Unicode database alone.
What about language-independent character-folding: where in the
Unicode database is th
Unless we have case folding tailored by language, you cannot do that based
on the Unicode database alone.
However CLDR provides tailored data about collation.
From my point of view, it is just a matter of selecting the collation
strength to use for searches using collation. All collations in CLD
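Philippe frames folding as a choice of collation strength. Real strengths come from UCA/CLDR weights, which ICU exposes directly. The sketch below only imitates the three levels with stdlib tools to make the idea concrete:

- Primary ignores case and accents.
- Secondary ignores case only.
- Tertiary treats only canonically equivalent spellings as equal.

The `Strength` enum and the `equal_at` helper are hypothetical names. Because this approximation is built on NFD, it inherits NFD's blind spots: ø and ł still do not match o and l.

```python
import unicodedata
from enum import Enum


class Strength(Enum):
    PRIMARY = 1    # compare base letters only
    SECONDARY = 2  # base letters and accents
    TERTIARY = 3   # base letters, accents and case


def _key(text: str, strength: Strength) -> str:
    if strength is Strength.TERTIARY:
        # Only canonically equivalent spellings compare equal.
        return unicodedata.normalize("NFC", text)
    # Canonical caseless form: NFD(casefold(NFD(text))).
    caseless = unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", text).casefold())
    if strength is Strength.SECONDARY:
        return caseless
    return "".join(c for c in caseless if unicodedata.category(c) != "Mn")


def equal_at(a: str, b: str, strength: Strength) -> bool:
    return _key(a, strength) == _key(b, strength)


for s in Strength:
    print(s.name, equal_at("Résumé", "resume", s), equal_at("Résumé", "résumé", s))
```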
Quote - Elias Mårtenson (Sat 20 Feb 2016 11:23:13 AM CET):
Hello Unicode,
I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.
T
Hello Unicode,
I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.
The problem is the implementation of equivalence when searching for
char