On 6 Dec. 03, at 09:20, SADAHIRO Tomoyuki wrote:
The syntax of collation customization (tailoring) in ICU
(
http://oss.software.ibm.com/icu/userguide/Collate_Customization.html )
is character-based and may be more intuitive:
for French:
"[backwards 2]&A << \u00e6/e <<< \u00c6/E"
> Has anyone had a look at the OpenI18N/ICU locale data?
>
> The locales there are all UTF-8 and have java rule based collation data, so
> they *might* be useful for creating a more comprehensive (and accurate) set
> of sort modules? The downside is this data is pretty rough ATM but does
> seem t
Has anyone had a look at the OpenI18N/ICU locale data?
The locales there are all UTF-8 and have java rule based collation
data, so
they *might* be useful for creating a more comprehensive (and
accurate) set
of sort modules? The downside is this data is pretty rough ATM but does
seem to be improv
Sadahiro Tomoyuki wrote:
>
>> So I guess I need a Lingua::XX::Sort module for each language I operate
>> on,
>> in my original posting I was misled to believe that Unicode::Collate
>> would
>> be the tool to use.
>>
>> Thanks to all for the useful links provided in this thread.
>
> As far as I fo
> So I guess I need a Lingua::XX::Sort module for each language I operate
> on,
> in my original posting I was misled to believe that Unicode::Collate
> would
> be the tool to use.
>
> Thanks to all for the useful links provided in this thread.
As far as I found, CPAN provides at least five modu
On 1 Dec. 03, at 18:33, Jarkko Hietaniemi wrote:
% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
Is this the famous French "backwards accents" rule in action?
(http://www-clips.imag.fr/geta/gilles.serasset/tri-du-francais.html)
(no,
Eric Cholet wrote in perl.unicode :
>
> So is it just by chance that these French words are accurately sorted?
>
> % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
> cote coté côte côté
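A minimal sketch of the comparison being asked about, assuming the core Unicode::Collate module and its documented "backwards" option; the variable names are only illustrative. It sorts the same four words three ways: by code point (what the built-in sort does), with the default UCA ordering, and with accent differences compared from the end of the word, as in French dictionary ordering.

```perl
use strict;
use warnings;
use utf8;                      # the source below contains literal UTF-8 characters
use Unicode::Collate;

binmode(STDOUT, ":utf8");

my @words = qw(côte côté cote coté);

# Built-in sort compares code points: "o" (U+006F) sorts before "ô" (U+00F4).
my @by_codepoint = sort @words;

# Default UCA ordering: accent (level 2) differences are compared left to right.
my @uca = Unicode::Collate->new->sort(@words);

# French ordering: level 2 differences are compared right to left.
my @french = Unicode::Collate->new(backwards => 2)->sort(@words);

print "code point:  @by_codepoint\n";
print "UCA default: @uca\n";
print "backwards 2: @french\n";
```

For these four words the code point order and the default UCA order happen to agree (cote coté côte côté); only the "backwards => 2" collator should give the French dictionary order (cote côte coté côté).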
Until recently, Spanish dictionaries used to treat 'll' vowel
as a
OK, this is in line with how I understood this paragraph in
perluniintro:
The short answer is that by default, Perl compares strings ("lt",
"le", "cmp", "ge", "gt") based only on the code points of the
characters. In the above case, the answer is "aft
On 1 Dec. 03, at 16:46, Jarkko Hietaniemi wrote:
Thank you both for your replies. What about sorting words in one
particular
language, is Perl's sort() good enough? I'm wondering, since language
isn't
one of sort()'s arguments.
First we need to define "good enough"... again, if you are sorting
"simple" English or Hawaiian, you are probably fine
On 29 Nov. 03, at 16:30, Jarkko.Hietaniemi wrote:
I want to correctly sort words in a variety of languages, currently
French, English, Spanish, Portuguese, German and Arabic. I am using
Perl 5.8.1 and Unicode. I think I need Unicode::Collate to have
*correct* sorting. Is this correct?
In additio
> -Original Message-
> From: Jarkko.Hietaniemi [mailto:[EMAIL PROTECTED]
...
> the UCA is not "correct" for any particular language ...
Not by design, no, but it is fine for English and Italian, for example.
> I think it is worth pointing out that trying to sort multilingual
> data is pra
I want to correctly sort words in a variety of languages, currently
French, English, Spanish, Portuguese, German and Arabic. I am using
Perl 5.8.1 and Unicode. I think I need Unicode::Collate to have
*correct* sorting. Is this correct?
In addition to the problems listed by Sadahiro (most importantl
[Excuse me, I sent a cc to [EMAIL PROTECTED];
I expect some help and/or suggestions may be given there]
> Greetings,
>
> I hope you won't mind a few questions related to your module
> Unicode::Collate.
>
> I want to correctly sort words in a variety of languages, currently
> French, English, Span
14 matches