On Thu, 25 Mar 2004 22:29:08 +0000
Rich <[EMAIL PROTECTED]> wrote:

> Hello
>
> How should collation be handled in multitasking, multilingual applications -
> in particular forking servers such as apache/mod_perl based web apps?
>
> I can assume the following:
>
> 1) I'll know the preferred language via a RFC2616 language tag.
> 2) All data will be utf8 encoded Unicode.
> 3) The required language may differ for each request.
>
> I guess Unicode::Collate is the way to go, so can I simply have one
> Unicode::Collate instance per process using the default allkeys.txt table
> file?
>
> Will that give sensible results for most (all?) languages, or do I need to
> customise the collator on the fly when more 'exotic' (for want of a better
> word) languages are requested? Are there other reasons, such as size and/or
> performance issues, why the default allkeys.txt file may not be the way to
> go?

I think that, for a script that usually represents one language,
allkeys.txt defines a fairly acceptable collation order. For example,
the order of hiragana and katakana is approximately compliant with the
custom of the Japanese language.

In contrast, for a script that represents many languages (say, the
Latin script), tailoring may often be necessary. E.g. 'Ä' is sorted as
A-umlaut (sometimes as 'AE') in German, and as one of the additional
letters ordered after 'Z' in some northern European languages. But
according to the Unicode default collation, 'Ä' is ordered as a
modified 'A' and is equal to 'A' at the primary level.

> I must stress that I'm ok with most aspects of i18n/l10n - it's specifically
> the correct use of Unicode::Collate in multitasking apps that I'm
> interested in.
>
> Suggestions would be welcome - even more so if they don't involve having to
> know the TR10 docs inside out!

I have (tentatively) written Unicode::Collate::Locale for linguistic
tailoring of the UCA. To use it, Unicode::Collate should search for
allkeys.txt in any of the directories in @INC (at present it searches
for table files only under the directory where it is located).
So Unicode::Collate::Locale requires Unicode::Collate 0.40 or later,
which is not released yet, but a prerelease is available as shown below.

[tarball]
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-Locale-0.01.tar.gz
[doc]
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-Locale.html

Sorry, tailoring for only a few languages is implemented so far.
It may be enhanced sooner or later...

[prerelease] This will be released *after* Perl 5.8.4 (or its RC) is out.
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-0.40.tar.gz

regards,
SADAHIRO Tomoyuki
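
P.S. A minimal sketch of the 'Ä' point above and of keeping one
collator per requested language within a process. It assumes the
Unicode::Collate::Locale interface as later bundled with Perl
(new(locale => ...), plus the sort and eq methods inherited from
Unicode::Collate), which may differ from the 0.01 prerelease linked
above. The collator_for helper and its cache hash are hypothetical
names used only for illustration, and the word list is made up.

```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Default table, primary level only: 'Ä' (U+00C4) compares equal to 'A'.
my $primary = Unicode::Collate->new(level => 1);
print "A and A-umlaut are equal at the primary level\n"
    if $primary->eq("A", "\x{C4}");

# Hypothetical per-process cache: each collator is built once per
# locale and then reused by later requests for the same language.
my %collator_cache;

sub collator_for {
    my ($language_tag) = @_;               # e.g. 'sv' or 'sv-SE'
    (my $locale = $language_tag) =~ tr/-/_/;
    return $collator_cache{$locale} ||=
        Unicode::Collate::Locale->new(locale => $locale);
}

# Illustrative data; "\x{C4}pfel" is "Äpfel".
my @words = ("Zebra", "\x{C4}pfel", "Apfel");

for my $tag (qw(de sv)) {
    my @sorted = collator_for($tag)->sort(@words);
    print "$tag: ", join(", ", @sorted), "\n";
}
```

With the bundled tailorings, 'de' should fall back to the default
order (Äpfel next to Apfel), while 'sv' should place Äpfel after
Zebra. Building a collator parses the table, so caching it per process
rather than per request is the main design point of the sketch.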