https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #196 from Philippe Verdy <verd...@wanadoo.fr> 2010-07-26 20:37:46 UTC ---

Note that the CollatorFactory may fail to locate the specified locale for which a collator is requested. Additionally, several locales may share exactly the same collator. In all cases, the CollatorFactory will return a valid Collator object whose effective locale can be identified. The Collator object returned by:

  $collator = $wgCollatorFactory->get($locale, $level);

will have one property containing the effective (normalized) locale code and another containing the collation level at which it was effectively built. You should be able to access them simply with something like:

  $effectiveLocale = $collator->locale();
  $effectiveLevel = $collator->level();

right after getting the collator instance from the factory. This may be useful to avoid storing duplicate, equivalent binary sortkeys, or simply to determine which effective locale to use in SQL SELECT queries (to retrieve the list of page names sorted for a given locale), once the SQL schema is able to store several sortkeys for the same page in the same category.

The factory will also instantiate a collator for a given effective locale and effective collation level only once, caching it in an internal array for repeated use. This saves the costly preparation of tables, and avoids building tables for all supported languages (for example on Commons, where many languages may be desirable, each one possibly with several sort options, or with supported conversions to other scripts or script variants).
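A minimal PHP sketch of the factory behaviour described above, under stated assumptions: SimpleCollator and SimpleCollatorFactory are hypothetical names (deliberately not "Collator", which would clash with the class of the same name in PHP's intl extension), the list of supported locales and the alias map are made up, locale fallback simply strips "-" subtags down to the CLDR "root" locale, and levels are clamped to 1..3. No collation tables are built here; the sketch only shows get() reporting the effective locale and level and caching one instance per (locale, level) pair.

```php
<?php
// Hypothetical sketch (PHP 8+): these classes are illustrative, not MediaWiki
// or ICU APIs, and they build no real collation tables.

final class SimpleCollator
{
    public function __construct(
        private string $locale,
        private int $level
    ) {}

    // Effective (normalized) locale this collator was actually built for.
    public function locale(): string { return $this->locale; }

    // Effective collation level this collator was actually built for.
    public function level(): int { return $this->level; }
}

final class SimpleCollatorFactory
{
    /** @var array<string, SimpleCollator> cache keyed by "locale/level" */
    private array $cache = [];

    /**
     * @param string[] $supported locales with their own tailoring (assumed list)
     * @param array<string, string> $aliases locales sharing another locale's collator
     */
    public function __construct(
        private array $supported,
        private array $aliases = [],
        private int $maxLevel = 3
    ) {}

    public function get(string $locale, int $level = 1): SimpleCollator
    {
        $effectiveLocale = $this->resolveLocale($locale);
        $effectiveLevel = max(1, min($level, $this->maxLevel));
        $key = $effectiveLocale . '/' . $effectiveLevel;
        // Build each (effective locale, effective level) pair only once.
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = new SimpleCollator($effectiveLocale, $effectiveLevel);
        }
        return $this->cache[$key];
    }

    public function cachedCount(): int { return count($this->cache); }

    // Normalize "fr_CA" to "fr-ca", then strip subtags until a supported
    // locale or alias is found; otherwise fall back to the "root" locale.
    private function resolveLocale(string $locale): string
    {
        $tag = strtolower(str_replace('_', '-', trim($locale)));
        while ($tag !== '') {
            if (isset($this->aliases[$tag])) {
                return $this->aliases[$tag];
            }
            if (in_array($tag, $this->supported, true)) {
                return $tag;
            }
            $pos = strrpos($tag, '-');
            $tag = ($pos === false) ? '' : substr($tag, 0, $pos);
        }
        return 'root';
    }
}

$factory = new SimpleCollatorFactory(['fr', 'de', 'no'], ['nb' => 'no']);
$collator = $factory->get('fr_CA', 5);
echo $collator->locale(), ' ', $collator->level(), PHP_EOL; // fr 3
```

Because requests for "fr", "FR" and "fr_CA" at the same level all resolve to the same cache key, they share a single instance, which is what makes the effective locale usable to detect duplicate, equivalent sortkeys.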
The factory, however, should probably load the DUCET table associated with the CLDR "root" locale completely and immediately when it is first instantiated and stored in the global variable (there's probably no need to test for this on each call because of lazy initialization with null member fields). It should most probably also build the default collator (for $locale=$wgContentLanguage and $collationlevel=1) immediately, storing it in the first position of its member array of already prepared Collator instances. You may think the opposite, in order to speed up server startup by a few (milli)seconds or to reduce the initial CPU/memory stress on PHP's garbage collector. However, I'm not convinced that the server will be ready faster, and the extra tests performed at each call of the $wgCollatorFactory->get() method may impact runtime performance...

Note also that ICU uses the same approach of a CollatorFactory to build and cache reusable Collator instances, because it's a proven design pattern for implementing and using collators.

A Collator object may also be used to compare two texts without even generating their sortkeys or mapping them, so it may help to include this method in the Collator interface:

  $collator->compare($text1, $text2);

which returns an integer (in other words, a Collator also implements the Comparator interface). It works by parsing $text1 and $text2 collation element by collation element up to the end at level 1, comparing their collation weights only at this level, before restarting with the next level. When the collator was instantiated at level 1, the successive collation elements need not be stored; but for higher levels, it helps if they are parsed only once and kept in an indexed array, allowing faster lookups in the tables of collation weights for the next levels.
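To illustrate the level-by-level compare() described above, here is a minimal PHP sketch using a hypothetical ToyCollator and a tiny, made-up weight table: one collation element per character, the base letter as primary weight, the accent as secondary weight, and case as tertiary weight. This is not the UCA itself (real DUCET/CLDR data has contractions, expansions and ignorable elements), and for simplicity the sketch always stores the parsed elements in an array, even when only level 1 is requested.

```php
<?php
// Hypothetical sketch (PHP 8+): ToyCollator and its weight table are
// illustrative only, not DUCET weights or a MediaWiki/ICU API.

final class ToyCollator
{
    // Accented letter => [base letter, secondary (accent) weight]. Assumed values.
    private const ACCENTS = [
        'à' => ['a', 1], 'â' => ['a', 2], 'é' => ['e', 1], 'è' => ['e', 2],
        'ê' => ['e', 3], 'À' => ['A', 1], 'É' => ['E', 1], 'È' => ['E', 2],
    ];

    public function __construct(private int $level = 3) {}

    // Parse the text once into collation elements [primary, secondary, tertiary].
    private function elements(string $text): array
    {
        $out = [];
        foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            [$base, $accent] = self::ACCENTS[$ch] ?? [$ch, 0];
            $lower = strtolower($base);
            $out[] = [
                hexdec(bin2hex($lower)),  // primary: base letter, case-folded
                $accent,                  // secondary: accent weight
                $lower === $base ? 0 : 1, // tertiary: 1 for uppercase
            ];
        }
        return $out;
    }

    // Returns <0, 0 or >0; weights are compared one level at a time, and a
    // higher level is consulted only if all lower levels were equal.
    public function compare(string $text1, string $text2): int
    {
        $a = $this->elements($text1);
        $b = $this->elements($text2);
        for ($lv = 0; $lv < $this->level; $lv++) {
            $n = min(count($a), count($b));
            for ($i = 0; $i < $n; $i++) {
                if ($a[$i][$lv] !== $b[$i][$lv]) {
                    return $a[$i][$lv] <=> $b[$i][$lv];
                }
            }
            if (count($a) !== count($b)) {
                return count($a) <=> count($b); // shorter prefix sorts first
            }
        }
        return 0;
    }
}

$words = ['ember', 'Élan', 'elan', 'élan'];
usort($words, [new ToyCollator(3), 'compare']);
echo implode(', ', $words), PHP_EOL; // elan, élan, Élan, ember
```

Since compare() has the usual comparator contract, it can be passed directly to usort(); at level 1 the same class treats "Résumé" and "resume" as equal.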
--
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l