https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #228 from Philippe Verdy <verd...@wanadoo.fr> 2011-08-26 21:49:23 UTC --- I am currently working on a Javascript implementation of UCA (i.e. with full support of the DUCET, at least 4 collation levels, and support for language-based tailorings). This is a very difficult task if we need performance. The trick is to use efficient lookup tables, but the difficulty is being able to update them (for tailorings); also performance will highly depend on how the DUCET is represented in Javascript, so that it can be initialized sufficiently fast that you won't notice this init time, when a collator object gets instanciated. I'm still not sure that performance will be good enough (even with today's much faster Javascript implementations), and may be some optional cooperation with a server-side script will be needed (or possibly by using a client-side native plugin interfaced with Javascript, such as Flash) could limit the number of JSON requests to a server in order to provide the tailored lookup tables. How to detect ans use client-side plugins in an open way. I really hope that someday, the Javascript/ECMAScript standard will specify a common standard interface to collators. I've already looked for help from authors of jQuery, where such common interface would become part of its "tools" library (even if this has nothing in common with working in the HTML DOM, that is the main focus of jQuery). For now there exists absolutely no reliable implementation of UCA in Javascript working only on the client-side (but there does exist already some Javascript libraries that in fact use JSON or XML requests to a server, in order to build or retreive the necessary lookup tables). For practical reasons, a server-side JSON/XML request could be optimized to reduce the volume of data exchanged (for example by providing only some parts of the collation elements, these data being then requested on demand even after the collator has already been -partly- initialized, and then cached in the collator object). But this is a complication in the design that for now I don't want to investigate too early, before I get a good image of the performances effectively needed in practice. May be it will be just enough to initialize only a part of the DUCET for a particular language and its specific tailoring, sorting all the other many characters with default weights (not necessarily from the large DUCET). My implementation will provide at least 3 interfaced functions: - instantiating a collator for a given language & sort order - computing collation keys from strings, for more efficient sorts of large tables (even if we use a QuickSort, the same strings are compared multiple times with other strings, so it may help to compute their collation keys only once); in addition, it may not be necessary to compute the full collation key, but only keys up to the level that's sufficient to perform a specific compare; collation keys can then be computed partly, on demand, and cached, instead of being fully computed for the full string at all collation levels; in addition, not all table sorts may require the maximum collation level supported, so collation keys don't necessarily have to be long (in memory); - the tradeoff for precomputing keys instead of compring keys on the fly, is highly dependant with the table sizes: for small number of rows to be sorted, you gain little with precomputed keys, you just waste memory and get lower performance. So the interface will also include a 3rd function: comparing strings using this collator, without necessarily having to compute the collation keys. I estimate that the tradeoff limit between precomputed collation keys and direct compares is for tables of about 100 rows, for which it may be helpful to provide a very fast response time when sorting them, but I may be wrong, because these small tables will not require precomputing and storing a lot of collation keys (so this first step before the QuickSort loop may be insignificant). The real limit could be memory use in very large tables for precomputing, storing or caching the collation keys; but such very large tables will probably be very uncommon in tables generated from Wiki pages (most of these tables would not include more than 500 rows, and storing 500 collation keys is not really a problem in today's browsers, except possibly in smartphones with limited memory resources and slow processors, compared to laptops, notebooks and tablets). -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. _______________________________________________ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l