From: "Asmus Freytag" <[EMAIL PROTECTED]>
> I have a certain sympathy for the idea of designing UCA so that the
> untailored *default* works for such kind of multilingual usage. However,
> the other use of the DUCET is to be the most convenient base for applying
> all tailorings. I have a certain sympathy for the position that claims that
> there are important, but perhaps specialized or not economically powerful
> classes of users that will not likely have access to a tailored UCA for
> their language or writing system.
>
> If that is really the case, i.e. appreciable numbers of smaller languages
> would be able to survive without tailoring, then the alternative to fixing
> the DUCET could be a separate publication of a common base tailoring for
> multilingual data access. (A base tailoring would be applied before further
> tailoring for a specific language).

I much appreciate this analysis. The DUCET effectively has two intended uses whose purposes are opposed. If it is used as a base collation from which a language-specific collation can be built with just a few rules, then the other common use, multilanguage searching, is not easy to build on top of it.

Maybe we could think about designing a new standard collation tailoring table that could be used as an alternative to the DUCET, but targeting multilanguage searches. Such a tailoring would include more folding than the DUCET, putting the differences at a higher weight level. We could give it a name (MUCET, for Multilanguage Unicode Collation Elements Table?) so that it would be supported as well.

The DUCET is now quite stable and there is no need to change it: it is well known and certainly used in many applications that depend on it (notably RDBMS engines). But a MUCET would certainly be useful, including for users who would no longer need to search for multiple words in a multilanguage database, or simply on the web. In addition, nothing prevents sorting the matching entries by relevance, using the DUCET as a secondary collation order.

After all, a collation elements table works exactly like a custom decomposition table that creates additional strings whose encoding is not portable, because it depends on weight values. Using custom decompositions is often much simpler than implementing a multilevel collation, since it can reuse the existing algorithms implemented for NFD and NFKD decompositions. In such a view, some extra decompositions are needed, using non-standard Unicode characters for some elements (for example, decomposing an AE letter into a ligature with an extra custom control at a higher collation level, to be used only for full collation order but ignorable for searches limited to level 1 or 2).
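
A minimal sketch of the custom-decomposition idea, in Python, may make it more concrete. Everything named here is an assumption made for illustration: the private-use code point U+E000 stands in for the "extra custom control", the mapping table is hypothetical and not taken from the DUCET or any Unicode data file, and plain code point order stands in for a real secondary collation order.

```python
import unicodedata

# Hypothetical marker for "this pair came from a ligature letter".
# A private-use code point is assumed here; it is not a standard control.
LIGATURE_MARK = "\uE000"

# Hypothetical custom decompositions (illustrative only).
CUSTOM_DECOMP = {
    "\u00C6": "AE" + LIGATURE_MARK,  # LATIN CAPITAL LETTER AE
    "\u00E6": "ae" + LIGATURE_MARK,  # LATIN SMALL LETTER AE
    "\u0152": "OE" + LIGATURE_MARK,  # LATIN CAPITAL LIGATURE OE
    "\u0153": "oe" + LIGATURE_MARK,  # LATIN SMALL LIGATURE OE
}


def custom_decompose(text):
    """Apply the custom decompositions, then standard NFD."""
    expanded = "".join(CUSTOM_DECOMP.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFD", expanded)


def search_key(text):
    """Folded key for searching: ignore the marker, combining marks, and case."""
    kept = [
        ch
        for ch in custom_decompose(text)
        if ch != LIGATURE_MARK and not unicodedata.combining(ch)
    ]
    return "".join(kept).casefold()


def full_key(text):
    """Folded key first, then the full decomposition as a tie-breaker.

    The tie-breaker compares by code point, which is only a stand-in
    for a real collation order such as the DUCET.
    """
    return (search_key(text), custom_decompose(text))
```

With these definitions, `search_key` gives the same key for the ligature and non-ligature spellings of a word, while `full_key` still tells them apart because the marker survives in its second component.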

