From: "Mark Davis" <[EMAIL PROTECTED]>

> Michael, that isn't the point. There is a problem even when you stick to one
> language.
>
> That is, there are situations where two letters in a language, e.g. "ch" in
> Slovak, are normally sorted as one. However, in some exceptional
> circumstances those letters should be sorted separately. It could be because
> they come originally from another language, or it could be because they
> happen to arise when two other words are conjoined. There is no algorithmic
> distinction. So without some special character, it would require a
> dictionary look-up to produce the right sort.
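A toy sketch of the "special character" approach, under stated assumptions. The simplified alphabet treats "ch" as a single letter sorted between "h" and "i". U+034F COMBINING GRAPHEME JOINER is assumed here to serve as the marker that blocks the contraction. The sample words, including the marked "c" + CGJ + "hata", are hypothetical. A real system would use a UCA tailoring rather than this hand-rolled key.

```python
# Toy primary-level collation key: lowercase ASCII letters plus the
# contraction "ch", which sorts as one letter between "h" and "i".
# Assumption: CGJ (U+034F) placed between "c" and "h" blocks the contraction.

CGJ = "\u034f"

# Hypothetical simplified alphabet; real Slovak has many more letters.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}


def sort_key(word: str) -> list[int]:
    """Map a lowercase ASCII word to primary weights, contracting "ch"."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(WEIGHT["ch"])  # "ch" counts as one letter
            i += 2
        elif word[i] == CGJ:
            # The marker carries no weight of its own; it only stops
            # the preceding "c" from pairing with the following "h".
            i += 1
        else:
            key.append(WEIGHT[word[i]])
            i += 1
    return key


words = ["chata", "cesta", "hrad", "ihla", "c" + CGJ + "hata"]
print(sorted(words, key=sort_key))
```

With the marker, "c" + CGJ + "hata" sorts among the "c" words, while the unmarked "chata" sorts after "hrad". Without a marker, the only way to tell the two apart is a word list.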

I would argue that most users of the language are not expecting this type of
thing, and that when they are looking for a word, this might be the
SECOND place they look, not the first.

There are exceptions, but they do not outnumber the general case, by
any means.

> For example, suppose that "th" were sorted separately in English, after Z.
> Yet people would expect the following order:
>
> cast
> cathouse
> caul
> cathode
>
> because the "t" and "h" are logically separate in "cathouse".
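The hypothetical "th after Z" rule above can be sketched with the dictionary look-up alternative instead of a special character. The set NO_CONTRACTION is an assumed, illustrative exception list. Any word in it has its "t" and "h" weighted as two separate letters.

```python
# Hypothetical English tailoring: "th" sorts as a single letter after "z".
# Words found in NO_CONTRACTION (a dictionary look-up) keep "t" and "h"
# as two separate letters.

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + ["th"]
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}

# Hypothetical exception list; a real one would need an entry for every
# compound in which "t" and "h" meet across a word boundary.
NO_CONTRACTION = {"cathouse"}


def sort_key(word: str) -> list[int]:
    """Primary weights for a lowercase ASCII word under the toy rule."""
    contract = word not in NO_CONTRACTION  # the dictionary look-up
    key = []
    i = 0
    while i < len(word):
        if contract and word.startswith("th", i):
            key.append(WEIGHT["th"])  # "th" sorts after "z"
            i += 2
        else:
            key.append(WEIGHT[word[i]])
            i += 1
    return key


words = ["cathode", "caul", "cathouse", "cast"]
print(sorted(words, key=sort_key))  # ['cast', 'cathouse', 'caul', 'cathode']
```

Nothing in the letters of "cathouse" tells the algorithm to split the "th", so every such word has to be listed explicitly.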

Again, I think most people would look first in the place that does not
assume the exception; the computer's original limitations have trained
them. The notion of a natural language processing engine that would have all
of the specific differences (with appropriate dictionaries for exceptions to
even the NLP results) is a fascinating one, but no one is even close to it
yet.

We do not even have UCA tailorings available for most of the world's
languages, though I have high hopes for the future (if not in the UCA, then
in other mechanisms).

By that time, might many languages have TWO collations, since users have been
expecting something else for the last few decades?

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


