Re: Common Locale Data Repository Project

Philippe Verdy Fri, 23 Apr 2004 17:41:48 -0700

From: "Peter Constable" <[EMAIL PROTECTED]>
> > I think that the CLDR database is extremely important for software
> > implementations, because it avoids some caveats that come from other
> > unstable standards such as ISO 3166 and ISO 639.
>
> ISO 639 is not unstable. It is an open code set that is being added to
> over time, but I don't think that should be referred to as unstable --
> that term suggests other things.


By unstable I mean in fact ambiguous, even for the correct designation of
languages with a code that can be recognized. Even the proposal to supersede RFC
3066 with new tags has its caveats: which code must an application use when the
standard already defines multiple ones (is their number bounded?) to refer to
the same language?

The problem arises in software when a user specifies a preferred language in
his locale with a code that is not understood by an application that only
understands another one. It gets worse when one program requires one code in
the user's locale to support a language and another program requires a
different code in the user's locale to support the same language.

Look for example at the case of Norwegian: is it no, nn or nb, or no-nynorsk or
no-bokmal?
Even with the algorithm based on common prefixes, you won't be able to match
them all. So there's a need to specify an algorithm that allows aliases to be
resolved. With multi-subtag language identifiers the resolution order becomes
unpredictable if one supports aliases for one subtag and not the other.
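
A minimal sketch of why prefix matching alone is not enough, written in Python.
The ALIASES table and the function names are hypothetical, invented for
illustration; the table covers only two Norwegian tags and is not taken from
ICU, CLDR or any registry.

```python
# Hypothetical alias table: maps equivalent tags to one canonical form.
# Illustrative only; not taken from ICU, CLDR or any registry.
ALIASES = {
    "no-nynorsk": "nn",
    "no-bokmal": "nb",
}


def canon(tag):
    """Lower-case the tag and replace it by its alias target, if any."""
    tag = tag.lower()
    return ALIASES.get(tag, tag)


def prefix_match(requested, supported):
    """Prefix rule: tags are equal, or `supported` is a prefix of
    `requested` that ends exactly at a hyphen."""
    requested, supported = requested.lower(), supported.lower()
    return requested == supported or requested.startswith(supported + "-")


def alias_match(requested, supported):
    """Try the plain prefix rule first, then retry on the canonical forms."""
    return (prefix_match(requested, supported)
            or prefix_match(canon(requested), canon(supported)))
```

With the prefix rule alone, a user asking for no-nynorsk is served by an
application that declares "no" but not by one that declares "nn"; the alias
step closes that gap. This table still does not relate nn or nb to the bare
"no", which is the kind of hole that a specified algorithm would have to cover.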

What is already unstable in ISO 639 is the deprecation of "iw" and the addition
of "he", same thing for "in" and "id" or for "ji" and "yi". Don't you call that
instability? OK, these codes are deprecated, not reassigned. But they still
cause problems.
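
One way an application might cope with the deprecated codes is to normalize
them before any comparison. The sketch below (Python) assumes a hypothetical
DEPRECATED table limited to the three pairs named in this message; the function
name is also made up for illustration.

```python
# Deprecated ISO 639 two-letter codes and their replacements.
# Only the three pairs discussed in this message; a real table would be larger.
DEPRECATED = {"iw": "he", "in": "id", "ji": "yi"}


def normalize_language(tag):
    """Replace a deprecated primary subtag, leaving later subtags untouched."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    return DEPRECATED.get(primary, primary) + sep + rest
```

Both the user's preference and the application's supported list would have to
go through the same normalization; if only one side does, "iw" and "he" still
fail to match.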

Think more recently about the new codification for Serbo-Croatian, and the split
of "sh", with no definition except that it is country-based (Serbian, Croatian,
Bosnian, Montenegrin), assuming that one country uses only one language when
in fact there are several in the same country, which are shared by multiple
countries and differ mostly by their script...

Also if ISO 3166 is unstable (CS: is that the former Czechoslovakia or the newer
Serbia-Montenegro?), then it introduces instability too within RFC 3066 or its
proposed replacement... for the identification of languages.

For now, the only workable solution to these issues is found in supplementary
libraries in ICU which support locale aliases. (Yes, I use the term Locale
because this is the term that Java gives to this identification, based on a
language code consisting of a single subtag, a country/territory code, and a
variant code with possibly multiple subtags, and no reference to the needed
script code; I wonder how the newer RFC 3066 model will fit here).
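
To make the three-field model concrete, here is a simplified sketch in Python
of splitting an underscore-separated identifier into language, country and
variant. The LocaleId type and parse_locale function are hypothetical names,
and this is not Java's or ICU's actual parser, only an illustration of the
shape of the identifier.

```python
from typing import NamedTuple


class LocaleId(NamedTuple):
    language: str  # single subtag, e.g. "no"
    country: str   # country/territory code, e.g. "NO"
    variant: str   # everything that remains; may hold several subtags


def parse_locale(identifier):
    """Split 'language_COUNTRY_VARIANT' into its three fields.

    Missing fields come back as empty strings. There is no script field,
    so a script distinction has nowhere to go except the variant.
    """
    parts = identifier.split("_", 2)
    parts += [""] * (3 - len(parts))
    language, country, variant = parts
    return LocaleId(language.lower(), country.upper(), variant)
```

For instance, parse_locale("no_NO_NY") yields the fields "no", "NO" and "NY",
while any extra subtags after the country all end up together in the variant.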
