From: "Peter Constable" <[EMAIL PROTECTED]>

> > I think that the CLDR database is extremely important for software
> > implementations, because it avoids some caveats that come from other unstable
> > standards such as ISO 3166 and ISO 639.
>
> ISO 639 is not unstable. It is an open code set that is being added to
> over time, but I don't think that should be referred to as unstable --
> that term suggests other things.
By "unstable" I actually mean ambiguous, even for the correct designation of languages whose code can be recognized. Even the proposal to supersede RFC 3066 with new tags has its caveats: which code must an application use when several codes are already defined for the same language (and is that number bounded)?

The problem arises in software when a user specifies a preferred language in his locale with a code that an application does not understand because it only knows another one. It gets worse when one program requires one code in the user's locale to support a language and another program requires a different code for the same language. Take Norwegian as an example: is it "no", "nn", "nb", "no-nynorsk" or "no-bokmal"? Even the matching algorithm based on common prefixes cannot match them all. So an algorithm that resolves aliases needs to be specified. With multi-subtag language identifiers, the resolution order becomes unpredictable if an implementation supports aliases for one subtag but not for another.

ISO 639 has already shown instability: "iw" was deprecated in favor of "he", and likewise "in" for "id" and "ji" for "yi". Don't you call that instability? OK, these codes are deprecated, not reassigned, but they still cause problems. Think, more recently, of the new codification for Serbo-Croatian and the split of "sh", with no definition except that it is country-based (Serbian, Croatian, Bosnian, Montenegrin). That assumes each country uses only one language, when in fact several languages are used within the same country, are shared by multiple countries, and differ mostly by their script...

Also, if ISO 3166 is unstable (does "CS" mean the former Czechoslovakia or the newer Serbia and Montenegro?), then it also introduces instability into RFC 3066, or its proposed replacement, for the identification of languages.
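To make the alias problem concrete, here is a minimal sketch, not taken from CLDR, ICU or any RFC, of what "resolve aliases, then match by common prefix" could look like. The ALIASES and FALLBACKS tables and the functions canonicalize(), fallback_chain() and lookup() are all hypothetical names for illustration; the "no-nynorsk" and "no-bokmal" spellings are the illustrative ones used above, not registered tags. The prefix step is the RFC 3066-style truncation (drop the last subtag and retry); the nb/nn to "no" fallback is an assumption an application would have to supply itself, because truncation alone can never turn "nb" into "no".

```python
from typing import Dict, List, Optional, Set

# Hypothetical alias table: identifier received -> canonical form used here.
ALIASES: Dict[str, str] = {
    "iw": "he",           # Hebrew: deprecated ISO 639 code -> current code
    "in": "id",           # Indonesian
    "ji": "yi",           # Yiddish
    "no-bokmal": "nb",    # illustrative spellings, not registered tags
    "no-nynorsk": "nn",
}

# Hypothetical extra fallback that prefix truncation cannot derive by itself.
FALLBACKS: Dict[str, str] = {"nb": "no", "nn": "no"}


def canonicalize(tag: str) -> str:
    """Normalize case and separators, then apply an alias to the whole tag
    or, failing that, to its primary (language) subtag only."""
    tag = tag.strip().lower().replace("_", "-")
    if tag in ALIASES:
        return ALIASES[tag]
    primary, sep, rest = tag.partition("-")
    return ALIASES.get(primary, primary) + sep + rest


def fallback_chain(tag: str) -> List[str]:
    """Candidate tags from most to least specific."""
    chain: List[str] = []
    subtags = canonicalize(tag).split("-")
    while subtags:
        candidate = canonicalize("-".join(subtags))
        if candidate not in chain:
            chain.append(candidate)
        subtags.pop()  # RFC 3066-style truncation: drop the last subtag
    last = chain[-1]
    if last in FALLBACKS and FALLBACKS[last] not in chain:
        chain.append(FALLBACKS[last])
    return chain


def lookup(requested: str, supported: Set[str]) -> Optional[str]:
    """Return the first supported tag (as originally spelled) matching the chain."""
    by_canonical = {canonicalize(s): s for s in supported}
    for candidate in fallback_chain(requested):
        if candidate in by_canonical:
            return by_canonical[candidate]
    return None
```

With these tables, lookup("iw-IL", {"he", "en"}) finds "he" and lookup("no-bokmal", {"no", "en"}) finds "no", but lookup("nb-NO", {"nn", "en"}) finds nothing: no prefix rule can decide whether Bokmål and Nynorsk should substitute for each other. Note also that aliases are applied to the whole tag or to the primary subtag only, so a tag such as "no_NO_bokmal" would not hit the "no-bokmal" alias; that is exactly the kind of order-dependent result described above.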
For now, the only workable solution to these issues is found in the supplementary libraries in ICU, which support locale aliases. (Yes, I use the term "locale" because that is the term Java gives to this identifier, which is based on a language code consisting of a single subtag, a country/territory code, and a variant code with possibly multiple subtags, with no place for the needed script code; I wonder how the newer model proposed to replace RFC 3066 will fit here.)
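The three-field structure described above can be sketched as follows. This is not Java's java.util.Locale API; LegacyLocale, parse_legacy(), to_hyphenated() and from_hyphenated() are hypothetical names, and putting leftover subtags into the variant is an assumption of this sketch, chosen only to show where a script subtag would have to go when the model has no script field.

```python
from typing import NamedTuple


class LegacyLocale(NamedTuple):
    """The three fields of a Java-style locale identifier (no script field)."""
    language: str
    country: str = ""
    variant: str = ""


def parse_legacy(ident: str) -> LegacyLocale:
    """Split "ll_CC_VARIANT"; everything after the second underscore is kept
    as the variant, which may itself hold several "_"-separated subtags."""
    parts = ident.split("_", 2)
    parts += [""] * (3 - len(parts))
    return LegacyLocale(parts[0].lower(), parts[1].upper(), parts[2])


def to_hyphenated(loc: LegacyLocale) -> str:
    """Join the non-empty fields with hyphens, RFC 3066 style."""
    fields = (loc.language, loc.country, loc.variant.replace("_", "-"))
    return "-".join(f for f in fields if f)


def from_hyphenated(tag: str) -> LegacyLocale:
    """Map a hyphenated tag back onto the three fields."""
    subtags = tag.split("-")
    language, rest = subtags[0].lower(), subtags[1:]
    country = ""
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        country = rest.pop(0).upper()  # two-letter country/territory code
    # Assumption of this sketch: any other subtag, including a four-letter
    # script code such as "Latn", can only be pushed into the variant.
    return LegacyLocale(language, country, "_".join(rest))
```

For example, parse_legacy("no_NO_NY") gives ("no", "NO", "NY") and round-trips to "no-NO-NY", while from_hyphenated("sr-Latn-CS") gives ("sr", "", "Latn_CS"): once a script subtag sits in second position, the country code no longer lands in the country field.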