From: "Antoine Leca" <[EMAIL PROTECTED]>
> > Never forget that language codes and country/territory codes are
> > different...
>
> We were speaking about ccTLD. A different beast. Try to resolve ANYTHING.GB.
> on a root server, or alternatively to seek UK in ISO 3166, to understand
> what I mean.

I'm not speaking about ccTLDs either... but a domain name ending in .gb or .fx could be valid if some DNS record exists for it. ccTLDs inherit from legacy assignments by IANA, and even today the IANA and RIR databases contain references to both the GB and UK country/territory codes. Look closely at ISO 3166, and you'll see that both [UK] and [GB] are reserved, even though only GB is assigned.

You'll see other entries used by ITU, such as [EA] for Ceuta and Melilla, two small Spanish dependencies on the North African coast, bordering Morocco, with a status similar to that of Gibraltar, a British dependency on the southern coast of Spain which does have an assignment in ISO 3166. Look also for [DG], which ITU uses for Diego Garcia, even though it is part of the British Indian Ocean Territory, which has the ISO 3166 code [IO].

ISO 3166 has its imperfections, but at least it contains enough references to reserve all the codes used by IANA and in ccTLDs, as well as some non-territory codes used for groups of countries by WIPO...

Now consider that software actually rarely needs country/territory codes for its internationalization, but rather needs some code to differentiate scripts and script variants (such as Latin and Cyrillic Serbian, or Traditional and Simplified Chinese). You'll see the caveats this introduces in internationalized software when one needs to set the locale code to zh_TW to refer to Traditional Chinese, even when the goal is to address language variants used in areas other than Taiwan.

Which code must be used to create resources in Serbian Cyrillic? [sh_YU], [sh_CS], [sr_CS]? How can we avoid confusion with the Latin script versions?

In fact the problem is not in ISO 3166, but in RFC 3066 for the designation of locales. This comes from imperfections in the ISO 639 standard, which has a lot of difficulty encoding languages...
And even more so when it needs to distinguish between languages written in several scripts (thankfully we now have codes for scripts, ISO 15924, maintained by Unicode, but there's currently no support for them in locale identifiers...).

Country/territory codes are too unstable to correctly tag the language used in documents and applications, but the combination of ISO 639 and ISO 3166 is for now the only widely supported alternative. So within locales, the ISO 3166 country/territory code has lost its initial function of designating a territory. Instead it designates a language variant.

Also think about the case of Norwegian [no], which has two major variants: Bokmål, the traditional "book" orthography, and Nynorsk, the reformed "new" language; in ISO 639 we find the new codes [nn] for Nynorsk and [nb] for Bokmål. Imagine the complication for software that should run with a Norwegian UI. Which code should be used?

We also find [ax] for the Åland variant of Swedish spoken in the Åland Islands [AX], a dependency of Finland [FI]. Some software incorrectly assumes that this language is Finnish, when it is in fact a variant of Swedish [sv]. Should software use [sv] or [ax]? Some software has chosen to use [sv_FI] to refer to the Åland language, because it really is the Swedish language spoken in a part of Finland...

How can those rules be inferred in locale-aware software or systems?
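
To make the question concrete, here is a rough Python sketch of what a locale-aware program would have to do today. It is only an illustration under my own assumptions: the identifier syntax with an optional four-letter script code (language[_Script][_TERRITORY]) is hypothetical, as are the names parse_locale, Locale and IMPLIED_SCRIPT, and the table holds nothing more than the zh_TW example above plus one assumed counterpart for zh_CN.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical identifier syntax: language[_Script][_TERRITORY],
# e.g. "sr_Cyrl_CS", "zh_TW", "sv_FI". Not an existing standard syntax.
_LOCALE_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})"                 # ISO 639 language code
    r"(?:[_-](?P<script>[A-Za-z]{4}))?"      # optional ISO 15924 script code
    r"(?:[_-](?P<territory>[A-Za-z]{2}))?$"  # optional ISO 3166 territory code
)


class Locale(NamedTuple):
    language: str
    script: Optional[str]
    territory: Optional[str]


# Illustrative table only: the script that a legacy language_TERRITORY
# pair is conventionally taken to imply. The entries are assumptions
# made for this example, not an authoritative list.
IMPLIED_SCRIPT = {
    ("zh", "TW"): "Hant",  # Traditional Chinese
    ("zh", "CN"): "Hans",  # Simplified Chinese (assumed counterpart)
}


def parse_locale(tag: str) -> Locale:
    """Split a locale identifier and fill in an implied script if known."""
    match = _LOCALE_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized locale identifier: {tag!r}")
    language = match.group("lang")
    script = match.group("script")
    territory = match.group("territory")
    script = script.title() if script else None
    territory = territory.upper() if territory else None
    if script is None:
        # No explicit script: the territory has to stand in for it.
        script = IMPLIED_SCRIPT.get((language, territory))
    return Locale(language, script, territory)
```

With an explicit script code, parse_locale("sr_Cyrl_CS") and parse_locale("sr_Latn_CS") stay distinct without abusing the territory. Without it, every rule has to live in a hand-maintained table like IMPLIED_SCRIPT, and a tag such as [sr_CS] or [sv_FI] simply carries no script information at all.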