From: "Peter Kirk" <[EMAIL PROTECTED]>
> Windows 2000/XP and Office need no adaptation, just fonts and keyboards.
> Well, the menus do need localisation, and obviously that is a
> significant issue (although I guess most Tajiks know or can easily learn
> the Russian for "File", "View", "Help" etc).
>
> Issues of localisation of non-Unicode software are off topic for this
> list, surely.
Here again, localization of the interface is not the main issue. I agree that a program interface in Russian would work for most Tajik speakers. The main problem lies with the documents that people create themselves, in their own language, for their own use and for interchange with others. This covers all the tools used to create personal web pages, send email and instant messages, as well as to produce printed documents for snail mail, publish books, write papers, feed databases...

It also covers the use of existing databases, which were created with various encodings well adapted to Russian but not necessarily to Tajik, and the difficulty of interchanging this legacy data and using it with the tools people have. The main problem is not in the standard office programs but in business-specific software, which may have been developed to Russian standards, or with legacy tools written by lazy US programmers who only considered English and a few Western European languages and forgot about the variants of the Cyrillic alphabet.

To keep using software that is still needed but difficult or expensive to adapt, data from various sources, which may use several "personal" 8-bit encodings usable only in some limited domain, has to be merged and transcoded into a common, well-accepted 8-bit encoding. Suppose this common encoding is the ISO-8859 Cyrillic charset (ISO-8859-5): some Tajik characters present in the legacy data will not map to it, and the data may be altered. That can be a serious issue if the data has legal value, or is used to identify persons, services or marks.
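To make the mapping problem concrete, here is a minimal Python sketch, assuming the target is ISO-8859-5 (Python codec name "iso8859_5"). The helper name unmappable_chars and the sample word are only illustrations, not taken from any real tool.

```python
def unmappable_chars(text, encoding="iso8859_5"):
    """Return the sorted list of characters in `text` that `encoding` cannot represent.

    Hypothetical helper, written only to illustrate the transcoding problem.
    """
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


sample = "Тоҷикистон"  # "Tajikistan" written in Tajik

# The Tajik letter U+04B7 has no code point in ISO-8859-5.
print(unmappable_chars(sample))  # ['ҷ']

# A lossy transcoding silently alters the data: the letter becomes "?".
lossy = sample.encode("iso8859_5", errors="replace")
print(lossy.decode("iso8859_5"))  # То?икистон
```

Running such a check over legacy data before transcoding at least shows which records would be altered, instead of discovering the damage afterwards.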
Going to Unicode is of course the longer-term target, but for now a lot of 8-bit processing will remain in software and devices until they are replaced with more modern ones. In fact I think Western programmers will stay lazy for a very long time, until classic C or C++ development is completely deprecated, and will keep producing software that handles only single-byte coded characters, simply because the OS they use themselves handles only 8-bit coded chars in its API, notably in POSIX services and Linux/Unix kernels where a "char" is a byte, as well as in many open Internet protocols.

Using UTF-8 is a solution, but not the simplest one for programmers. They are lazy in the code they produce and test, and they will too often forget the code needed to handle multibyte sequences correctly, notably where there are security issues such as possible buffer overruns.

I took the case of Tajik, but the same may be true of every language that needs more than the ISO-8859-1 character subset. In many cases a standardized ISO-8859 variant may help solve the immediate problem found in many countries, with the notable exception of China, Korea and Japan, which always need large character subsets and where programmers are used to not being lazy and to processing MBCS sequences (including UTF-8 for Unicode) correctly.
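As an illustration of the multibyte handling that byte-oriented code tends to forget, here is a minimal Python sketch of cutting UTF-8 data to a byte limit without splitting a character. The function name truncate_utf8 is hypothetical, and the sketch assumes the input is already valid UTF-8 and that max_bytes is not negative.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 `data` to at most `max_bytes` bytes on a character boundary.

    Hypothetical helper for illustration; assumes max_bytes >= 0.
    """
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first excluded byte is one,
    # the cut would fall inside a sequence, so back up to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


data = "Тоҷик".encode("utf-8")  # 5 Cyrillic letters, 2 bytes each

# A naive byte-count cut splits the third letter and leaves invalid UTF-8.
try:
    data[:5].decode("utf-8")
except UnicodeDecodeError:
    print("naive cut produced an invalid sequence")

# The boundary-aware cut keeps only whole characters.
print(truncate_utf8(data, 5).decode("utf-8"))  # То
```

Code that treats a "char" as one character would take the naive path here, which is exactly the kind of case that gets neither written nor tested.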