> On 01/03/2004 00:18, Asomiddin Atoev wrote:
>
> >I am emailing on behalf of the Tajikistani state
> >working group on localizing software for Tajik
> >language. Could you please kindly guide us to be in
> >right direction. What shall be the procedure of
> >standartization of alphabet symbols? Tajik alphabet
> >makes use of cyrillic symbols and contains of 35
> >letters.
I think his question is not whether Unicode supports Tajik, but whether work has already been done (perhaps in other countries, for library purposes) to define a subset suitable for publishing and working with texts in the Tajik language.

Tajik orthography was heavily influenced by Russian during the time of the USSR, and this may have affected the language to the point that some old texts of cultural importance have lost part of their original meaning. So there may be libraries in the world that still hold texts in the original orthography, or in one adapted from the Cyrillic-based orthography, which contain more letters than those we commonly see. If there have been attempts to reform the orthography to better match the needs of the language, there may already exist some letter variants that would interest him.

Also, if such sets already exist, there is an opportunity to propose an alternate 8-bit encoding for Tajik: a variant of the ISO-8859 Cyrillic encoding used for Russian, except that it would contain all the letters needed for Tajik. Unicode clearly seems to support this language well, but there is still a need for a common framework for working with Tajik texts in an 8-bit encoding (which would be better than UTF-8, and as simple and efficient as ISO-8859-1 is for Western European languages, or ISO-8859-5 for Russian).

So this question would certainly find some experts at the ISO working group on 8-bit encodings compatible with the ISO-8859 standard (this is independent of the fact that the subset will be fully mapped and supported in Unicode). Having such a subset will certainly help unify various sources by agreeing on a common orthography, instead of relying on support for the whole Unicode/ISO/IEC 10646 coded set. If such a subset is then approved nationally, it will help get decent support and mapping in many fonts, keyboard drivers, and text processing tools.
After all, ISO-8859-15 was defined and standardized after a similar reform in the European Union, which needed some Latin characters not present in ISO-8859-1, even though all these characters were already present in Unicode or had recently been adopted in it (like the Euro code point, which was created instead of reusing the legacy, non-standard ECU symbol with its various and non-distinctive forms). So why not for Tajik too?
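
To make the gap concrete, here is a minimal Python sketch. It assumes that the letters Tajik adds to the basic Cyrillic repertoire are the six pairs U+0492/0493, U+049A/049B, U+04B2/04B3, U+04B6/04B7, U+04E2/04E3 and U+04EE/04EF; the function names are only illustrative. It checks which of these letters an existing 8-bit codec can encode, and how many bytes a short Tajik word takes in UTF-8.

```python
# Assumption: the six upper/lower-case pairs that Tajik adds to basic Cyrillic
# (GHE with stroke, KA with descender, HA with descender, CHE with descender,
# I with macron, U with macron), written as escapes to avoid font issues.
TAJIK_EXTRA = (
    "\u0492\u0493"  # GHE WITH STROKE
    "\u049a\u049b"  # KA WITH DESCENDER
    "\u04b2\u04b3"  # HA WITH DESCENDER
    "\u04b6\u04b7"  # CHE WITH DESCENDER
    "\u04e2\u04e3"  # I WITH MACRON
    "\u04ee\u04ef"  # U WITH MACRON
)


def missing_from(codec, letters=TAJIK_EXTRA):
    """Return the letters that the given codec cannot encode."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def utf8_size(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # Python's name for the ISO-8859-5 (Latin/Cyrillic) codec is "iso8859_5".
    print(len(missing_from("iso8859_5")), "of", len(TAJIK_EXTRA), "letters missing")
    # The word for "Tajik" written in Tajik Cyrillic, as escapes.
    print(utf8_size("\u0422\u043e\u04b7\u0438\u043a\u04e3"))
```

Under these assumptions none of the twelve letters fits in ISO-8859-5, and every Cyrillic letter costs two bytes in UTF-8, which is the case for a dedicated 8-bit variant.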