----- Original Message -----
From: "Robert R. Chilton" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 29, 2002 9:34 AM
Subject: [tibex] Re: PRC asking for 956 precomposed characters
> I had heard some rumors about this proposal over the past year and I was
> interested to finally see n2558. Sadly, this proposal is flawed on many
> counts. It seems that this proposal is motivated solely by
> typographical considerations without concern for broader character data
> processing needs. Although this character set might be fine for
> computer-based typesetting of the modern Tibetan materials now being
> printed in the People's Republic of China, it is somewhat lacking as a
> basis for interchange and processing of Tibetan-script data.
>
> Most notably, this proposal represents the repertoire of a particular
> sub-language (modern Tibetan as used in the PRC) rather than a script.
> There are many examples of Tibetan-script words in classical Tibetan
> works, as well as in Dzongkha and other Tibetan-script languages of
> South Asia, that cannot be represented by this character set.
>
> Secondly, if the goal of this proposal was to facilitate processing of
> Tibetan-script data for purposes other than document publishing then it
> would have been more effective to provide characters for every Tibetan
> initial form (including prefix letters) rather than simply for
> typographical ligatures. The proposal as now written will result in
> unnecessary complexities in producing a culturally expected collation of
> data encoded using mixed basic Tibetan and BrdaRten characters.
>
> More specifically, the proposal contains some errors of fact:
>
> 1. The claim that "[the current Tibetan-script] encoding scheme is not
> compatible with traditional education, publication and electronic
> desktop publishing systems" is simply not true. Any system that is able
> to render other complex scripts, notably Arabic and the various Indic
> and Indic-derived scripts of South and Southeast Asia, should be able to
> accommodate Tibetan-script materials encoded using the current Tibetan
> block.
> (It is no coincidence that the Tibetan script, which is itself
> derived from an ancient Indian script, should share many structural and
> functional characteristics with modern Indic scripts.)
>
> It is understandable that the Chinese would like to regard Tibetan as "a
> horizontal stream of basic Tibetan characters and BrdaRten characters
> without vertical combining" since this facilitates the usage of two
> languages, namely Tibetan and Chinese, together in bilingual
> documents. However, this mode of thought runs counter to the very
> principle of Unicode/ISO-10646 which is to enable *any* number of
> languages to be used together, seamlessly, in documents and other
> computer applications. Would the Chinese also like to propose a set of
> precomposed characters for each of the Indic scripts so that they
> likewise can be "regarded as a horizontal stream of basic Tibetan [read
> 'basic Indic'] characters and BrdaRten [read 'precomposed Indic']
> characters without vertical combining"? Or have they resigned
> themselves (and the rest of us) to never mixing Chinese and Indic scripts
> within a document? On the other hand, once there is a system that can
> render Chinese together with Hindi or Tamil, rendering of Chinese
> together with Tibetan (as currently encoded) is not technically
> difficult.
>
> In point of fact, the cited "problems with Tibetan information
> interchange and processing" are no more difficult to solve than those
> for other complex scripts -- these having already been solved for a
> substantial number of complex scripts. The current lack of widespread
> support for Unicode Tibetan simply reflects the fact that there are
> fewer commercial and governmental resources being allocated to the
> development of Unicode Tibetan as compared to other Indic and
> Indic-derived complex scripts.
>
> 2. The claim that "Up to now, there is no report showing any system
> platform has implemented Tibetan processing system using dynamic
> combining method" is also untrue. Inquiries can be directed to the
> Dzongkha Development Commission in Bhutan which has overseen the
> development of just such a system platform for Dzongkha (the national
> language of Bhutan)--which is written using the letters of the Tibetan
> script.
>
> 3. The statement that "Since 1990s, from DOS to Windows, both domestic
> and overseas applications have been using Tibetan BrdaRten character set
> at implementation level 1. For example, the Founder desktop publishing
> system for Tibetan is based on BrdaRten characters which has become the
> de-facto industry standard for Tibetan information interchange and
> processing in China and even outside of China" is exaggerated.
> Tibetan-script computer systems have been in use in North America,
> Europe, South Asia and East Asia/Pacific Rim since as early as 1986 but
> it is completely false to say that the character repertoire of n2558 has
> become "the de-facto industry standard for Tibetan information
> interchange and processing" in any place outside of the PRC. As noted
> above, the character set of n2558 does not even fully support usages of
> Tibetan script in regions outside of China. (The notation of
> "Worldwide" in question 5 of the Part C.: Technical-Justification in the
> Proposal Summary Form is thus highly misleading.)
>
> 4. The n2558 document asserts that "Once the Tibetan BrdaRten
> characters are encoded in BMP, many current systems supporting
> ISO/IEC10646 will enable Tibetan processing without major modification.
> Therefore, the international standard Tibetan BrdaRten characters will
> speed up the standardization and digitalization of Tibetan information,
> keep the consistency of implementation level of Tibetan and other
> scripts, develop the Tibetan culture and make the Tibetan culture
> resources shared by the world."
> There are a number of counter-arguments to these assertions:
>
> First, due to the limitations of the n2558 character set for
> representing classical Tibetan, Dzongkha, and other Tibetan-script
> materials it is not reasonable to expect worldwide adoption of this
> character set. Since the dynamic-combining model will continue to be
> used in South Asia (where complex-script systems are the norm), in
> academic institutions (where research in classical Tibetan is conducted)
> and elsewhere, there will always be a need to normalize Tibetan-script
> data interchanged between regions that use these two differing encoding
> models for encoding Tibetan-script data. Thus, the acceptance of this
> character set into the ISO-10646/Unicode standard will actually be an
> *obstacle* to "standardization and digitization of Tibetan information."
>
> Second, the reference to "consistency of implementation level of Tibetan
> and other scripts" would seem to presume that the "other scripts" in
> question are not complex scripts. This statement is simply not relevant
> when we consider the requirements of--and the already implemented
> multilingual systems for the handling of--Indic and Indic-derived
> complex scripts.
>
> 5. Any claims of a pre-existing "de-facto industry standard" for
> Tibetan even in China seem to be contradicted by the statement in the
> Conclusion, that "After serious discussion and analysis by Tibetan
> linguists, encoding experts and software developers in China, all are in
> favor to establish a national and international standard Tibetan
> BrdaRten character set to meet the requirement of Tibetan information
> processing." This seems to indicate that a national standard for
> Tibetan is yet to be established, even in China.
>
> In summary assessment, had this proposal been comprehensive enough to
> satisfy the needs of *all* users of the Tibetan-script languages and
> materials, had it taken into consideration character data processing
> needs of Tibetan beyond computerized typesetting (such as collation),
> and had it been presented ten years ago, then it might well have been
> worthy of serious consideration. As it now stands, this proposal offers
> too little too late and, moreover, would simply add further confusion
> and obstacles to the standardization of Tibetan-script data processing
> and interchange. Furthermore, even had this proposal been presented
> for consideration ten years ago, the fact that complex-script (dynamic
> combination) rendering is needed for Indic scripts would even then have
> been a strong argument in favor of the current ISO-10646 encoding model
> and against an encoding model of the type proposed in n2558.
>
> Respectfully,
> Robert Chilton
> Technical Director
> The Asian Classics Input Project
=================================================