Fw: Re: PRC asking for 956 precomposed Tibetan characters

Chris Fynn Sun, 29 Dec 2002 09:30:20 -0800

----- Original Message ----- 
From: "Robert R. Chilton" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 29, 2002 9:34 AM
Subject: [tibex] Re: PRC asking for 956 precomposed characters



> I had heard some rumors about this proposal over the past year and I was
> interested to finally see n2558.  Sadly, this proposal is flawed on many
> counts.  It seems that this proposal is motivated solely by
> typographical considerations without concern for broader character data
> processing needs.  Although this character set might be fine for
> computer-based typesetting of the modern Tibetan materials now being
> printed in the People's Republic of China, it is somewhat lacking as a
> basis for interchange and processing of Tibetan-script data.
 
> Most notably this proposal represents the repertoire of a particular
> sub-language (modern Tibetan as used in the PRC) rather than a script. 
> There are many examples of Tibetan-script words in classical Tibetan
> works, as well as in Dzongkha and other Tibetan-script languages of
> South Asia, that cannot be represented by this character set.
 
> Secondly, if the goal of this proposal was to facilitate processing of
> Tibetan-script data for purposes other than document publishing then it
> would have been more effective to provide characters for every Tibetan
> initial form (including prefix letters) rather than simply for
> typographical ligatures.  The proposal as now written will result in
> unnecessary complexities in producing a culturally expected collation of
> data encoded using mixed basic Tibetan and BrdaRten characters.
 
> More specifically, the proposal contains some errors of fact:
 
> 1.  The claim that "[the current Tibetan-script] encoding scheme is not
> compatible with traditional education, publication and electronic
> desktop publishing systems" is simply not true.  Any system that is able
> to render other complex languages, notably Arabic and the various Indic
> and Indic-derived scripts of South and Southeast Asia, should be able to
> accommodate Tibetan-script materials encoded using the current Tibetan
> block.  (It is no coincidence that the Tibetan script, which is itself
> derived from an ancient Indian script, should share many structural and
> functional characteristics with modern Indic scripts.)
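
For readers unfamiliar with the dynamic-combining model referred to above: the current Tibetan block encodes a stacked consonant cluster as a head letter followed by one or more subjoined letters (U+0F90 to U+0FBC), and the rendering engine builds the vertical stack at display time, much as Indic renderers build conjuncts. The following minimal Python sketch is not part of the original message; the Latin transliteration keys and the helper name `encode_stack` are illustrative assumptions, and only a small subset of letters is mapped.

```python
# Minimal sketch of the dynamic-combining model of the Unicode Tibetan block:
# a stack is stored as a head consonant followed by subjoined consonants, and
# the renderer draws the vertical stack. The Latin keys are hypothetical
# transliteration labels chosen for this example only.

# Head (base) letters from the Tibetan block (a small subset).
BASE = {
    "ka": "\u0F40", "ga": "\u0F42", "nga": "\u0F44", "ta": "\u0F4F",
    "da": "\u0F51", "na": "\u0F53", "pa": "\u0F54", "ba": "\u0F56",
    "ma": "\u0F58", "ya": "\u0F61", "ra": "\u0F62", "la": "\u0F63",
    "sa": "\u0F66",
}

# For the letters above, the subjoined form sits exactly 0x50 above the base
# form (e.g. U+0F40 KA -> U+0F90 SUBJOINED KA). That offset does not hold for
# every letter in the block, which is why the sketch is limited to this subset.
SUBJOINED = {key: chr(ord(ch) + 0x50) for key, ch in BASE.items()}


def encode_stack(letters):
    """Encode a vertical stack, listed top to bottom, as base + subjoined code points."""
    head, *rest = letters
    return BASE[head] + "".join(SUBJOINED[letter] for letter in rest)


if __name__ == "__main__":
    for stack in (["sa", "ka"], ["sa", "ga", "ra"], ["ra", "ta"]):
        encoded = encode_stack(stack)
        print("+".join(stack), [f"U+{ord(c):04X}" for c in encoded])
```

Because stacks are composed from a fixed set of letters rather than enumerated one by one, a stack that appears in no predefined list (for example, one found only in a classical text or in Dzongkha) still has an encoding.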
 
> It is understandable that the Chinese would like to regard Tibetan as "a
> horizontal stream of basic Tibetan characters and BrdaRten characters
> without vertical combining" since this facilitates the usage of two
> languages, namely Tibetan and Chinese, together in bi-lingual
> documents.  However, this mode of thought runs counter to the very
> principle of Unicode/ISO-10646 which is to enable *any* number of
> languages to be used together, seamlessly, in documents and other
> computer applications.  Would the Chinese also like to propose a set of
> precomposed characters for each of the Indic scripts so that they
> likewise can be "regarded as a horizontal stream of basic Tibetan [read
> 'basic Indic'] characters and BrdaRten [read 'precomposed Indic']
> characters without vertical combining"?  Or have they resigned
> themselves (and the rest of us) to never mixing Chinese and Indic script
> within a document?  On the other hand, once there is a system that can
> render Chinese together with Hindi or Tamil, rendering of Chinese
> together with Tibetan (as currently encoded) is not technically
> difficult.
 
> In point of fact, the cited "problems with Tibetan information
> interchange and processing" are no more difficult to solve than those
> for other complex scripts -- these having already been solved for a
> substantial number of complex scripts.  The current lack of widespread
> support for Unicode Tibetan simply reflects the fact that there are
> fewer commercial and governmental resources being allocated to the
> development of Unicode Tibetan as compared to other Indic and
> Indic-derived complex scripts.
 
> 2.  The claim that "Up to now, there is no report showing any system
> platform has implemented Tibetan processing system using dynamic
> combining method" is also untrue.  Inquiries can be directed to the
> Dzongkha Development Commission in Bhutan which has overseen the
> development of just such a system platform for Dzongkha (the national
> language of Bhutan)--which is written using the letters of the Tibetan
> script.
 
> 3.  The statement that "Since 1990s, from DOS to Windows, both domestic
> and overseas applications have been using Tibetan BrdaRten character set
> at implementation level 1. For example, the Founder desktop publishing
> system for Tibetan is based on BrdaRten characters which has become the
> de-facto industry standard for Tibetan information interchange and
> processing in China and even outside of China" is exaggerated. 
> Tibetan-script computer systems have been in use in North America,
> Europe, South Asia and East Asia/Pacific Rim since as early as 1986,
> but it is completely false to say that the character repertoire of n2558 has
> become "the de-facto industry standard for Tibetan information
> interchange and processing" in any place outside of the PRC.  As noted
> above, the character set of n2558 does not even fully support usages of
> Tibetan script in regions outside of China.  (The notation of
> "Worldwide" in question 5 of the Part C.: Technical-Justification in the
> Proposal Summary Form is thus highly misleading.)
 
> 4.  The n2558 document asserts that "Once the Tibetan BrdaRten
> characters are encoded in BMP, many current systems supporting
> ISO/IEC10646 will enable Tibetan processing without major modification.
> Therefore, the international standard Tibetan BrdaRten characters will
> speed up the standardization and digitalization of Tibetan information,
> keep the consistency of implementation level of Tibetan and other
> scripts, develop the Tibetan culture and make the Tibetan culture
> resources shared by the world."  There are a number of counter-arguments
> to these assertions:
 
> First, due to the limitations of the n2558 character set for
> representing classical Tibetan, Dzongkha, and other Tibetan-script
> materials, it is not reasonable to expect worldwide adoption of this
> character set.  Since the dynamic-combining model will continue to be
> used in South Asia (where complex-script systems are the norm), in
> academic institutions (where research in classical Tibetan is conducted)
> and elsewhere, there will always be a need to normalize Tibetan-script
> data interchanged between regions that use these two differing encoding
> models.  Thus, the acceptance of this character set into the
> ISO-10646/Unicode standard will actually be an *obstacle* to
> "standardization and digitalization of Tibetan information."
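
To make this normalization burden concrete: if precomposed characters coexisted with the current block, any system exchanging data across the two models would have to map each precomposed character to its base-plus-subjoined sequence before comparing, searching, or collating text. The Python sketch below is hypothetical throughout. The code points n2558 proposed are not reproduced here, so Private Use Area values (U+F300 onward) stand in for them, the table covers only three stacks, and the helper names `normalize_to_combining` and `same_text` are illustrative.

```python
# Hypothetical sketch: normalizing mixed "precomposed + combining" Tibetan text
# to the combining model of the current Unicode Tibetan block. The precomposed
# code points below are Private Use Area placeholders, NOT the values in n2558.

PRECOMPOSED_TO_COMBINING = {
    "\uF300": "\u0F66\u0F90",        # placeholder for SA + subjoined KA
    "\uF301": "\u0F66\u0F92\u0FB2",  # placeholder for SA + subjoined GA + subjoined RA
    "\uF302": "\u0F62\u0F9F",        # placeholder for RA + subjoined TA
}


def normalize_to_combining(text):
    """Replace each placeholder precomposed character with its combining sequence."""
    return "".join(PRECOMPOSED_TO_COMBINING.get(ch, ch) for ch in text)


def same_text(a, b):
    """Compare two strings only after both are mapped to the combining model."""
    return normalize_to_combining(a) == normalize_to_combining(b)


# The same syllable stored two ways: once with a placeholder precomposed stack,
# once with the current block's base + subjoined sequence (U+0F0B is the tsheg).
precomposed_form = "\uF301\u0F0B"
combining_form = "\u0F66\u0F92\u0FB2\u0F0B"

print(precomposed_form == combining_form)           # False: raw strings differ
print(same_text(precomposed_form, combining_form))  # True after normalization
```

Under this assumption, equality tests, search, and sorting would all have to pass through a step like `normalize_to_combining` first, and the mapping table would have to be shared and kept in sync by every party exchanging data.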
 
> Second, the reference to "consistency of implementation level of Tibetan
> and other scripts" would seem to presume that the "other scripts" in
> question are not complex scripts.  This statement is simply not relevant
> when we consider the requirements of--and the already implemented
> multilingual systems for the handling of--Indic and Indic-derived
> complex scripts.
 
> 5.  Any claims of a pre-existing "de-facto industry standard" for
> Tibetan even in China seem to be contradicted by the statement in the
> Conclusion, that "After serious discussion and analysis by Tibetan
> linguists, encoding experts and software developers in China, all are in
> favor to establish a national and international standard Tibetan
> BrdaRten character set to meet the requirement of Tibetan information
> processing."  This seems to indicate that a national standard for
> Tibetan is yet to be established, even in China.
 
> In summary, had this proposal been comprehensive enough to
> satisfy the needs of *all* users of the Tibetan-script languages and
> materials, had it taken into consideration character data processing
> needs of Tibetan beyond computerized typesetting (such as collation),
> and had it been presented ten years ago, then it might well have been
> worthy of serious consideration.  As it now stands, this proposal offers
> too little, too late and, moreover, would simply add further confusion
> and obstacles to the standardization of Tibetan-script data processing
> and interchange.  Furthermore, even had this proposal been presented
> for consideration ten years ago, the fact that complex-script (dynamic
> combination) rendering is needed for Indic scripts would even then have
> been a strong argument in favor of the current ISO-10646 encoding model
> and against an encoding model of the type proposed in n2558.
 
> Respectfully,
 
> Robert Chilton
> Technical Director
> The Asian Classics Input Project
 
=================================================
