You appear to be assuming that Unicode lists languages. It does not. It deals 
with characters and scripts. As mentioned before, it does not attempt to 
document all possible and preferred ways to refer to characters or scripts; 
that is well beyond its scope, purpose and requirements. All that Unicode does 
is provide a standard and universally available means of encoding 
text--whatever text for whatever language, and referred to by whatever 
communities in whichever ways they may choose. To achieve that, it must adopt 
_some_ name for characters and scripts for reference purposes so that 
implementers of the standard have some way to refer to those things 
unambiguously. But that does not at all mean that _everybody_ is assumed to use 
those same terms, or even to think of collections of characters in the same way 
that Unicode uses the notion of script.

With that in mind, "Bengali" is used in the Unicode standard purely as an 
unambiguous way to refer to a particular collection of characters that are 
related in history and current conventional usage (across multiple language 
communities) and that share certain graphic and behavioural characteristics. It 
is mainly historical coincidence that "Bengali" is the term used in the 
Standard; as Doug Ewell and John Jenkins explained in other mail, these terms 
were adopted within the Standard based on how such collections are most 
typically referred to in English-language discussion. The term is being used to 
reference a collection of characters--a "script"--and not a language, and there 
is no intent whatsoever to suggest that any particular language should be 
considered to have any particular status relative to any other language.

Thus, what you refer to as a "glaring mistake" is not a mistake at all when 
considered in relation to what the intent and usage within the Standard is--and 
what it is _not_.


Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of delex r
Sent: Friday, September 09, 2011 3:54 PM
To: unicode@unicode.org
Subject: Continue:Glaring mistake in the code list for South Asian Script

I have come to the conclusion that Unicode has not addressed the sovereignty 
issues of a language while trying to devise an ASCII-like encoding system for 
almost all the characters and symbols used on earth. I am continuing with my 
observations on the glaring mistake Unicode made in naming a South Asian script 
“Bengali”. Here I would like to give some information that I think will be of 
help to Unicode in its endeavour to be a universal character encoding standard 
that is faithful even to the smallest facts.

India is believed to have at least 1652 mother tongues, of which only 22 are 
recognized by the Indian Constitution as official languages for administrative 
communication among local governments and with citizens. The Constitution has 
not explicitly recognized any official script. Just as Unicode has listed 
languages and scripts, the Indian Constitution has listed the official 
languages (in its Eighth Schedule). The first entry in that list is the 
Assamese language. Assamese is a sovereign language with its own grammar and 
“script”, which contains some unique characters that you will not find in any 
of the scripts so far discovered by Unicode. At least 30 million people call it 
the “Assamese script” and, if provided with computers and Internet connections, 
could flood the Unicode e-mail address with confirmations. These characters 
are, I repeat, the ones given the hex codes 09F0 and 09F1 by this universal 
character encoding system, which has unfortunately described both as “Bengali” 
Ra and so on. I don’t know who advised Unicode to use the tag “Bengali” to name 
the block that includes these two characters.
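For readers who want to check the character names under discussion, here is a minimal sketch using Python's standard unicodedata module. It assumes a Python build whose bundled Unicode database includes these two characters (any modern version does). The BENGALI_BLOCK range (U+0980..U+09FF) and the describe helper are illustrative names of my own, not part of any library.

```python
import unicodedata

# Code point range of the Unicode block named "Bengali" (U+0980..U+09FF).
# BENGALI_BLOCK is a hypothetical helper name used only for this illustration.
BENGALI_BLOCK = range(0x0980, 0x0A00)

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' using the formal name from Python's Unicode database."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

# The two code points cited in the message.
for cp in (0x09F0, 0x09F1):
    assert cp in BENGALI_BLOCK  # both sit inside the "Bengali" block
    print(describe(cp))
# Prints:
# U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL
# U+09F1 BENGALI LETTER RA WITH LOWER DIAGONAL
```

The printed names are the formal identifiers whose wording this thread is debating.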

If you are not Indian, just google an image of an Indian currency note. On one 
side of the note you will find a box inside which the value of the note is 
written in words in the scripts of at least 15 official Indian languages (I 
don’t know why it is not 22). At the top, the script is Assamese, as Assamese is 
the first officially recognized language (script?). Just below it you will find 
almost identical shapes; that is Bengali. India officially recognises the 
distinction between these two scripts, which, although similar in shape, sound 
very different at many points. And the standard Assamese alphabet has extra 
characters that are never Bengali, just as London is never in Germany.

Coming back to the hex codes 09F0 (Raw) and 09F1 (Wabo): neither has anything 
Bengali in it, and, interestingly, 09F1 (which sounds like WO or WA when used 
within words) has no ‘Ra’ sound in it at all. With the actual Bengali alphabet, 
for instance, one cannot write anything that produces the sound “Watt” as in 
James Watt; instead one needs to combine three letters, and even then it only 
sounds like “OOYAT” in Bengali itself.

Therefore Unicode must consider naming the block range “Assamese”, which would 
faithfully describe a block range containing 09F0 and 09F1, and replacing all 
“Bengali” tags with “Assamese” in the code descriptions, and vice versa. 
London is in England and Berlin is in Germany. You just can’t bring London into 
Germany and then say England is in Germany. You can’t live with a lie or a wrong 
for too long.