RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

Cathy Wissink Tue, 20 Feb 2001 08:48:30 -0800
The people who are responsible for this text have been made aware of the
problem.  This will be updated for Windows XP.

Cathy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 20, 2001 8:04 AM
To: Unicode List
Subject: Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in


In a message dated 2001-02-20 04:21:49 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

>  A little out of date, but describing correctly the state of art in 1991
>  before the merger.

Agreed, but the example was from Windows 2000.  It should at least be
current through Unicode 2.1.

>  Even 8-bit ASCII is a correct term meaning ISO-8859-1.

I would question that.  Understandable, yes, but not really correct.

>  A nit to pick: It's the Latin alphabet, not roman. Roman is a kind of 
>  typeface, contrasting to sans serif aka grotesque.

True.  I have also heard "roman" used to mean the opposite of italic.

>  >  Exercise for the reader:  See how many misstatements about Unicode
>  >  (and ASCII) you can find in this text.
>
>  Fewer than you expect. Only the target described does not exist any
>  longer.
>  Since the merger with ISO 10646 was foreseeable even at that time, there
>  are no implementations of Unicode 1.0 anyway.

Here is my list.  Remember that I am expecting information supplied with 
Windows 2000 to be current through Unicode 2.1.

>  A 16-bit character encoding standard

Wrong; surrogates have existed since about 1993 (someone help me with the 
exact date).
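
To make the point concrete, here is a minimal Python sketch of the
surrogate mechanism: two 16-bit code units together address a code point
beyond the 65,536 values that one unit can hold.  The function name
to_surrogates is hypothetical and chosen only for illustration; the
arithmetic is the standard UTF-16 mapping.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 (high, low) surrogate pair. Hypothetical helper for illustration."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogates(0x10000)
print(hex(high), hex(low))  # 0xd800 0xdc00
```

With 1,024 high and 1,024 low surrogates, this scheme reaches 1,048,576
code points on top of the original 16-bit range, so "a 16-bit character
encoding" undersells the standard.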

>  developed by the Unicode Consortium between 1988 and 1991.

This implies that development was finished in 1991, and only new characters 
are added.  In fact, lots of new development to Unicode has taken place
since then (just look at all the TR's).  This might be splitting hairs.

>  By using two bytes to represent each character,

Even "16 bits" would be better than "two bytes" here, but again this is 
nit-picking.

>  Unicode enables almost all of the written languages of the world to be 
>  represented using a single character set.

Hey, they got something right!

>  By contrast, 8-bit ASCII

Mentioned above.

>  is not capable of representing all of the combinations of letters and 
>  diacritical marks that are used just with the Roman alphabet.

I thought "Roman" was simply an alternate word for "Latin," but Jorg is 
correct.  This is also an error.
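
The underlying claim about 8-bit character sets is easy to check.  The
sketch below assumes that "8-bit ASCII" means ISO-8859-1, as suggested
earlier in this thread, and uses a hypothetical helper named encodable
to ask whether a given Latin letter has a representation in a given
encoding.

```python
def encodable(text, encoding):
    """Return True if every character in text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE
o_double_acute = "\u0151"   # LATIN SMALL LETTER O WITH DOUBLE ACUTE (Hungarian)

print(encodable(e_acute, "ascii"))             # False: 7-bit ASCII has no accents
print(encodable(e_acute, "latin-1"))           # True
print(encodable(o_double_acute, "latin-1"))    # False: not in ISO-8859-1
print(encodable(o_double_acute, "utf-16-be"))  # True
```

So the Help text is right that a single 8-bit set cannot cover every
letter-plus-diacritic combination used with the Latin alphabet, even if
its terminology is off.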

>  Approximately 39,000 of the 65,536 possible Unicode character codes have
>  been assigned to date, 21,000 of them being used for Chinese ideographs.

The count was correct once, but that was 10 years ago.

>  The remaining combinations are open for expansion.

"Combinations"?  You mean of two bytes?

Well, that's about enough.  I am not a habitual Microsoft basher, but 
somebody in their Help department really needs to update the information 
distributed with their OS.  Tex is right that we are bound to see a certain 
amount of misinformation, but it is our duty to help correct it.

-Doug Ewell
 Fullerton, California