Carl W. Brown wrote:
> Doug writes:
> > You might remember that I chided Microsoft for 
> > its definition of "Unicode" in
> > Windows 2000 Help, where Unicode was described 
> > as a "16-bit standard" that was "developed between 
> > 1988 and 1991," implying that the work was
> > finished.  Even at the time Windows 2000 was being 
> > developed, there was quite a bit of room for 
> > improvement in this definition.
> 
> You are right; however, Unicode was officially still 16-bit when
> Win2000 was released to manufacturing. Even though they knew about
> surrogates and new planes, it was not official and could have
> been changed.

Oh God... Surrogates were standardized in Unicode 2.0 (1996), long
before they started being used in Unicode 3.1 for new code point
assignments outside the BMP...

And Microsoft was already a full member of the UTC and knew all
about the GB18030 support required in the P.R. China starting in
2000.

Unicode 3.0.0 was released in September 1999,
superseding Unicode 2.1.9, published in April 1999
(UTR #8 version 3.0, see
http://www.unicode.org/unicode/reports/tr8/).

Note also that normalization had already been published at that time
(see revision 17 of UTR #15, September 1999, at
http://www.unicode.org/unicode/reports/tr15/tr15-17.html).

The encoding model for surrogates had been published too
(see http://www.unicode.org/reports/tr17/tr17-2.html,
dated 1998-10-14, which clearly states that the code point
range is 0..10FFFF and already references UTF-8 and UTF-16 as
valid encoding forms for this range, using up to 4 bytes in
UTF-8 or two 16-bit code units in UTF-16).
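As an illustration of that encoding model, here is a minimal Python
sketch (not taken from UTR #17; the function names check_scalar,
utf16_units and utf8_length are hypothetical). It maps a scalar value
in 0..10FFFF to its UTF-16 code units and computes its UTF-8 length,
showing why every character outside the BMP needs exactly one
surrogate pair:

```python
from __future__ import annotations

MAX_CODE_POINT = 0x10FFFF  # top of the 0..10FFFF code space


def check_scalar(cp: int) -> None:
    """Reject values outside 0..10FFFF and the surrogate range itself."""
    if not 0 <= cp <= MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")


def utf16_units(cp: int) -> list[int]:
    """Return the 16-bit code units UTF-16 uses for cp."""
    check_scalar(cp)
    if cp < 0x10000:
        return [cp]                    # BMP: a single code unit
    offset = cp - 0x10000              # 20-bit offset into planes 1..16
    high = 0xD800 + (offset >> 10)     # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # low (trailing) surrogate
    return [high, low]


def utf8_length(cp: int) -> int:
    """Return the number of bytes UTF-8 uses for cp (1 to 4)."""
    check_scalar(cp)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


if __name__ == "__main__":
    for cp in (0x0041, 0x4E00, 0x20000, 0x10FFFF):
        units = " ".join(f"{u:04X}" for u in utf16_units(cp))
        print(f"U+{cp:04X}: UTF-16 {units}, UTF-8 {utf8_length(cp)} bytes")
```

For U+20000 (the first CJK Extension B character) this gives the
pair D840 DC00 and a 4-byte UTF-8 sequence. Python's own codecs
(chr(cp).encode("utf-16-be") and chr(cp).encode("utf-8")) can be
used to cross-check the results.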

The character model was already known, as was the general
structure Unicode would use to handle characters outside the BMP.
These new characters were not standardized magically out of
nothing: the Han working group was actively working, and the
GB18030 standard was already there, clearly demonstrating
that mapping the required GB18030 repertoire into Unicode
would be unavoidable. So there were already very active
discussions between Unicode, ISO/IEC 10646, and the Han working
group to integrate GB18030 into Unicode. It was clear that
many new characters would become necessary in Unicode 3.0.0,
even though only Unicode 2.1.9 had been published at that time.

Microsoft must then have anticipated this by actively
experimenting with the proposed models. Adding correct support
for surrogates immediately was then a high priority, even though
no complete charset mapping between GB18030 and Unicode was
available yet.

So Windows 2000 should have had full support for surrogates
from the start (and should have correctly rejected unpaired
surrogates as invalid sequences, both in filenames and in its
international support libraries, simply because this was needed
for GB18030 support)...
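To make that last point concrete, here is a minimal Python sketch of
the check being argued for. It assumes a filename is handled as a
list of 16-bit code units, as in a Windows wide-character string; the
function names (find_unpaired_surrogates, is_valid_filename) are
hypothetical and this is not code from any Windows API. A surrogate
counts as unpaired when a high surrogate is not immediately followed
by a low one, or a low surrogate is not immediately preceded by a
high one:

```python
from __future__ import annotations


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def find_unpaired_surrogates(units: list[int]) -> list[int]:
    """Return the indices of unpaired surrogates in a UTF-16 unit sequence."""
    bad: list[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            if i + 1 < len(units) and is_low_surrogate(units[i + 1]):
                i += 2          # well-formed pair: skip both units
                continue
            bad.append(i)       # high surrogate with no low surrogate after it
        elif is_low_surrogate(unit):
            bad.append(i)       # low surrogate with no high surrogate before it
        i += 1
    return bad


def is_valid_filename(units: list[int]) -> bool:
    """Hypothetical policy: accept a non-empty name only if it is well-formed UTF-16."""
    return len(units) > 0 and not find_unpaired_surrogates(units)


if __name__ == "__main__":
    good = [0x0041, 0xD840, 0xDC00]    # "A" followed by U+20000
    broken = [0xDC00, 0x0041, 0xD840]  # lone low, "A", lone high
    print(find_unpaired_surrogates(good), is_valid_filename(good))
    print(find_unpaired_surrogates(broken), is_valid_filename(broken))
```

Python's strict "utf-16-le" decoder applies the same rule and raises
UnicodeDecodeError on the broken sequence, so it can serve as a
cross-check for this sketch.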

