In a message dated 2001-06-19 10:36:40 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

>  I agree with you, the problem is that the D800 to DFFF codes were never
>  defined as valid Unicode characters.

True; there were never characters assigned into these positions.

>  Encoding these into ED xx xx codes has
>  never produced valid Unicode code points in UTF-8.

False; prior to Unicode 2.0 they were perfectly valid code points (not 
characters) in the so-called O-zone.  The addition of surrogate code points 
created a hole in the O-zone (sorry, I couldn't resist :) and removed them 
from the realm of valid code points.
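
For illustration, here is a rough Python sketch (Python is just my choice for 
the example): the byte sequence ED A0 80 follows the 3-byte UTF-8 bit pattern 
and works out to U+D800, but a strict modern decoder rejects it precisely 
because the surrogate range is no longer a set of valid code points.

    # ED A0 80 matches the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx
    raw = bytes([0xED, 0xA0, 0x80])
    cp = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)
    print(hex(cp))                     # 0xd800 -- a surrogate code point

    try:
        raw.decode("utf-8")            # strict UTF-8 refuses surrogates
    except UnicodeDecodeError as e:
        print("rejected:", e.reason)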

>  Therefore any of these
>  codes in the database were never valid Unicode characters at any point in
>  the Unicode standard.  As a consequence there is no backwards compatibility
>  issue.

True; there should be no such data in any database anywhere.  But be careful 
about "characters" vs. "code points."  The values in question were never 
characters, but they were once valid code points.  My favorite example, 
U+0220, has always been a valid code point but not an assigned character 
(until Unicode 3.2).
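
To make the distinction concrete, a small Python sketch (again just an 
illustrative example; unicodedata reports whatever UCD version the 
interpreter was built with):

    import unicodedata

    for cp in (0x0220, 0xD800):
        valid = not (0xD800 <= cp <= 0xDFFF)   # surrogates are not valid code points
        try:
            name = unicodedata.name(chr(cp))   # raises if no character is assigned
        except ValueError:
            name = "<no assigned character>"
        print(hex(cp), "valid code point:", valid, "name:", name)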

The only backwards compatibility issue comes from software written in the 
UCS-2 days that disregarded the UTF-8 specification and encoded values above 
U+FFFF as two 3-byte surrogate sequences rather than a single 4-byte sequence.
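
A quick Python sketch of the difference (my example, Python only for 
illustration; the six-byte form is the one that came to be called CESU-8):

    ch = "\U00010000"                  # first supplementary code point
    print(ch.encode("utf-8").hex(" ")) # correct UTF-8: f0 90 80 80

    # What the UCS-2-era software produced: each UTF-16 surrogate
    # encoded separately as a 3-byte sequence.
    v  = ord(ch) - 0x10000
    hi = 0xD800 + (v >> 10)
    lo = 0xDC00 + (v & 0x3FF)

    def enc3(u):                       # 3-byte UTF-8-style encoding
        return bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])

    print((enc3(hi) + enc3(lo)).hex(" "))  # ed a0 80 ed b0 80 -- not valid UTF-8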

-Doug Ewell
 Fullerton, California
