> There's no updating needed. The key is that The Unicode Standard, Version
> 3.0 recognizes UTF-16 as the default encoding. Therefore code values (or
> units) which are defined as 'minimal bit combination that can represent a
> unit of encoded text' are 16-bit. In UTF-16, one sometimes needs two of
> these, instead of just one.

>>| C1 says "A process shall interpret Unicode code values as 16-bit
>>| quantities."

>> This I find mightily confusing.  Why say something like this when
>> there are (well, will be) characters that cannot be represented with
>> 16 bits in any of the Unicode encodings?

> because the smallest unit of UTF-16 (which can represent characters outside
> the first 64K) is 16-bit. See the full text of definition D5 on page 41.

The confusion is that a 16-bit unit is referred to as a character code,
but it is not.  It's a character element code (to my way of thinking),
and one can construct a character code from one or more character
element codes.  It's semi-atomic at best: not unitary, and not complete
unto itself.  And the contextual business muddies it as well.

It just so happens that most character codes have a single element,
but the (necessary?) inconsistency complicates matters in precisely
the ways I'd been hoping Unicode would simplify.  Well, it does
simplify them; just not as far as one would wish.
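
To make the distinction concrete, here is a rough Python sketch of how one
character code maps to one or two 16-bit elements in UTF-16.  The function
names are my own, purely for illustration, and U+1D11E is only a sample
value beyond the first 64K; the arithmetic is the standard surrogate-pair
calculation.

```python
def utf16_code_units(code_point):
    """Return the 16-bit units that encode one character code in UTF-16.

    Hypothetical helper name; not taken from any library.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can represent")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters by themselves")
    if code_point < 0x10000:
        # The common case: a single 16-bit element is the whole character.
        return [code_point]
    # Beyond the first 64K: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # first element (high surrogate)
    low = 0xDC00 + (offset & 0x3FF)   # second element (low surrogate)
    return [high, low]


def code_point_from_units(units):
    """Rebuild the character code from its one or two 16-bit elements."""
    if len(units) == 1:
        return units[0]
    high, low = units
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


for cp in (0x0041, 0xFFFD, 0x1D11E):
    units = utf16_code_units(cp)
    print(f"U+{cp:04X} -> {[f'{u:04X}' for u in units]}")
```

A process that counts 16-bit units is therefore counting elements, not
characters; the two counts agree only while every character fits in one
element.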

Dum spiro, spero.

John G. Otto                             Nisus Software, Engineering
www.infoclick.com  www.mathhelp.com  www.nisus.com  software4usa.com
EasyAlarms  PowerSleuth  NisusEMail  NisusWriter  MailKeeper  QUED/M
   My opinions are probably not those of Nisus Software, Inc.

