By the way:

>  In UTF-17, for example, the Han character sequence <U+5341, U+4E03>
>  ('17'), would be converted to:
>  
>  <38 30 31 31 31 36 30 31 38 30 31 30 37 30 30 33>

Close, but not quite.  Try:

<38 30 30 35 31 35 30 31 38 30 30 34 37 30 30 33>

>  Because all UTF-17 bytes are in the range 0x30..0x38, this
>  UTF-17 byte sequence would also be visible displayed in
>  ASCII (or Latin-1) as: "8011160180107003".

"8005150180047003".

This is what you get for doing an implementation.  But I did check my answers 
against other octal conversion routines, a hand calculator, Windows Calculator, 
etc.
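
For anyone who wants to repeat the check, here is a minimal sketch in Python.
It assumes that each code point becomes the ASCII digit '8' followed by seven
zero-padded octal digits, which is the layout the byte sequences above imply;
I am inferring that, not quoting it from the draft.  The name utf17_encode is
made up for illustration.

```python
def utf17_encode(text: str) -> bytes:
    """Hypothetical UTF-17 encoder (assumed layout, see note above).

    Each code point is written as '8' plus its value in octal,
    zero-padded to seven digits, all as ASCII bytes.
    """
    return "".join(f"8{ord(ch):07o}" for ch in text).encode("ascii")


encoded = utf17_encode("\u5341\u4e03")   # <U+5341, U+4E03>
print(encoded.decode("ascii"))           # 8005150180047003
print(encoded.hex(" "))                  # 38 30 30 35 31 35 30 31 38 30 30 34 37 30 30 33
```

Seven octal digits are enough for the whole code space, since U+10FFFF is
4177777 in octal.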

>  Since all UTF-17 bytes display as digits, it is programmer
>  friendly. All UTF-17 values will display visibly and correctly
>  in any debugger, and the programmer need only recall that
>  "80111601" means U+5341, for instance, to get back to the
>  original Unicode character. 

Ibid.  That should be "80051501".
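
Getting back to the original character is just the reverse.  The sketch
below makes the same assumption about the layout (an '8' followed by seven
octal digits per code point), and utf17_decode is again a made-up name, not
anything from the draft.

```python
def utf17_decode(data: bytes) -> str:
    """Hypothetical UTF-17 decoder (assumed layout: '8' + 7 octal digits)."""
    text = data.decode("ascii")
    if len(text) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    chars = []
    for i in range(0, len(text), 8):
        unit = text[i:i + 8]
        if unit[0] != "8":
            raise ValueError(f"unit {unit!r} does not start with '8'")
        # int(..., 8) rejects any digit outside 0..7
        chars.append(chr(int(unit[1:], 8)))
    return "".join(chars)


decoded = utf17_decode(b"8005150180047003")
print(" ".join(f"U+{ord(c):04X}" for c in decoded))   # U+5341 U+4E03
```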

-Doug Ewell
 Fullerton, California
