Hello,
On 2012-12-11 20:16, James Lin wrote:
> If I have a code point: U+4E8C or "二"
> In UTF-8, it's "E4 BA 8C" while in UTF-16, it's "4E8C".
> Where does this "BA" come from?
Cf. <http://skew.org/cumped/>.
Enclosed is the (almost original) version of “Cima’s Magic
UTF-8 Pocket Encoder” (2004), followed by its two successors
for UTF-16 and UTF-32. Display or print them with a fixed-pitch
font, such as Lucida Console or Courier New. Enjoy!
Cheers,
Otto Stolz
Side 1 (print and cut out):
+------------+-------+-----------------------+------+
| U+0000     | yy zz | Cima's UTF-8 Magic    | Hex= |
| U+007F     | !  !  | Pocket Encoder        | B-4  |
| YZ         | .  .  |                       |      |
+------------+-------+-------+ Vers. 1.1     | 0=00 |
| U+0080     | 3x xy | 2y zz | 30 June 2004  | 1=01 |
| U+07FF     | 3. .. | 2. !  |               | 2=02 |
| XYZ        | .  .  | .  .  |         M.C.  | 3=03 |
+------------+-------+-------+-------+       | 4=10 |
| U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
| U+FFFF     | !  !  | 2. .. | 2. !  |       | 6=12 |
| WXYZ       | E  .  | .  .  | .  .  |       | 7=13 |
+------------+-------+-------+-------+-------+ 8=20 |
| U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
| U-000FFFFF | !  0. | 2. !  | 2. .. | 2. !  | A=22 |
| VWXYZ      | F  .  | .  .  | .  .  | .  .  | B=23 |
+------------+-------+-------+-------+-------+ C=30 |
| U-00100000 | 33 10 | 20 ww | 2x xy | 2y zz | D=31 |
| U-0010FFFF | !  1. | 2. !  | 2. .. | 2. !  | E=32 |
| WXYZ       | F  4  | 8  .  | .  .  | .  .  | F=33 |
+------------+-------+-------+-------+-------+------+
Side 2 (print, cut out, and glue on back of side 1):
+---------------------------------------------------+
| Cima's UTF-8 Magic Pocket Encoder - User's Manual |
|   (vers. 1.1, 30 June 2004, by Marco Cimarosti)   |
|                                                   |
| - Left column: min and max Unicode scalar values: |
|   pick the row that applies to the code point you |
|   want to convert to UTF-8. Letters V..Z mark the |
|   hexadecimal digits that have to be processed.   |
| - Right column: hexadecimal to base-4 table.      |
| - Central columns: work area to compute each octet|
|   (1 to 4) that constitute UTF-8 octet sequences. |
|  Convert each digit marked by V..Z from hex. to   |
|  b.-4. Write b.-4 digits on the dots placed under |
|  letters v..z (two b.-4 digits per hex. digit).   |
|  Convert 2-digit base-4 number to hex. digits and |
|  write them on the dots on the line. That is your |
|  UTF-8 sequence in hex. ! Exclamation marks show  |
|  passages that may be skipped, either because the |
|  digit is hard-coded, or because it may be copied |
|  directly from the scalar value.                  |
+---------------------------------------------------+
Enjoy!
Marco
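
For the U+4E8C from the question, the third row of the card applies
(W=4, X=E, Y=8, Z=C). The hex-to-base-4 column gives E=32 and 8=20, so
the second octet "2x xy" becomes 23 22, which reads back as B A. That is
where the "BA" comes from. The same steps can be sketched in Python; this
is only a minimal illustration, and the function names are made up here,
they are not part of the card:

```python
def to_quats(value, n):
    """Return value as n base-4 digits, most significant first."""
    return [(value >> (2 * i)) & 3 for i in reversed(range(n))]


def from_quats(digits):
    """Read a list of base-4 digits back as one integer."""
    result = 0
    for d in digits:
        result = result * 4 + d
    return result


def utf8_by_card(cp):
    """Encode one Unicode scalar value by following the card's rows."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:
        return bytes([cp])            # row 1: copy YZ unchanged
    if cp <= 0x7FF:
        prefix, n = [3], 6            # row 2: lead octet "3x xy"
    elif cp <= 0xFFFF:
        prefix, n = [3, 2], 8         # row 3: lead octet "32 ww"
    else:
        prefix, n = [3, 3], 11        # rows 4-5: "33 0v" or "33 10"
    q = to_quats(cp, n)
    k = 4 - len(prefix)               # payload digits left in the lead octet
    groups = [prefix + q[:k]]
    for i in range(k, n, 3):          # each trailing octet: "2" + 3 digits
        groups.append([2] + q[i:i + 3])
    return bytes(from_quats(g) for g in groups)


print(utf8_by_card(0x4E8C).hex(" ").upper())  # E4 BA 8C
```

Working in base 4 keeps the card's trick visible: every octet is exactly
four base-4 digits, so the fixed marker digits (3, 32, 33, 2) and the
payload digits never straddle a digit boundary.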
Obverse: Print with a fixed-width font, such as Lucida Console,
and cut out.
╔════════════╦═════════════╦═════════════════════════════════╗
║ U+0000     ║ W  X  Y  Z  ║   Otto’s Magic Pocket Encoder   ║
║ U+D7FF     ║ !  !  !  !  ║ for UTF-16  ╔═══════════════════╣
║ WXYZ       ║ _  _  _  _  ║             ║    V>vv │    V>vv ║
╟────────────╫─────────────╢ Version 1.1 ║    U>uu │    U>uu ║
║ U+E000     ║ W  X  Y  Z  ║ ©2004-07-05 ║ tt<T    │ tt<T    ║
║ U+FFFF     ║ !  !  !  !  ║             ║    _<__ │    _<__ ║
║ WXYZ       ║ _  _  _  _  ║             ║ ────────┼──────── ║
╟────────────╫─────────────╚═════════════╣    0=00 │ 13<8=20 ║
║ U-00010000 ║ 31 2t tu uv │ 31 3v Y  Z  ║ 00<1=01 │ 20<9=21 ║
║ U-000FFFFF ║ !  2_ __ __ │ !  3_ !  !  ║ 01<2=02 │ 21<A=22 ║
║ TUVYZ      ║ D  _  _  _  │ D  _  _  _  ║ 02<3=03 │ 22<B=23 ║
╟────────────╫─────────────┼─────────────╢ 03<4=10 │ 23<C=30 ║
║ U-00100000 ║ 31 23 3u uv │ 31 3v Y  Z  ║ 10<5=11 │ 30<D=31 ║
║ U-0010FFFF ║ !  !  3_ __ │ !  3_ !  !  ║ 11<6=12 │ 31<E=32 ║
║ UVYZ       ║ D  B  _  _  │ D  _  _  _  ║ 12<7=13 │ 32<F=33 ║
╚════════════╩═════════════╧═════════════╩═══════════════════╝
....:....1....:....2....:....3....:....4....:....5....:....6..
Reverse: Cut out and paste on back of obverse.
╔════════════════════════════════════════════════════════════╗
║ Otto’s Magic Pocket Encoder for UTF-16         Version 1.1 ║
║ User’s Manual               (inspired by Cima’s UTF-8 MPE) ║
╠════════════════════════════════════════════════════════════╣
║• Left column: min and max Unicode scalar values: pick the  ║
║  row that applies to the code point to be converted.       ║
║  T…Z mark the hexadecadic digits that have to be processed.║
║• Central column: work area to compute UTF-16BE code units. ║
║• Right column: hexadecadic to quaternary conversion tables:║
║  < for T to tt; = for U/V to uu/vv (step 1) and for step 2.║
║1. Convert each digit marked by T…V from hex to quat. Write ║
║   quat digits on the underscores placed under letters t…v. ║
║2. Convert 2-digit quat numbers to hex digits or copy digits║
║   W…Z, as indicated, and write them on the underscores on  ║
║   the next line. That’s your UTF-16BE sequence in hex.     ║
║! Exclamation marks indicate shortcuts.                     ║
╚════════════════════════════════════════════════════════════╝
Enjoy.
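
The two steps of the UTF-16 card can be sketched the same way. The "<"
column of the conversion table maps T to the base-4 form of T minus 1,
which is the subtraction of 0x10000 that UTF-16 requires for supplementary
code points. This is a rough sketch under that reading of the card; the
function names are invented for illustration:

```python
def to_quats(value, n):
    """Return value as n base-4 digits, most significant first."""
    return [(value >> (2 * i)) & 3 for i in reversed(range(n))]


def from_quats(digits):
    """Read a list of base-4 digits back as one integer."""
    result = 0
    for d in digits:
        result = result * 4 + d
    return result


def utf16be_by_card(cp):
    """Encode one Unicode scalar value as UTF-16BE, following the card."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        return cp.to_bytes(2, "big")             # rows 1-2: copy WXYZ
    t, uvyz = cp >> 16, cp & 0xFFFF              # T is hex 1..10
    tt = to_quats(t - 1, 2)                      # "<" column: T-1 in base 4
    q = to_quats(uvyz, 8)                        # digits u u v v y y z z
    high = from_quats([3, 1, 2] + tt + q[:3])    # "31 2t tu uv"
    low = from_quats([3, 1, 3] + q[3:])          # "31 3v Y Z"
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")


print(utf16be_by_card(0x4E8C).hex().upper())   # 4E8C
print(utf16be_by_card(0x1F600).hex().upper())  # D83DDE00
```

For U+4E8C the first row applies and the four digits are copied as they
are, which is why the UTF-16 form in the question looks identical to the
code point.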
Side 1 (print and cut out):
╔════════════╦═══════════════════════╤═══════════════╗
║ This space ║ Mike’s UTF-32 Magic   │ Vers. 1.0     ║
║ for rent   ║ Pocket Encoder        │ 06 July 2004  ║
║            ║                       │               ║
╠════════════╬═══════╤═══════╤═══════╪═══════╗       ║
║ U-00000000 ║ 0  0  │ U  V  │ W  X  │ Y  Z  ║       ║
║ U-0010FFFF ║ !  !  │ !  !  │ !  !  │ !  !  ║       ║
║ UVWXYZ     ║ 0  0  │ _  _  │ _  _  │ _  _  ║       ║
╚════════════╩═══════╧═══════╧═══════╧═══════╩═══════╝
Side 2 (print, cut out, and glue on back of side 1):
╔════════════════════════════════════════════════════╗
║ Mike’s UTF-32 Magic Pocket Encoder - User’s Manual ║
║      (vers. 1.0, 6 July 2004, by Mike Ayers)       ║
║                                                    ║
║ - Left column: min and max Unicode scalar values.  ║
║   Letters U..Z mark the hexadecimal digits to be   ║
║   processed. Read the bytes in the bottom row      ║
║   left to right, or right to left for UTF-32LE.    ║
╚════════════════════════════════════════════════════╝
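
The UTF-32 card needs no digit conversion at all, so a sketch of it is
short. The function name and the little_endian parameter below are made
up for illustration; they only mirror the "left to right, or right to
left" remark in the manual:

```python
def utf32_by_card(cp, little_endian=False):
    """Encode one Unicode scalar value as UTF-32, following the card."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    # Bottom row of the card: 00, UV, WX, YZ.
    octets = [0x00, (cp >> 16) & 0xFF, (cp >> 8) & 0xFF, cp & 0xFF]
    if little_endian:
        octets.reverse()              # read the row right to left
    return bytes(octets)


print(utf32_by_card(0x4E8C).hex(" ").upper())                      # 00 00 4E 8C
print(utf32_by_card(0x4E8C, little_endian=True).hex(" ").upper())  # 8C 4E 00 00
```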