You just need a mapping table from Unicode code points to Shift-JIS code
positions, and a very simple parser that translates UTF-8 into Unicode code
points.
You'll find a Shift-JIS mapping table among the mapping files published on the
Unicode FTP server. The UTF-8 encoding form is fully documented in the
Conformance chapter of the Unicode Standard, and converting UTF-8 to 21-bit
Unicode code points requires no table at all.
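To illustrate the table-free step, here is a minimal sketch in Java of a UTF-8 to code point decoder. The class and method names (Utf8Decoder, decode) are hypothetical and chosen only for this example. It rejects malformed input (bad lead or continuation bytes, overlong forms, surrogates, values above U+10FFFF) rather than substituting a replacement character, which is one possible policy among several.

```java
import java.util.ArrayList;
import java.util.List;

final class Utf8Decoder {
    /** Decodes UTF-8 bytes into Unicode code points; throws on malformed input. */
    static int[] decode(byte[] in) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            int len; // total bytes in this sequence
            int cp;  // payload bits taken from the lead byte
            int min; // smallest code point allowed for this length (rejects overlong forms)
            if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
            else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
            else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
            else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
            else throw new IllegalArgumentException("invalid lead byte at offset " + i);

            if (i + len > in.length) {
                throw new IllegalArgumentException("truncated sequence at offset " + i);
            }
            // Each continuation byte has the form 10xxxxxx and contributes 6 bits.
            for (int k = 1; k < len; k++) {
                int b = in[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("invalid continuation byte at offset " + (i + k));
                }
                cp = (cp << 6) | (b & 0x3F);
            }
            boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (cp < min || cp > 0x10FFFF || surrogate) {
                throw new IllegalArgumentException("ill-formed sequence at offset " + i);
            }
            out.add(cp);
            i += len;
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // "A" followed by U+65E5, given as raw UTF-8 bytes.
        byte[] sample = {0x41, (byte) 0xE6, (byte) 0x97, (byte) 0xA5};
        for (int cp : decode(sample)) {
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

Each code point produced this way would then be looked up in the Unicode to Shift-JIS mapping table to obtain the output bytes.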
Existing tools already do this for you, because they integrate both steps:
- Java (international edition) has a Shift-JIS mapping to Unicode which is
reversible. It is used by the Charset support in the java.io.* and java.nio.*
packages and classes. You can even use the prebuilt tool native2ascii (from
the Java SDK) to do the conversion:
    native2ascii -encoding UTF-8 < filename.UTF-8.txt |
      native2ascii -reverse -encoding SHIFT-JIS > filename.SHIFT-JIS.txt
- GNU recode on Linux/Unix may do that for you too.
- The open-source ICU library, offered by IBM, has an API and supports
mappings for lots of charsets.
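As a rough illustration of the Java route, the sketch below transcodes a UTF-8 byte array to Shift-JIS using the java.nio.charset classes. The class and method names (Utf8ToShiftJis, convert) are hypothetical. It assumes the runtime ships the "Shift_JIS" charset (part of the extended charsets in a full JDK), and it chooses to fail on characters that have no Shift-JIS mapping instead of silently substituting '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8ToShiftJis {
    /** Converts UTF-8 bytes to Shift-JIS bytes; throws if the input is malformed or unmappable. */
    static byte[] convert(byte[] utf8) {
        try {
            // Step 1: UTF-8 bytes -> Unicode text (UTF-16 chars inside the JVM).
            CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(utf8));
            // Step 2: Unicode text -> Shift-JIS bytes via the runtime's mapping table.
            // Assumption: the "Shift_JIS" charset is available in this runtime.
            ByteBuffer sjis = Charset.forName("Shift_JIS").newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(chars);
            byte[] out = new byte[sjis.remaining()];
            sjis.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input cannot be converted to Shift-JIS", e);
        }
    }

    public static void main(String[] args) {
        // "A" followed by U+65E5, given as raw UTF-8 bytes.
        byte[] utf8 = {0x41, (byte) 0xE6, (byte) 0x97, (byte) 0xA5};
        StringBuilder hex = new StringBuilder();
        for (byte b : convert(utf8)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Reporting unmappable characters is a design choice for this sketch: Shift-JIS covers only a small part of Unicode, so a lossy conversion is otherwise easy to miss.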
- Philippe Verdy

