----- Original Message ----- From: Addison Phillips [wM]
To: pragati ; [EMAIL PROTECTED]
Sent: Thursday, November 25, 2004 6:21 PM
Subject: RE: Shift-JIS conversion.



Dear Pragati,

You can write your own conversion, of course. The mapping tables for Unicode->SJIS are readily available. Note that there are several vendor-specific variations in the mapping tables. Notably, Microsoft code page 932, which is often called Shift-JIS, has more characters in its character set than "standard" Shift-JIS (and it maps a few characters differently too...)
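To make the vendor differences concrete, here is a minimal sketch in Python. It assumes that Python's built-in "shift_jis" and "cp932" codecs are acceptable stand-ins for the "standard" and Microsoft tables; the helper names decode_both and can_encode are made up for this illustration.

```python
# Compare Python's "shift_jis" and "cp932" codecs on two well-known cases.
# Assumption: these two built-in codecs stand in for the "standard" and
# Microsoft mapping tables; other vendors' tables may differ again.

def decode_both(raw):
    """Decode the same bytes with the standard and the Microsoft table."""
    return raw.decode("shift_jis"), raw.decode("cp932")

def can_encode(text, codec):
    """Return True if every character of text exists in the codec's table."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Same two bytes, two different Unicode characters (wave dash vs. fullwidth tilde).
std, ms = decode_both(b"\x81\x60")
print(f"0x8160 -> U+{ord(std):04X} (shift_jis), U+{ord(ms):04X} (cp932)")

# CIRCLED DIGIT ONE is one of the extra characters that only cp932 carries.
print(can_encode("\u2460", "shift_jis"), can_encode("\u2460", "cp932"))
```

A converter that has to round-trip data produced on Windows will usually want the cp932 table, but that depends on where your data comes from.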

The important fact that you should be aware of: Shift-JIS is an encoding of the JIS X0208 character set.
UTF-8 is an encoding of the Unicode character set.

More exactly, UTF-8 is an encoding of the ISO/IEC 10646 character set. The character set here designates the set of characters, i.e. the repertoire that describes each character with a name, a representative glyph and some annotations, and to which a numeric code, the code point, is then assigned. The char. set is


Unicode by itself is not a character set, only an implementation of the ISO/IEC 10646 character set, in which the Unicode standard assigns additional properties and behavior to the characters allocated in ISO/IEC 10646. The link between Unicode and ISO/IEC 10646 is the assigned code point and character name, which are now common to the two standards.
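As a rough illustration of the code point acting as the pivot between encodings, here is a short Python sketch. The function name utf8_to_sjis and the default codec are choices made for this example only, not a recommendation of a particular mapping table.

```python
import unicodedata

# One character from the shared repertoire: its code point and name are the
# same in ISO/IEC 10646 and in Unicode.
ch = "\u3042"
code_point = ord(ch)            # numeric code assigned to the character
name = unicodedata.name(ch)     # standardized character name

# The same code point serialized by two different encodings.
utf8 = ch.encode("utf-8")
sjis = ch.encode("shift_jis")

def utf8_to_sjis(data, codec="shift_jis"):
    """Convert UTF-8 bytes to Shift-JIS bytes by going through code points.

    Hypothetical helper: pass codec="cp932" to use the Microsoft table instead.
    Raises UnicodeEncodeError for characters missing from the target table.
    """
    return data.decode("utf-8").encode(codec)

print(f"U+{code_point:04X} {name}: utf-8={utf8.hex()} shift_jis={sjis.hex()}")
```

The conversion never maps UTF-8 bytes to Shift-JIS bytes directly; it decodes to code points first and then looks each one up in the target table.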

Of course, the Unicode Technical Committee may propose new assignments to ISO/IEC, but it is still ISO/IEC 10646 that maintains the repertoire and approves or rejects the proposals. A new character proposal may be rejected by Unicode but accepted by ISO/IEC 10646, and it is the ISO/IEC 10646 vote that prevails (so Unicode will have to accept this ISO/IEC decision, even if it previously voted against it).

Conversely, ISO/IEC 10646 says nothing about character properties or behaviors. It can make suggestions, but the Unicode committee makes its own decisions about the character properties and behavior that it chooses to standardize. If Unicode wants its decisions to be widely accepted by all users of the ISO/IEC 10646 repertoire, it is in Unicode's interest to make these decisions conform to other existing national or international standards, to maximize interoperability of national or international applications based on the ISO/IEC 10646 character set.
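To show the kind of per-character data that Unicode adds on top of the code point and name, here is a small Python sketch using the standard unicodedata module. The describe function and the selection of properties are just an illustration; the values come from whatever Unicode Character Database version your Python build ships with.

```python
import unicodedata

def describe(ch):
    """Collect a few Unicode character properties for one character.

    Hypothetical helper for illustration; only a handful of the properties
    defined by the Unicode standard are shown.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
    }

# A Latin capital, a Hebrew letter and a combining accent behave differently.
for ch in ("A", "\u05d0", "\u0301"):
    print(describe(ch))
```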



