Re: _Unicode_code_page_and_?.net

Asmus Freytag Tue, 30 Jul 2013 13:34:47 -0700

On 7/30/2013 12:26 PM, Doug Ewell wrote:

Buck Golemon <buck at yelp dot com> replied to Richard Wordingham
<richard dot wordingham at ntlworld dot com>:

There are no Unicode code pages.

Just to be pedantic, there are several on Windows.  They encode the
coding form (Unicode codes being best thought of as an assignment of
natural numbers to characters, with certain approved ways of storing
those numbers), e.g. Code pages 1200 (little-endian UTF-16), 1201
(big-endian UTF-16), 12000 (little-endian UTF-32), 12001 (big-endian
UTF-32), 65000 (UTF-7) and 65001 (UTF-8).

I shudder to imagine the circumstances that forced you to learn this
information.

Most Windows .NET developers who are concerned about proper character
handling would know this information existed, though they might not have
the numbers memorized.

Jukka was right, though: Unicode itself does not have code pages.
Rather, at least one vendor has defined some of the Unicode encoding
schemes as if they were code pages. A code page is not, in general, the
same as an encoding scheme.

What is, then, the proper definition of a "code page"?

When Unicode was first introduced, it was seen as the one thing that
wasn't a "code page", because of the way the Win32 API associated one of
the traditional code pages with Unicode (giving rise to the "A" and "W"
versions of all the APIs).

Later, it was realized that in order to specify what encoding data were
in or, for example, to specify a conversion from UTF-7 and UTF-8 to
UTF-16 (native encoding scheme) one needed some suitable ID number to
identify the mapping. Well, extending the code page id was the most
natural way to do that, because, on several platforms, the use of a
numerical ID from the IBM code page registry was established practice.
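
To make the idea of "a numerical ID that identifies the mapping" concrete,
here is a minimal sketch in Python. It is not how Windows or .NET does the
job internally. The table name CODE_PAGE_TO_CODEC and the function convert
are hypothetical, made up for this illustration; the code page numbers are
the ones quoted earlier in this thread, and the codec names are Python's
standard ones.

```python
# Hypothetical lookup table (an assumption for illustration only):
# Windows code page ID -> name of the matching Python standard codec.
CODE_PAGE_TO_CODEC = {
    1200: "utf-16-le",   # little-endian UTF-16
    1201: "utf-16-be",   # big-endian UTF-16
    12000: "utf-32-le",  # little-endian UTF-32
    12001: "utf-32-be",  # big-endian UTF-32
    65000: "utf-7",
    65001: "utf-8",
}


def convert(data: bytes, source_cp: int, target_cp: int = 1200) -> bytes:
    """Re-encode data from one code page ID to another.

    Both encodings are selected purely by their numeric ID. The default
    target, 1200, stands in for the "native" UTF-16 mentioned above.
    Raises KeyError for an ID that is not in the table.
    """
    text = data.decode(CODE_PAGE_TO_CODEC[source_cp])
    return text.encode(CODE_PAGE_TO_CODEC[target_cp])


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")
    # 65001 names the source encoding; the target defaults to 1200.
    print(convert(utf8_bytes, 65001).hex())
```

The caller never names an encoding as such, only a number, which is the
point of treating the Unicode encoding schemes as if they were code pages.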

A./


--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
