On 7/30/2013 12:26 PM, Doug Ewell wrote:
> Buck Golemon <buck at yelp dot com> replied to Richard Wordingham
> <richard dot wordingham at ntlworld dot com>:
>
> There are no Unicode code pages.
> Just to be pedantic, there are several on Windows.  They encode the
> coding form (Unicode codes being best thought of as an assignment of
> natural numbers to characters, with certain approved ways of storing
> those numbers), e.g. Code pages 1200 (little-endian UTF-16), 1201
> (big-endian UTF-16), 12000 (little-endian UTF-32), 12001 (big-endian
> UTF-32), 65000 (UTF-7) and 65001 (UTF-8).
> I shudder to imagine the circumstances that forced you to learn this
> information.
> Most Windows .NET developers who are concerned about proper character
> handling would know this information existed, though they might not have
> the numbers memorized.
>
> Jukka was right, though: Unicode itself does not have code pages.
> Rather, at least one vendor has defined some of the Unicode encoding
> schemes as if they were code pages. A code page is not, in general, the
> same as an encoding scheme.
> What is, then, the proper definition of a "code page"?

When Unicode was first introduced, it was seen as the one thing that wasn't a "code page", because of the way the Win32 API associated one of the traditional code pages with Unicode (giving rise to the "A" and "W" versions of all the APIs).

Later, it was realized that in order to specify what encoding data were in or, for example, to specify a conversion from UTF-7 or UTF-8 to UTF-16 (the native encoding scheme), one needed some suitable ID number to identify the mapping. Well, extending the code page ID was the most natural way to do that, because, on several platforms, the use of a numerical ID from the IBM code page registry was established practice.
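To make the idea of a numeric ID naming a conversion concrete, here is a minimal Python sketch. It assumes a hand-written table from the Windows code page numbers quoted above to Python codec names; the table name `CODE_PAGE_TO_CODEC` and the function `convert` are hypothetical illustrations, not the Win32 API (on Windows the code page number is passed to functions such as `MultiByteToWideChar`).

```python
# Hypothetical lookup table: the Windows code page IDs quoted in this thread,
# mapped to Python codec names. Illustration only; this is not a Windows API.
CODE_PAGE_TO_CODEC = {
    1200: "utf-16-le",   # little-endian UTF-16
    1201: "utf-16-be",   # big-endian UTF-16
    12000: "utf-32-le",  # little-endian UTF-32
    12001: "utf-32-be",  # big-endian UTF-32
    65000: "utf-7",
    65001: "utf-8",
}


def convert(data: bytes, from_cp: int, to_cp: int) -> bytes:
    """Re-encode bytes from one code page ID to another, going through code points."""
    try:
        src = CODE_PAGE_TO_CODEC[from_cp]
        dst = CODE_PAGE_TO_CODEC[to_cp]
    except KeyError as exc:
        raise ValueError(f"unknown code page ID: {exc.args[0]}") from None
    # Decode to abstract characters, then store them in the target encoding scheme.
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    utf8 = "é".encode("utf-8")            # b'\xc3\xa9'
    utf16le = convert(utf8, 65001, 1200)  # UTF-8 -> little-endian UTF-16
    print(utf16le)                        # b'\xe9\x00'
```

The pair of integers is all the caller needs to name the mapping, which is the convenience that reusing code page numbers for the Unicode encoding schemes provides.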

A./

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell