On 3/27/2017 7:44 AM, Charlotte Buff wrote:
> Now, one of Unicode’s declared goals is to enable round-trip compatibility with legacy encodings. We’ve accumulated a lot of weird stuff over the years in the pursuit of this goal. So it would be natural to assume that the unencoded characters from the mentioned sets [ATASCII, PETSCII, the ZX80 set, the Atari ST set, and the TI calculator sets] would also be eligible for inclusion in the UCS.

Actually, it wouldn't be.

The original goal was to ensure round-trip compatibility with *important* legacy character encodings, *for which there was a need to convert legacy data, and/or an ongoing need to represent text for interchange*.

From Unicode 1.0: "The Unicode standard includes the character content of all major International Standards approved and published before December 31, 1990... [long list ensues] ... and from various industry standards in common use (such as code pages and character sets from Adobe, Apple, IBM, Lotus, Microsoft, WordPerfect, Xerox and others)."

Even as long ago as 1990, artifacts such as the Atari ST set were considered obsolete antiquities, and they did not rise to the level of the character listings we considered when pulling together the original repertoire.

And there are several observations to be made about the "weird stuff" we have accumulated over the years in the pursuit of compatibility. A lot of stuff that was made up out of whole cloth, rather than being justified by existing, implemented character sets used in information interchange at the time, came from the 1991/1992 merger process between the Unicode Standard and the ISO/IEC 10646 drafts. That's how Unicode acquired blocks full of Arabic ligatures, for example.

Other, subsequent additions of small (or even largish) sets of oddball "characters" that don't fit the prototypical sets of characters for scripts and/or well-behaved punctuation and symbols have typically come in with argued cases for a continued need for complete coverage in current text interchange. For example, that is how we ended up filling out Zapf Dingbats with some glyph pieces that had been omitted from the initial repertoire for that block. More recently, of course, the continued importance of the Wingdings and Webdings font encodings on the Windows platform led the UTC to fill out the set of graphical dingbats to cover those sets. And of course, we first started down the emoji track because of the need to interchange text originating from the widely deployed Japanese carrier sets implemented as extensions to Shift-JIS.

I don't think the early calculator character sets, or the sets for the Atari ST and similar early consumer computing devices, fit the bill, precisely because there isn't a real text data interchange case to be made for encoding them. Many of the elements you mentioned, such as the inverse/negative squared versions of letters and symbols, are simply idiosyncratic aspects of the UI for those devices, from an era when font generators were hard-coded and very primitive indeed.

Documenting these early uses, and pointing out the parts of the UI and character usage that aren't part of the character repertoire in the Unicode Standard, seems an interesting pursuit to me. But absent a true textual data interchange issue for these long-gone, obsolete devices, I don't really see a case to be made for spending time in the UTC defining a bunch of compatibility characters for them.

--Ken