On 3/27/2017 7:44 AM, Charlotte Buff wrote:
> Now, one of Unicode’s declared goals is to enable round-trip compatibility with legacy encodings. We’ve accumulated a lot of weird stuff over the years in the pursuit of this goal. So it would be natural to assume that the unencoded characters from the mentioned sets [ATASCII, PETSCII, the ZX80 set, the Atari ST set, and the TI calculator sets] would also be eligible for inclusion in the UCS.

Actually, it wouldn't be.

The original goal was to ensure round-trip compatibility with *important* legacy character encodings, *for which there was a need to convert legacy data, and/or an ongoing need to represent text for interchange*.

From Unicode 1.0: "The Unicode standard includes the character content of all major International Standards approved and published before December 31, 1990... [long list ensues] ... and from various industry standards in common use (such as code pages and character sets from Adobe, Apple, IBM, Lotus, Microsoft, WordPerfect, Xerox and others)."

Even as long ago as 1990, artifacts such as the Atari ST set were considered obsolete antiquities, and they did not rise to the level of the character listings we considered when pulling together the original repertoire.

And there are several observations to be made about the "weird stuff" we have accumulated over the years in the pursuit of compatibility. A lot of stuff that was made up out of whole cloth, rather than being justified by existing, implemented character sets used in information interchange at the time, came from the 1991/1992 merger process between the Unicode Standard and the ISO/IEC 10646 drafts. That's how Unicode acquired blocks full of Arabic ligatures, for example.

Other, subsequent additions of small (or even largish) sets of oddball "characters" that don't fit the prototypical sets of characters for scripts and/or well-behaved punctuation and symbols have typically come in with argued cases for a continued need for complete coverage in current text interchange. For example, that is how we ended up filling out Zapf Dingbats with some glyph pieces that had been omitted from the initial repertoire for that block. More recently, of course, the continued importance of the Wingdings and Webdings font encodings on the Windows platform led the UTC to fill out the set of graphical dingbats to cover those sets. And of course, we first started down the emoji track because of the need to interchange text originating from the widely deployed Japanese carrier sets implemented as extensions to Shift-JIS.

I don't think the early calculator character sets, or the sets for the Atari ST and similar early consumer computing devices, fit the bill, precisely because there isn't a real text data interchange case to be made for encoding them. Many of the elements you mentioned, such as the inverse/negative squared versions of letters and symbols, are simply idiosyncratic aspects of the UI for those devices, from an era when font generators were hard-coded and very primitive indeed.

Documenting these early uses, and pointing out the parts of the UI and character usage that aren't part of the character repertoire in the Unicode Standard, seems an interesting pursuit to me. But absent a true textual data interchange issue for these long-gone, obsolete devices, I don't really see a case to be made for spending time in the UTC defining a bunch of compatibility characters for them.

--Ken