2. The Unicode code charts are (deliberately) vague about U+0080, U+0081,
and U+0099. All other C1 control codes have aliases to the ISO 6429
set of control functions, but in ISO 6429, those three control codes don't
have any assigned functions (or names).

On 10/5/2015 3:57 PM, Philippe Verdy wrote:
Also the aliases for C1 controls were formally registered in 1983 only for the two ranges U+0084..U+0097 and U+009B..U+009F for ISO 6429.

If I may, I would appreciate another history lesson:
In ISO 2022 / 6429 land, it is apparent that the C1 controls are mainly aliases for ESC 4/0 through ESC 5/15 (that is, ESC followed by @ through _). This might vary depending on what is loaded into the C1 register, but overall it just seems like a way of saving one byte.

Why was C1 invented in the first place?

And, why did Unicode deem it necessary to replicate the C1 block at 0x80-0x9F, when all of the control characters (codes) were equally reachable via ESC 4/0 - 5/15? I understand why it is desirable to align U+0000 - U+007F with ASCII, and maybe even U+0000 - U+00FF with Latin-1 (ISO-8859-1). But maybe Windows-1252, MacRoman, and all the other non-ISO-standardized 8-bit encodings got this much right: duplicating control codes is basically a waste of very precious character code real estate.
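
To make the contrast concrete, here is a minimal sketch, assuming Python and its bundled "latin-1" and "cp1252" codecs (the function name compare_high_range is invented for this illustration). It decodes bytes 0x80-0x9F both ways: the Latin-1 decoding lands on the C1 control code points, while the Windows-1252 decoding yields mostly graphic characters.

```python
import unicodedata

def compare_high_range():
    """Decode bytes 0x80-0x9F as Latin-1 and as Windows-1252 (hypothetical helper)."""
    raw = bytes(range(0x80, 0xA0))
    # Latin-1 maps each byte straight to the code point of the same value,
    # so these become U+0080..U+009F, the C1 controls.
    latin1 = raw.decode("latin-1")
    # Python's cp1252 codec leaves a few of these bytes undefined;
    # errors="replace" turns those into U+FFFD instead of raising.
    cp1252 = raw.decode("cp1252", errors="replace")
    return latin1, cp1252

latin1, cp1252 = compare_high_range()
for byte, (a, b) in enumerate(zip(latin1, cp1252), start=0x80):
    shown = "(undefined)" if b == "\ufffd" else b
    print(f"0x{byte:02X}  latin-1: U+{ord(a):04X} [{unicodedata.category(a)}]  cp1252: {shown}")
```

Note that this only shows what the two codecs do with the same bytes; it says nothing about why either design was chosen.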

Sean

PS I was not able to turn up ISO 6429:1983, but I did find ECMA-48, 4th Ed., December 1986, which has the following text:
***
5.4 Elements of the C1 Set
These control functions are represented:
- In a 7-bit code by 2-character escape sequences of the form ESC Fe, where ESC is represented by bit combination 01/11 and Fe is represented by a bit combination from 04/00 to 05/15.
- In an 8-bit code by bit combinations from 08/00 to 09/15.
***
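
As a rough illustration of the clause quoted above, the two representations differ by a fixed offset: the 8-bit combination 08/00-09/15 (0x80-0x9F) corresponds to ESC followed by a byte 0x40 lower (04/00-05/15). The sketch below assumes Python and a raw 8-bit byte stream (not UTF-8); the function names are invented here, and it deliberately ignores every other kind of escape sequence.

```python
ESC = 0x1B  # bit combination 01/11

def c1_to_esc_fe(data: bytes) -> bytes:
    """Rewrite 8-bit C1 bytes (0x80-0x9F) as 2-byte ESC Fe sequences."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes((ESC, b - 0x40))  # Fe lies in 0x40-0x5F
        else:
            out.append(b)
    return bytes(out)

def esc_fe_to_c1(data: bytes) -> bytes:
    """Rewrite ESC Fe sequences as single 8-bit C1 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)

# 0x9B (CSI) becomes ESC [ and 0x85 (NEL) becomes ESC E.
print(c1_to_esc_fe(b"\x9b1m"))
print(esc_fe_to_c1(b"\x1bE"))
```

This is only the arithmetic of the mapping, not a conforming ISO 2022 / ECMA-48 decoder.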

This text is seemingly repeated in many analogous standards from ca. 1974 to 1992.

PPS I happen to have a copy of ANSI X3.41-1974 "American National Standard Code Extension Techniques for Use with the 7-Bit Coded Character Set of [ASCII]". The invention/existence of C1 goes back to this time, as does the use of ESC Fe to invoke C1 characters in a 7-bit code, and 0x80-0x9F to invoke C1 characters in an 8-bit code. (See, in particular, Clauses 5.3.3.1 and 5.3.6.) Notably, Clause 7.3.1.2 says: "The use of ESC Fe sequence in an 8-bit environment is contrary to the intention of this standard but, should they occur, their meaning is the same as in the 7-bit environment."

I can appreciate why it was desirable to "fold" C1 in an 8-bit environment into a 7-bit environment with ESC Fe. (If, in fact, that was the direction of standardization: invent a new thing and then devise a coding to express the new thing in the old thing.) It is less obvious why Unicode adopted C1, however, when the trend was to jettison the 94-character Tetris block assignments in favor of a wide-open field for character assignment. Except for the trend in Unicode to "avoid assigning characters when explicitly asked, unless someone implements them without asking, and the implementation catches on, and then just assign the whole lot of them, even when they overlap with existing assignments, and then invent composite characters, which further compound the possible overlapping combinations". 😉
