RE: Unicode 1.0 names for control characters

Kent Karlsson Tue, 04 Dec 2001 03:36:49 -0800


None of the C0 names, and in particular none of the C1 names, are really
from Unicode 1.0.  They are from ISO/IEC 6429, and now from ISO/IEC
6429:1992 (third edition) rather than the second edition.  The technically
identical standard is available as the fifth edition of ECMA-48, 1991;
ftp://ftp.ecma.ch/ecma-st/Ecma-048.pdf.  The earlier editions are, as far
as I can see, no longer available.  Most of the name changes (in 1991!)
were apparently made in an attempt to internationalise the standard(s):
"HORIZONTAL TABULATION" would be vertical if the lines are vertical, etc.
So these name changes DO add clarity.  The old abbreviations were retained,
however.  The mismatch between the ISO/IEC 6429:1992 names and the Unicode
2.x "names" for these characters caused a bit of a stir when someone was
translating the names.


For the ISn-s it appears that a generalisation was desired (I don't know
if this was new in the 1991/1992 editions), with the names US, RS, GS, FS
being suitable only if the separators are used as a unit/record/group/file
hierarchy.  My reading is that the ISn-s are not necessarily hierarchical.

                Kind regards
                /kent k


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of [EMAIL PROTECTED]
> Sent: 4 December 2001 06:31
> To: [EMAIL PROTECTED]
> Subject: Unicode 1.0 names for control characters
> 
> 
> I am surprised and puzzled by the "Unicode 1.0 Name" changes for some of the
> ASCII and Latin-1 control characters that were introduced in the latest beta
> version of the Unicode 3.2 data file (UnicodeData-3.2.0d5.txt):
> 
> U+0009  HORIZONTAL TABULATION  ==>  CHARACTER TABULATION
> U+000B  VERTICAL TABULATION  ==>  LINE TABULATION
> U+001C  FILE SEPARATOR  ==>  INFORMATION SEPARATOR FOUR
> U+001D  GROUP SEPARATOR  ==>  INFORMATION SEPARATOR THREE
> U+001E  RECORD SEPARATOR  ==>  INFORMATION SEPARATOR TWO
> U+001F  UNIT SEPARATOR  ==>  INFORMATION SEPARATOR ONE
> U+008B  PARTIAL LINE DOWN  ==>  PARTIAL LINE FORWARD
> U+008C  PARTIAL LINE UP  ==>  PARTIAL LINE BACKWARD
> 
> Were these "new" names (e.g. CHARACTER TABULATION) really the original
> Unicode 1.0 names?  I don't have my 1.0 book close at hand, but I know that
> they were *not* the names used in 1.1, according to the file "namesall.lst"
> from that version.  (Aha, didn't think anyone still had that dusty old thing
> lying around?)
> 
> IMHO, the new names CHARACTER TABULATION and LINE TABULATION are much less
> intuitive than HORIZONTAL TABULATION and VERTICAL TABULATION.  Sometimes you
> even see the abbreviations HT and VT for these two characters.  The new names
> appear to have been invented by someone who imagined a lack of clarity in the
> old names.
> 
> I have seen the names IS4, IS3, IS2, and IS1 before, but they do not convey
> the same information as FS, GS, RS, and US.  The latter names are more
> specific.
> 
> The "old" names for these six control characters were used as far back as the
> original 1963 version of ASCII, according to Mackenzie (pp. 245-247).
> 
> I don't know about the history of U+008B and U+008C, but again it seems
> strange that the "Unicode 1.0 name" for these characters is being changed at
> this late date.
> 
> I know this 1.0 name field is not subject to the same rule of "no changes,
> ever" that applies to the regular Character Name field, but why should these
> names be changed at all?
> 
> On this same topic, parenthesized abbreviations have been added to the 1.0
> names for U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
> RETURN (CR), and U+0085 NEXT LINE (NEL).  Does the addition of these
> abbreviations mean that they are now part of the official 1.0 name, and if
> so, why?  Other characters typically don't have abbreviations as part of
> their names, even if they are as meaningful and as commonly used as these,
> and again it is a change from the 1.0 name we have seen for a decade.
> 
> Perhaps I've been checking the beta files a bit TOO carefully.
> 
> -Doug Ewell
>  Fullerton, California
>
