I am surprised and puzzled by the "Unicode 1.0 Name" changes for some of the 
ASCII and Latin-1 control characters that were introduced in the latest beta 
version of the Unicode 3.2 data file (UnicodeData-3.2.0d5.txt):

U+0009  HORIZONTAL TABULATION  ==>  CHARACTER TABULATION
U+000B  VERTICAL TABULATION  ==>  LINE TABULATION
U+001C  FILE SEPARATOR  ==>  INFORMATION SEPARATOR FOUR
U+001D  GROUP SEPARATOR  ==>  INFORMATION SEPARATOR THREE
U+001E  RECORD SEPARATOR  ==>  INFORMATION SEPARATOR TWO
U+001F  UNIT SEPARATOR  ==>  INFORMATION SEPARATOR ONE
U+008B  PARTIAL LINE DOWN  ==>  PARTIAL LINE FORWARD
U+008C  PARTIAL LINE UP  ==>  PARTIAL LINE BACKWARD
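
For anyone who wants to reproduce a list like the one above, here is a minimal sketch that compares the "Unicode 1.0 Name" field between two versions of the data file. It assumes the usual semicolon-separated layout, in which that name is the eleventh field (index 10). The function names and the sample records are hypothetical: the records are built from the names quoted in this message, not copied from the actual files.

```python
def old_names(lines):
    """Map code point -> "Unicode 1.0 Name" (field index 10) for each record."""
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        names[int(fields[0], 16)] = fields[10]
    return names


def changed_old_names(before, after):
    """Return (code point, old value, new value) where the 1.0 name differs."""
    a, b = old_names(before), old_names(after)
    return [(cp, a[cp], b[cp])
            for cp in sorted(a.keys() & b.keys())
            if a[cp] != b[cp]]


# Illustrative records only (assumed layout, names taken from this message).
before = [
    "0009;<control>;Cc;0;S;;;;;N;HORIZONTAL TABULATION;;;;",
    "000A;<control>;Cc;0;B;;;;;N;LINE FEED;;;;",
]
after = [
    "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;",
    "000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;",
]

for cp, old, new in changed_old_names(before, after):
    print(f"U+{cp:04X}  {old}  ==>  {new}")
```

With real files, the two lists could be replaced by open file objects for the older and newer UnicodeData files.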

Were these "new" names (e.g. CHARACTER TABULATION) really the original 
Unicode 1.0 names?  I don't have my 1.0 book close at hand, but I know that 
they were *not* the names used in 1.1, according to the file "namesall.lst" 
from that version.  (Aha, didn't think anyone still had that dusty old thing 
lying around?)

IMHO, the new names CHARACTER TABULATION and LINE TABULATION are much less 
intuitive than HORIZONTAL TABULATION and VERTICAL TABULATION.  Sometimes you 
even see the abbreviations HT and VT for these two characters.  The new names 
appear to have been invented by someone who imagined a lack of clarity in the 
old names.

I have seen the names IS4, IS3, IS2, and IS1 before, but they do not convey 
the same information as FS, GS, RS, and US.  The latter names are more 
specific.

The "old" names for these six control characters were used as far back as the 
original 1963 version of ASCII, according to Mackenzie (pp. 245-247).

I don't know about the history of U+008B and U+008C, but again it seems 
strange that the "Unicode 1.0 name" for these characters is being changed at 
this late date.

I know this 1.0 name field is not subject to the same rule of "no changes, 
ever" that applies to the regular Character Name field, but why should these 
names be changed at all?

On this same topic, parenthesized abbreviations have been added to the 1.0 
names for U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE 
RETURN (CR), and U+0085 NEXT LINE (NEL).  Does the addition of these 
abbreviations mean that they are now part of the official 1.0 name, and if 
so, why?  Other characters typically don't have abbreviations as part of 
their names, even if they are as meaningful and as commonly used as these, 
and again it is a change from the 1.0 name we have seen for a decade.

Perhaps I've been checking the beta files a bit TOO carefully.

-Doug Ewell
 Fullerton, California
