From: Ken Whistler <[email protected]>
Date: Sun, August 16, 2015 5:15 pm
To: [email protected]
Cc: [email protected]
Alex,
> As far as I know, APL definitely predates the Unicode consortium. Do you think that The Consortium possibly overlooked the pre-existing under-bar character set?
The answer to that is no.
Initially, Unicode 1.0 attempted to punt the entire APL complex functional symbol problem by encoding U+2300 APL COMPOSE OPERATOR.
The concept was essentially that any of the combined symbols (the old rack of stuff that people complained about entering with symbol/backspace/symbol keying) could simply be represented as sequences of existing symbols.
Think of U+2300 as an early attempt to introduce an APL "script"-specific, conjunct-forming virama, à la the much-later, artificially introduced script-specific joiners. Cf. U+2D7F TIFINAGH CONSONANT JOINER.
But U+2300 APL COMPOSE OPERATOR was an innovation that failed. It was fiercely opposed *by the APL community*, who wanted it out of 10646 and replaced with an explicit list of pre-formed complex functional symbols. Presumably that was for the same reason we are talking about here now: each symbol had to work as a "character", and in an APL context that meant fixed width and the same data size as all the other characters.
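As a rough illustration of that "fixed width, same data size" requirement, here is a minimal sketch. It assumes Python 3 and its standard unicodedata module. The three symbols (U+234B, U+2352, U+233D) are just examples picked from the pre-formed APL functional symbols that ended up in the standard. The sketch checks that each one is a single code point occupying one 4-byte unit in UTF-32, exactly like a plain Latin letter. The helper names are made up for the illustration.

```python
import unicodedata

# Example pre-formed APL functional symbols, chosen only for illustration:
# grade up, grade down, rotate.
APL_SYMBOLS = ["\u234B", "\u2352", "\u233D"]

def code_points(s: str) -> int:
    """Hypothetical helper: number of Unicode code points in s."""
    return len(s)

def utf32_size(s: str) -> int:
    """Hypothetical helper: size of s in bytes as UTF-32 (little-endian, no BOM)."""
    return len(s.encode("utf-32-le"))

for sym in APL_SYMBOLS:
    print(f"U+{ord(sym):04X} {unicodedata.name(sym)}: "
          f"{code_points(sym)} code point(s), {utf32_size(sym)} bytes in UTF-32")

# Each pre-formed symbol fits in the same single fixed-size slot as "A".
assert all(utf32_size(sym) == utf32_size("A") for sym in APL_SYMBOLS)
```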
The removal of Unicode 1.0 U+2300 APL COMPOSE OPERATOR is documented in Unicode 1.1 as of 1993: http://www.unicode.org/versions/Unicode1.1.0/ (see page 3). The addition of APL functional symbols is documented in Section 5.4.8, pp. 39-41.
The exact repertoire that ended up encoded in the standard was the result of meetings between some Unicode representatives and some folks from the APL community. The names escape me at the moment, although it might be possible to recover some information eventually. (Documentation regarding Unicode events in late 1991 is sparse these days.) At any rate, the agreed-upon additional repertoire is probably the one included in:
X3L2/92-035, Unicode Request for Additional Characters in ISO/IEC 10646-1.2.
And the rest of the consequences and processing can be dug out of the ballot history record for the voting on 10646 in 1992.
At any rate, a propos *this* discussion, we agreed that the repertoire would cover all the complex functional symbols, but *not* the letters with underscores. And it is not that they were simply overlooked.
How do I know? Well, first, there were APL specialists involved in coming up with (and promoting) the repertoire that was carried into the 10646 balloting at the time. It isn't as if a bunch of ignorant Unicoders just grabbed one APL book off the shelf and coded up the table, not noticing that some stuff was missing.
Second, the text that is currently in the core specification about this issue, to wit:
" ... All other APL extensions can be encoded by composition of other Unicode characters. For example, the APL symbol a underbar can be represented by U+0061 LATIN SMALL LETTER A + U+0332 COMBINING LOW LINE."
(Unicode 7.0, Section 22.7, p. 772)
is *ancient* text. It was first printed on p. 6-83 of Unicode 2.0 in 1996, with exactly the same wording. And the only reason it took until 1996 to appear, instead of 1993, was that the editing of Unicode 2.0 and its code charts was such a massive task at the time.
So the clear intent in *1993* was to represent any APL letter with underbar as a combining character sequence, as noted. The only problem I see there is that the text in the core spec mistakenly used U+0061 (the lowercase "a") instead of U+0041 (the uppercase "A") for the exemplification.
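The combining-sequence approach is easy to see in practice. Here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The helper apl_underbar is hypothetical. It builds an APL underscored letter as the base letter followed by U+0332 COMBINING LOW LINE, using the uppercase A that APL actually used. It then checks that NFC normalization leaves the result as a two-code-point sequence, because no precomposed "A with low line" character exists.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"

def apl_underbar(letter: str) -> str:
    """Hypothetical helper: return an APL underscored letter as a combining
    character sequence (base letter + U+0332 COMBINING LOW LINE)."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single letter")
    return letter + COMBINING_LOW_LINE

a_underbar = apl_underbar("A")  # uppercase, as in APL usage

# Print each code point with its name and canonical combining class.
for ch in a_underbar:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"(ccc={unicodedata.combining(ch)})")

# No precomposed form exists, so NFC does not fold the pair into one code point.
assert unicodedata.normalize("NFC", a_underbar) == a_underbar
assert len(a_underbar) == 2
```

Because the result is two code points, software that wants to treat it as one APL "letter" has to work with it as a single grapheme cluster rather than a single code point.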
Third, I can attest that at least some of us at the time (as early as 1989) had printed copies of IBM EBCDIC code page 293 for APL, which had the EBCDIC uppercase Latin letters with underscores (italicized, by the way), together with the regular EBCDIC upper- and lowercase letters. [Dates from 1984.]
*And* IBM EBCDIC code page 310 for APL, which dropped all the regular upper- and lowercase letters but added more symbols.
*And* IBM PC code page 907 (with the underscored uppercase Latin letters) and PC code page 909 (CP437 hacked up for APL, without the underscored uppercase Latin letters), which was quickly superseded by PC code page 910, which also did not use the uppercase Latin letters with underscores.
So yeah, we knew about these.
Encoding them as combining character sequences instead of as atomic characters was a deliberate decision taken in 1992. And that decision made it through both UTC and international balloting for publication in 1993.
--Ken