Marco, I'll answer as many of your questions as I can, and will cc this to the unicode list (in part to forestall a gazillion "Well, I think maybe X" responses).
--Ken

> - When did the Unicode project start, and who started it?

The detailed history for this will soon be available on the Unicode website. The short answer is that Joe Becker (Xerox) and Lee Collins (Apple) were highly instrumental in getting the ball rolling on this, and the preliminary work they did, primarily on Han unification, dated from 1987.

However, "the Unicode project" had many beginnings -- many points where you could mark a milestone in its early development. And the Unicode Consortium celebrated a number of 10-year anniversaries, starting from 1998 and continuing through last year.

>
> - Is it true Han Unification was the core of Unicode, and the idea of an
> universal encoding come afterwards?

The effort by Xerox and Apple to do a Han unification was key to the motivation that eventually led to a serious effort to actually *do* Unicode and then to establish the Unicode Consortium to standardize and promote it.

However, the idea of a universal encoding predated that considerably. In some respects the Xerox Character Code Standard (XCCS) was a serious attempt at providing a universal character encoding (although it did not include a unified Han encoding, but only Japanese kanji). XCCS 2.0 (1980) contained, in addition to Japanese kanji: Latin (with IPA), Hiragana, Bopomofo, Katakana, Greek, Cyrillic, Runic, Gothic, Arabic, Hebrew, Georgian, Armenian, Devanagari, Hangul jamo, and a wide variety of symbols. The early Unicoders mined XCCS 2.0 heavily for the early drafts of Unicode 1.0, and always regarded it as the prototype for a universal encoding.

Additionally, you have to consider that the beginning of the ISO project for a Multi-octet Universal Character Set (10646) predated the formal establishment of Unicode. Part of the impetus for the serious work to standardize Unicode was, of course, discontent with the then architecture of the early drafts of 10646.

>
> - Who and when invented the name "Unicode"?

This one has a definitive answer: Joe Becker coined the term, for "unique, universal, and uniform character encoding", in 1987. First documented use is in December, 1987.

>
> - When did the ISO 10646 project start?

Unfortunately, the document register for early WG2 documents doesn't have dates for all the early documents, and I don't have all the early documents to check. But...

The 4th meeting of WG2 was held in London in February, 1986. The first three meetings were in Geneva, Turin, and London, respectively. That puts the likely timeframe for the Geneva meeting, and the establishment of WG2 by SC2, at about 1984. The *only* project for WG2 was 10646.

Some of the older oldtimers on the list may have more exact information about the early WG2 work.

>
> - When did Unicode and ISO 10646 merge?

There is no single date that can be pointed to, like the signing of an armistice. In some respects, Unicode and ISO 10646 are *still* merging, as modifications and amendments to deal with niggling little architectural edge cases are worked out.

However, the key dates were:

January 3, 1991. Incorporation of the Unicode Consortium, which signalled to SC2 that the Unicoders were serious in their intentions.

May, 1991. Meeting #19 of WG2 in San Francisco. An ad hoc meeting took place between WG2 members and some Unicoders, which paved the way for the later "merger" of the standards.

June, 1991. The 10646 DIS 1 was defeated in its balloting. This left an architectural compromise with the Unicode Standard, which at that point was in copy edit and about to go to press, as the only reasonable way forward.

June 3, 1991. The date of "10646M proposal draft to merge Unicode and 10646", by Ed Hart. This was a key document in the resulting merger of features.

August, 1991. The Geneva WG2 meeting accepted Han unification and combining marks, dropped byte-by-byte restrictions on code values for UCS-2, and accepted Unicode repertoire additions.
From that point forward, the overall aspect of what became ISO/IEC 10646-1:1993 was clear.

>
> - What is the name of the GB and JIS standards that have the same repertoire
> as Unicode?

GB 13000 has the same repertoire as ISO/IEC 10646-1:1993.

JIS X 0221 has the same repertoire as ISO/IEC 10646-1:1993.

Those two were effectively national publications of 10646. You can work out the correlations with Unicode from that.

GB 18030:2000 in principle has the same repertoire (but different encoding) as ISO/IEC 10646-1:2000, i.e. the same as Unicode 3.0. (But there were small problems in it.) However, the 4-byte form of GB 18030 maps all Unicode code points, assigned or not, so it will (in theory, at least) always have the same repertoire as Unicode.

>
> - When did Unicode stop to be "16 bits"? (I.e., when were surrogates added?)

In terms of publication, with Unicode 2.0 in 1996. However, the decision was taken by the UTC considerably before publication. Amendment 1 to 10646-1 (UTF-16) was proposed to WG2 in WG2 N970, dated 7 February 1994. Mark Davis was the project editor for that amendment.

>
> - I can't remember the version when some scripts were added: Syriac, Thaana,
> Sinhala, Tibetan, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham,
> Runes, Khmer, Mongolian, Yi, Etruscan, Gothic, Deseret, CJK ext. A, CJK ext.
> B.

See pp. 968-969 of TUS 3.0.

Tibetan was in Unicode 1.0, then was removed. It was readded, in a new encoding, in Unicode 2.0.

Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham, Runic, Khmer, Mongolian, Yi, CJK Extension A were added in Unicode 3.0.

Old Italic (including Etruscan), Gothic, Deseret, and CJK Extension B were added in Unicode 3.1.

> - Roughly, how many ideographs are in modern use in extensions A and B?

Not many. I'll refer to the IRG experts to make a guess there.

>
> - Roughly, when will version 3.2 become official?

March, 2002.

>
> - Roughly, when will the version 4 book be published?

Currently still scheduled for March, 2003, but schedule slip is always a possibility on a major publication project like this.

> I also have a few non-Unicode questions:
>
> - When was ASCII first published and by whom?

1967. By ANSI X3.4.

Actually, that was preceded by ASCII per se, the earliest form of which was published as a standard in 1963 by ASA (American Standards Association -- the predecessor to ANSI). But the 1963 version of ASCII had some differences from what we now know as ASCII.

>
> - What standard was current before ASCII? (BAUDOT, is it?) How many bits did
> it use?

I'll let the ancient computer and terminal mavens have at that one. There is lots of early character encoding history available on the web -- it's not too hard to find information about it, actually.

>
> - Did the ASCII standard expire, and when?

No, it is still a standard.

>
> - When was ISO 646 published?

1972.

>
> - I think that ISO 646 expired. When?

No, it is still a standard. The current version is the ISO-646-IRV, revised in 1991.

>
> - When was ISO 8859 published?

It comes in many parts, each of which has a separate publication date.

>
> - When did the first double-byte encoding appear?

Dunno. Maybe one of the IBMers will know when IBM first started implementing double-byte Asian character sets.

--Ken