I don't know if William Overington is still a subscriber to this mailing list -- he may have gone away to find (or form) a new group more sympathetic to his "novel" applications of Unicode -- but one of the issues he raised about two weeks ago, right about the time the chromatic-code and precomposed-ligature debates were coming to a head, was an insinuation that Unicode is unduly influenced by large corporate interests.
William based this claim, at least in part, on the $12,000 fee required for "full" membership in the Unicode Consortium, a membership level described on the Unicode Web site as being appropriate for "your company or organization" rather than for individuals. I promised (or maybe threatened) to discuss this issue, from the standpoint of an individual who is interested in Unicode but has yet to join the Consortium due to financial considerations.

First of all, the figure that William (or any other individual) really should be looking at is not $12,000 for a full membership, but $600 for a "specialist" membership or $120 for an "individual" membership. (BTW, I would be interested in hearing -- perhaps off-line -- from individuals who hold or have held such memberships, to find out how they felt their memberships benefited them and Unicode.)

Second, many good ideas have come from this list, and we have to assume that UTC listens to some of them and can be influenced by some of them. It wouldn't be smart to ignore truly good ideas just because they come from a free mailing list.

Some list members have already pointed out that the character repertoire of Unicode/10646 can hardly be said to be reflective of the interests of big business. It is hard to imagine how big business would have benefited from "pushing through" scripts like Tagbanwa or Old Italic, or non-script blocks like Byzantine Musical Symbols. The American Mathematical Society, largely responsible for the big chunk of math symbols added to Unicode 3.1, doesn't seem like a stereotypical "large corporate interest" either.

Indeed, if big business interests were at the heart of the Unicode character repertoire, we would probably be seeing a lot more of the precomposed ligatures that William favored so strongly. They would have given Microsoft and Apple a cheap, easy way to claim "support" for ligatures without the additional pain and complexity of performing ligation in a more general, productive way.
And in fact, I had originally planned to write this post to debunk the entire notion that corporate interest plays any part at all in the development of Unicode. But there's more to Unicode than its character repertoire; as Ken and others remind us, character properties and technical reports and usage guidelines are what separate Unicode from 10646. And it is here that some corporate influences do appear to seep in, and where the Consortium and UTC may want to be careful to avoid either the appearance or the reality of inextricable corporate tie-ins to Unicode.

The precomposed-ligature debate brought forth several responses phrased in terms of, "No, you don't need a precomposed ligature at U+E7xx, or even a ZWJ hint, because Technology Such-and-So will automatically handle it." Technology Such-and-So could be an application like InDesign or FrameMaker, or it could be a font architecture like OpenType or AAT; in the latter case there were frequent discussions of GSUB and GPOS entries and <rlig> tables, as though those were part and parcel of Unicode. In either case, one could reasonably infer that a particular vendor's product or a particular technology is necessary to implement some aspect of Unicode properly, which isn't -- or shouldn't be -- the case.

Just today (Sunday), Mark Davis responded to a question about the Unicode Collation Algorithm in part by pointing out how ICU ("a particular implementation of the UCA") solves the problem. The solution was followed shortly with links to ICU-related sites. Now, even though ICU is an open-source library and thus not a money-making product of IBM, and even though ICU may be easy to use and may greatly facilitate the use of UCA, it's still important to realize that neither UCA nor any other aspect of Unicode *requires* ICU.
I could roll my own UCA implementation if I wanted to, and assuming it was correct and followed the Unicode Standard and UTS #10, it would be just as legitimate and just as "Unicode" as if I used ICU or any other library or tool.

The Unicode FTP site includes sample implementations for algorithms such as UTF-8, SCSU, UCA, and the Bidi algorithm. (UTF-7 was once on this list as well; thankfully, nobody talks about UTF-7 much any more.) At some point, the Binary-Ordered Compression for Unicode ("BOCU") algorithm -- implemented in ICU and already mentioned in the SCSU Technical Standard, despite having no official status in Unicode -- may be added to this list as well. It would be highly desirable for Unicode to continue to provide reference implementations rather than directing users to proprietary implementations on other companies' Web sites, to avoid the perception that these Unicode algorithms require the use of corporate products.

Some algorithms described in Unicode Technical Reports, such as UTF-EBCDIC and CESU-8, were quite obviously promoted by corporations that would stand to benefit from their adoption. In each case, however, the algorithms (however simple) are completely specified in the TR, without any requirement to rely on (e.g.) IBM's or Oracle's products to get the job done. I've implemented both of these on my own (holding my nose in the case of CESU-8), partly to uphold my belief that it should be possible to implement *anything* in Unicode without relying on any vendor's product.

In fact, I would have sworn there was something in the Unicode Standard itself to the effect that Unicode "shall not require the use of any particular vendor's" products or tools. But I looked through the 3.0 book and can't find any such claim. Was this removed sometime in the last 10 years, or am I just imagining it?

Corporate references certainly aren't a bad thing in and of themselves.
We need to know, and perhaps more importantly *others* need to know, how successful Unicode is in terms of its adoption in important software systems. The more people know about the level of Unicode support in operating systems and applications by Microsoft, Apple, Oracle, etc., the more positively that will reflect on both Unicode *and* the products.

But it's also important to know that you, or I, or Fred's Software Solutions and Sports Bar can implement the Unicode Standard just as well as the big boys, without undue dependence on the big boys. That has to be the perception as well as the fact.

-Doug Ewell
 Fullerton, California