Markus Scherer wrote as follows.

quote

It has been suggested many times to build a database (list, document, XML, ...) where each designated/assigned code point and each character gets its "story": Comments on the glyphs, from what codepage it was inherited, usage comments and examples, alternate names, etc.

I am talking about both code points and "characters" on purpose, and I would go a step beyond documenting what's there. All the "characters" that can be represented by a sequence of assigned Unicode characters should be listed, with that sequence (or those sequences), and with further explanation if necessary.

end quote

Yes, that is a very good point. I have become interested in the languages of the Indian subcontinent from the standpoint of trying to ensure that they can be displayed properly on interactive television using portable font technology. However, I am not a linguist, and I find it strange that the Unicode Standard does not codify the ligatures which can be produced at display time for the languages of the Indian subcontinent from specific sequences of regular Unicode characters, so that someone skilled in the art of font design may design a font from the code charts.

Later he wrote.

quote

Now we just need to
- find someone to sponsor this effort technically and with humanpower
- squeeze the existing information out of the standard, the mailing lists, FAQs, and of course out of the Unicode veterans before they retire by Unicode 6...

end quote

Well, how about an approach like the one Project Gutenberg uses for proofreading transcripts of classic books? If there were a database where people could post items about particular characters, and other people could read them and either confirm what is said, put some other view, or just add some other information, then maybe the database could just sort of gradually become generated over a period of years.

How big would that be?
About 100 thousand code points at, say, 200 words for each on average, at about 5 or 6 characters per word on average with a space following each word, would be about 130 megabytes in total, taking one byte per character.

I fully realize that the phrase "sort of gradually" might easily be quoted in a response to this posting, yet if the database facility were there, accessible directly from the web, there may well be many people who would stop by for a while, review what has been entered and add a little more to the database.

>PS: Sorry, I am not in a position to volunteer...

Well, it could be more of an informal thing. If the facility were set up, then people who are interested could simply visit the web site when they felt like participating. Certainly there might be a core of people who had the ability to throw out rubbish and to convert fragments of text into a good English narrative, so that there was some overall structure to it all. Yet it does not necessarily need to be as formal and rigid as if it were a commercial project with a time deadline, particularly if the alternative is that it does not get done at all.

William Overington

14 March 2003
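
For anyone who wishes to check the size estimate given earlier in this posting, here is a minimal sketch of the arithmetic in Python. The function name and parameter names are merely illustrative, the figure of 5.5 characters per word is simply the midpoint of "5 or 6", and one byte per character is an assumption that holds only for plain ASCII-range text.

```python
# Back-of-envelope check of the database size estimate.
# Every figure here is a rough assumption taken from the posting,
# except BYTES_PER_CHAR, which is an added assumption (plain ASCII text).
CODE_POINTS = 100_000      # "about 100 thousand code points"
WORDS_PER_ENTRY = 200      # "say, 200 words for each on average"
CHARS_PER_WORD = 5.5       # midpoint of "5 or 6 characters per word"
SPACES_PER_WORD = 1        # "a space following each word"
BYTES_PER_CHAR = 1         # assumption: one byte per character


def estimate_bytes(code_points=CODE_POINTS,
                   words=WORDS_PER_ENTRY,
                   chars=CHARS_PER_WORD,
                   spaces=SPACES_PER_WORD,
                   bytes_per_char=BYTES_PER_CHAR):
    """Return the estimated total size of the narrative text in bytes."""
    return int(code_points * words * (chars + spaces) * bytes_per_char)


total = estimate_bytes()
print(f"{total:,} bytes, about {total / 1_000_000:.0f} megabytes")
```

Taking 5 characters per word instead gives 120 megabytes and taking 6 gives 140 megabytes, so the estimate is not very sensitive to that choice. Text stored in an encoding that needs more than one byte per character would of course come out larger.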