On Tue, Feb 05, 2002 at 01:27:49PM +0900, Gaspar Sinai wrote:
> Talking about characters: I think bi-di should not be in
> Unicode Standard because it is not a character.
> It is an algorithm.
Why would that fix the problem? Then everyone would just choose their own algorithm, and instead of a couple of different renderings, each of which can be checked against the standard, you'd get a thousand, each equally correct.

> I feel sorry for interrupting in the "Let's praise and
> celebrate Unicode" mood of this mailing list.

Head over to the POSIX list and start complaining about the maldesign of fixed-width buffers, and see how long they listen to you. This is the Unicode list - that means people here are interested in working in Unicode. The BIDI algorithm is frozen - seriously changing it would break far too many implementations to be considered. (Note that gets() - so broken that the GNU linker will complain if you use it - is a standard part of a POSIX system. There's no evidence that the BIDI algorithm is anywhere near that broken.)

> I wish there was another world character standard besides
> Unicode and not only half-hearted attempts like bytext.

Unicode has its problems, but it works. It takes a lot of work to create a character standard, and it's hard to find a group of people willing to work on a project that goes against the industry leader when that leader has no serious problems. Anyone on this list could produce a better Unicode than Unicode, just as any Unix person could produce a better Unix than Unix. But it's not going to be enough better to be worth losing backward compatibility, and any serious changes will never get consensus. So a standard becomes entrenched. (Cf. Fortran, Unix, ASCII.)

The result is that you get the bizarre ideas of individuals, like Bytext and Rosetta, never really fully fleshed out or implemented, and the Japanese-centric "universal" charsets like Tron and ISO-2022-INT-1. (I've heard rumors of other cultures producing "universal" charsets that "fix" Unicode's bugs for their language only. I'm not familiar with them, though.) The former are too quirky to be useful.
(Bytext's author compared it to lambda calculus and Unicode to arithmetic. In some ways, it's an accurate comparison; while Church numerals are interesting, every real system directly supports arithmetic on binary numbers, as that's much more efficient and simple.) The latter don't support non-Japanese scripts as well as Unicode does, and don't sell well to non-Japanese audiences. ISO-2022-INT-1 supports seven 94x94 character sets for CJK audiences (roughly 60,000 characters before any sort of unification), plus ISO-8859-1 and ISO-8859-7*, leaving the Russians, the Hungarians, the Arabs and many more out in the cold. To the best of my knowledge, there's not enough information available to the non-Japanese speaker to implement Tron. (Not only is information about ISO-10646-1/Unicode available in more languages, English is also more widely known than Japanese.) Again, to the best of my knowledge, there have been no improvements to the non-CJK sections of Tron (besides Braille) since Unicode 2.0, whereas Unicode has been continually updated to keep up - Unicode 3.2 handles more archaic documents, more languages and more scripts than ever before, and offers better linguistic and mathematical support.

In all honesty, I only care about the CJK parts of Unicode in that they convince people to implement Unicode, so I can play with the Latin, Greek, Cherokee, IPA and Mathematics sections. Encoding 50,000 more Han ideographs interests me a lot less than encoding Gothic. A lot of the audience is the same - "Who cares about ancient Greek? Will it handle the Dhammapada in Pali without error?" It appears the serious attempts to topple Unicode - Tron, for example - forgot that and focused on their own issues, leaving Unicode as the only real attempt to serve the needs of everyone, and hence the victor.

* It seems there's a discrepancy in what ISO-2022-INT-1 encodes.
Another source adds ISO-8859-2 and ISO-8859-5, still leaving the Arabs, the residents of the Baltic states, Hindi speakers and much of the rest of the world out.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing with the youth.
-- Information Society, "Peace and Love, Inc."