Re: Unicode and Security

David Starner Tue, 05 Feb 2002 00:32:20 -0800

On Tue, Feb 05, 2002 at 01:27:49PM +0900, Gaspar Sinai wrote:
> Talking about characters: I think  bi-di should not be in
> Unicode  Standard because it is not a character.
> It is an algorithm.


Why would that fix the problem? Then everyone would just choose their
own algorithm, and instead of a couple of different renderings, with the
ability to check them against the standard, you'd get a thousand, each
equally correct.
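
As a small illustration of what "checking against the standard" can rest on: the bidirectional class of every character is published data, and Python's standard `unicodedata` module exposes it. This is only a sketch that prints those classes. It does not implement the BIDI algorithm itself, and the helper name `bidi_classes` is made up for this example.

```python
import unicodedata

def bidi_classes(text):
    """Return the standard bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, Hebrew alef, Arabic alef, ASCII digit, space
sample = "A\u05d0\u06271 "
print(bidi_classes(sample))
```

Two implementations that start from the same class data and the same rules can be compared against each other; with private algorithms there would be nothing shared to compare against.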

> I feel sorry for interrupting in the "Let's praise and
> celebrate Unicode" mood of this mailing list.

Head over to the POSIX list and start complaining about the maldesign of
fixed width buffers and see how long they listen to you. This is the
Unicode list - that means people here are interested in working in
Unicode. The BIDI algorithm is frozen - seriously changing it would
break way too many implementations to be considered. (Note that gets -
so broken that the GNU linker will complain if you use it - is a standard
part of a POSIX system. There's no evidence that the BIDI algorithm is
anywhere near that broken.)

> I wish there was another world character standard besides
> Unicode and not only  half-hearted attempts like bytext.

Unicode has its problems, but it works. It takes a lot of work to
create a character standard, and it's hard to find a bunch of people
to work on a project to go against the industry leader without serious
problems in that leader. Anyone on this list could produce a better
Unicode than Unicode, just like any Unix person could produce a better
Unix than Unix. But it's not going to be enough better that it's worth
losing backward compatibility, and any serious changes will never get
consensus. So a standard is entrenched. (Cf. Fortran, Unix, ASCII).

The result is that you get the bizarre ideas of individuals, like Bytext
and Rosetta, never really fully fleshed out or implemented, and the
Japanese-centric "universal" charsets like Tron and ISO-2022-INT-1.
(I've heard rumors of other cultures producing "universal" charsets that
"fix" Unicode's bugs for their language only. I'm not familiar with
them, though.) 

The first are too quirky to be useful.  (Bytext's author compared it to
lambda calculus and Unicode to arithmetic.  In some ways, it's an
accurate comparison; while Church numerals are interesting, every real
system directly supports arithmetic on binary numbers, as that's much
more efficient and simple.)
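
To make the Church numeral comparison concrete, here is a minimal sketch in Python. The names `zero`, `succ`, `add`, `mul` and `to_int` are chosen for this example only and have nothing to do with Bytext. A number n is represented as "apply a function n times", and arithmetic is built out of function composition.

```python
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m, n: lambda f: lambda x: m(f)(n(f)(x))

# Multiplication: apply "f repeated n times" m times.
mul = lambda m, n: lambda f: m(n(f))

def to_int(n):
    """Convert a Church numeral to a native int by counting applications."""
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
four = succ(three)

print(to_int(add(three, four)), 3 + 4)
print(to_int(mul(three, four)), 3 * 4)
```

Both columns print the same values, but the left one needs a chain of nested function calls for every number, while the right one is a single machine operation.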

The latter don't support non-Japanese scripts as well as Unicode, and
don't sell well to non-Japanese audiences. ISO-2022-INT-1 supports 7
94x94 character charsets for CJK audiences (roughly 60,000 characters
before any sort of unification), and ISO-8859-1 and ISO-8859-7*, leaving
the Russians, the Hungarians, the Arabs and many more out in the cold.
To the best of my knowledge, there's not enough information available to
the non-Japanese speaker to implement Tron. (Not only is information
available about ISO-10646-1/Unicode in more languages, English is also
more generally known than Japanese.) Again to the best of my knowledge,
there have been no improvements to non-CJK sections of Tron (besides
Braille) after Unicode 2.0, whereas Unicode has been continually updated
to keep up - Unicode 3.2 handles more archaic documents, more languages
and more scripts than ever before, and offers better linguistic and
mathematical support.
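
The "roughly 60,000" figure quoted above for the seven 94x94 charsets can be checked with a line of arithmetic. This sketch counts available code positions only; it assumes nothing about how many positions each real charset actually assigns, and the variable names are invented for the example.

```python
CHARSETS = 7       # number of 94x94 sets, as stated in the text above
ROWS = 94          # rows per set
COLS = 94          # positions per row

per_set = ROWS * COLS          # code positions in one 94x94 set
total = CHARSETS * per_set     # upper bound across all seven sets

print(per_set, total)
```

That gives 8,836 positions per set and 61,852 in total, before any unification and before subtracting unassigned positions.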

In all honesty, I only care about the CJK parts of Unicode in that they
convince people to implement Unicode so I can play with the Latin,
Greek, Cherokee, IPA and Mathematics sections. Encoding 50,000 more Han
ideographs produces a lot less interest in me than encoding Gothic. A
lot of the audience is the same - "who cares about ancient Greek? Will
it handle the Dhammapada in Pali without error?". It appears the serious
attempts to topple Unicode - Tron, for example - forgot that, and looked
to their own issues, leaving Unicode to be the only real attempt to
serve the needs of everyone and hence the victor.

* It seems there's a discrepancy in what ISO-2022-INT-1 encodes. Another
source adds ISO-8859-2 and ISO-8859-5, still leaving the Arabs, the
residents of the Baltic states, Hindi speakers and a lot of the rest of
the world out.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, "Peace and Love, Inc."
