Probably a dumb question, but how come nobody's invented "UTF-24" yet? I just made that up - it's not an official standard - but one could easily define UTF-24 as UTF-32 with the most significant byte (which is always zero) removed, so that every character is stored in exactly three bytes and all are treated equally. You could have UTF-24LE and UTF-24BE variants, and even UTF-24 BOMs. Of course, I'm not suggesting this is a particularly brilliant idea, but I just wonder why no one's suggested it before.
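Just to make the idea concrete, here's a minimal sketch of what such a "UTF-24LE" codec might look like (the function names are made up, obviously - there's no such codec anywhere):

```python
def utf24le_encode(s):
    """Hypothetical "UTF-24LE": each code point as three little-endian
    bytes, i.e. UTF-32LE with the always-zero top byte dropped."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)  # every Unicode code point is < 0x110000, so it fits in 3 bytes
        out += bytes([cp & 0xFF, (cp >> 8) & 0xFF, (cp >> 16) & 0xFF])
    return bytes(out)

def utf24le_decode(b):
    """Decode hypothetical "UTF-24LE" bytes back to a string."""
    assert len(b) % 3 == 0, "UTF-24 data must be a multiple of 3 bytes"
    return ''.join(chr(b[i] | (b[i + 1] << 8) | (b[i + 2] << 16))
                   for i in range(0, len(b), 3))
```

Fixed-width, no surrogates, no shortest-form checks - the whole validation story reduces to "is it a multiple of 3 bytes and below 0x110000".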

(And then of course, there's UTF-21, in which blocks of 21 bits are concatenated, so that eight Unicode characters will be stored in every 21 bytes - and not to mention UTF-20.087462841250343, in which a plain text document is simply regarded as one very large integer expressed in radix 1114112, and whose UTF-20.087462841250343 representation is simply that number expressed in binary. But now I'm getting /very/ silly - please don't take any of this seriously.) :-)
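For completeness, even the silly ones can be sketched (again, invented names; the UTF-21 decoder is omitted because the zero-padding in the last byte means you'd need to know the character count):

```python
def utf21_encode(s):
    """Hypothetical "UTF-21": each code point packed into exactly 21 bits,
    concatenated big-endian, final byte zero-padded. Eight characters
    occupy exactly 168 bits = 21 bytes."""
    bits, nbits, out = 0, 0, bytearray()
    for ch in s:
        bits = (bits << 21) | ord(ch)
        nbits += 21
        while nbits >= 8:          # flush whole bytes as they become available
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                      # pad the leftover bits with zeros
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

def radix_1114112_encode(s):
    """The "UTF-20.087462841250343" joke: treat the whole document as one
    big integer in radix 1114112 (= 0x110000, the number of code points)."""
    n = 0
    for ch in s:
        n = n * 0x110000 + ord(ch)
    return n
```

(0x110000 = 1114112, and log2(1114112) ≈ 20.087462841250343, hence the name.)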

The "UTF-24" thing seems a reasonably sensible question though. Is it just that we don't like it because some processors have alignment restrictions or something?

Arcane Jill


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Marcin 'Qrczak' Kowalczyk
Sent: 02 December 2004 16:59
To: [EMAIL PROTECTED]
Subject: Re: Nicest UTF

"Arcane Jill" <[EMAIL PROTECTED]> writes:
> Oh for a chip with 21-bit wide registers!
Not 21-bit but 20.087462841250343-bit :-)
--
__("< Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/




