On Friday, July 18, 2003 10:18 PM, Michael Everson <[EMAIL PROTECTED]> wrote:

> I *prefer* Unicode to any subset thereof.

Why such a preference? Unicode does not define the charset (which is defined by 
ISO 10646), but rather character properties and related algorithms, and (in cooperation 
with ISO 10646) the code point assignments.

For me, Unicode is NOT a plain character set, but an encoded character set, with a small 
but important nuance: you need to specify a version after "Unicode" to identify the 
character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but 
"Unicode" alone is not.
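
To make the versioning point concrete, here is a minimal sketch. It assumes Python's 
standard unicodedata module, which is my choice for illustration and not something from 
this thread; the helper name is_assigned is hypothetical. The module ships a frozen 
Unicode 3.2.0 database next to the current one, so the same code point can be tested 
against two bounded repertoires:

```python
import unicodedata

def is_assigned(ch, db=unicodedata):
    """Return True if `ch` has a general category other than Cn in `db`.

    Hypothetical helper: `db` is any object exposing category(), such as
    the unicodedata module itself or its frozen ucd_3_2_0 database.
    """
    return db.category(ch) != "Cn"

old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database
new = unicodedata            # whichever version this Python build ships

ch = "\u0221"  # LATIN SMALL LETTER D WITH CURL, first assigned in Unicode 4.0

# Same code point, two versions, two different answers.
print(old.unidata_version, is_assigned(ch, old))
print(new.unidata_version, is_assigned(ch, new))
```

The question "is this character in Unicode?" only has an answer once the version is 
named, which is the nuance I am describing.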

If you just look at this definition, you cannot "prefer Unicode to any subset", 
because Unicode is just the name of a collection of standards, character sets and 
algorithms, and each version is already a subset of the next one... If you cannot 
accept the idea of subsets, then don't use Unicode, or wait until the Unicode standard 
is definitively closed, or permanently assume that its repertoire is now closed and 
that no more characters will be added... Of course you would be wrong.

MES-2 or its MES extension is a character set (like most legacy encodings registered 
with IANA, which are also encoded character sets). In practice, nobody can implement any 
software without clearly bounded sets of characters. So versioning is absolutely 
necessary to fix these bounds in terms of implementation levels.

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.

