All of this makes sense to me, apart from one or two tiny niggling points...

I confess, I hadn't read ch14.pdf, and I probably should have done. My
fault. But I still believe that there should be something in the
machine-readable code charts themselves that says, of the Roman numerals,
"Don't use these characters - use the normal Latin letters instead". If
they really are there _SOLELY_ for round-trip compatibility with East Asian
standards, then, if I wish to put the year MMIII in a web page, I should
_NOT_ use the Roman numeral characters. Furthermore, if I write software to
interpret Roman numerals, I only need to interpret the Basic Latin letters,
not the Roman numeral characters. My life as a webmaster and programmer is
made so much SIMPLER by not having to handle them. I would really like it if
these, and every single other character which is "only there for reasons of
round-trip compatibility" with something else, were explicitly marked in the
machine-readable charts with something meaning "Don't introduce this
character, at all, ever. Don't try to interpret it. Just preserve it, in
case it ever gets turned back to its original character set".
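
To make that concrete, here is a rough sketch in Python of the kind of interpreter I mean. It assumes only Python's standard unicodedata module, and parse_roman is a hypothetical name of my own. NFKC normalization folds compatibility characters such as U+216E ROMAN NUMERAL FIVE HUNDRED down to Basic Latin ("D"), so the parser itself only ever has to know the seven Latin letters. It applies the usual subtractive rule but is not a strict validator: non-canonical forms such as "IC" are accepted rather than rejected.

```python
import unicodedata

# Values of the seven Basic Latin letters used in Roman numerals.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_roman(text: str) -> int:
    """Interpret a Roman numeral, folding compatibility characters to Latin first."""
    # NFKC maps e.g. U+216E ROMAN NUMERAL FIVE HUNDRED to "D" and
    # U+216B ROMAN NUMERAL TWELVE to "XII"; upper() handles lowercase input.
    letters = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(letters):
        if ch not in ROMAN_VALUES:
            raise ValueError(f"not a Roman numeral letter: {ch!r}")
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES.get(letters[i + 1], 0) if i + 1 < len(letters) else 0
        # Subtractive rule: a letter before a larger one is subtracted (IV = 4).
        total += -value if following > value else value
    return total
```

With this, parse_roman("MMIII") and the same year typed as U+216F U+216F U+2162 both come out as 2003.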

Secondly, I believe that the code charts SHOULD provide machine-readable
information about the hexadecimal values of the letters "A" to "F".
Code point U+FF21, for example, has the property "Hex_Digit". Now, I _could_
parse the textual description in the rest of the line ("FULLWIDTH LATIN
CAPITAL LETTER A"), deduce that this can be replaced by "A", and then use
the ASCII algorithm to convert this to ten ... but it would be SO MUCH NICER
if _every_ character (or range of characters) which had the "Hex_Digit"
property ALSO had a simple, straightforward lookup table, which immediately
told me that, when interpreted as hex, this symbol means ten.
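
Here is roughly the table I have in mind, sketched in Python. The six ranges are my own transcription of the Hex_Digit entries in PropList.txt (the ASCII digits and A-F/a-f, plus their fullwidth forms at U+FF10..U+FF19, U+FF21..U+FF26 and U+FF41..U+FF46). Check them against the data file for whatever Unicode version you use. HEX_DIGIT_RANGES, HEX_VALUE and parse_hex are hypothetical names, not anything published by Unicode.

```python
# Hex_Digit ranges as (first, last, value of first) - my own transcription
# of PropList.txt; verify against the file for your Unicode version.
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039, 0),   # DIGIT ZERO..DIGIT NINE
    (0x0041, 0x0046, 10),  # LATIN CAPITAL LETTER A..F
    (0x0061, 0x0066, 10),  # LATIN SMALL LETTER A..F
    (0xFF10, 0xFF19, 0),   # FULLWIDTH DIGIT ZERO..NINE
    (0xFF21, 0xFF26, 10),  # FULLWIDTH LATIN CAPITAL LETTER A..F
    (0xFF41, 0xFF46, 10),  # FULLWIDTH LATIN SMALL LETTER A..F
]


def build_hex_table(ranges):
    """Expand the ranges into a character -> hex value lookup table."""
    table = {}
    for first, last, first_value in ranges:
        for offset in range(last - first + 1):
            table[chr(first + offset)] = first_value + offset
    return table


HEX_VALUE = build_hex_table(HEX_DIGIT_RANGES)


def parse_hex(text):
    """Interpret a string of Hex_Digit characters; raises KeyError on anything else."""
    value = 0
    for ch in text:
        value = value * 16 + HEX_VALUE[ch]
    return value
```

So HEX_VALUE["\uFF21"] is simply 10, with no name parsing, and parse_hex("7d3") and its fullwidth spelling "\uFF17\uFF44\uFF13" both give 2003.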

Thirdly, as Jim pointed out, specialist disciplines should not expect
characters to be cloned all over the place just because they have a
different meaning in their particular discipline. I do agree with this, but
what confuses me is what APPEARS to be a large number of violations of this
rule already present in Unicode. For example:
        U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who uses this?
        U+2217 (asterisk operator) - an equally obvious clone of U+002A (asterisk)
        U+223C (tilde operator) - a clone of U+007E (tilde)
and then there's:
        U+2223 (divides) - hell, that looks to me remarkably like U+007C (vertical line)
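
For what it's worth, the character database can be asked directly how it relates these pairs. A small Python sketch, assuming only the standard unicodedata module (report and PAIRS are names I made up): for each operator it returns both names, the operator's decomposition mapping, and whether NFKC normalization turns it into the ASCII look-alike. As far as I can tell none of the four has a decomposition, so, unlike the Roman numerals, they are not formally marked as compatibility characters.

```python
import unicodedata

# Each operator from the list above, paired with the ASCII character it resembles.
PAIRS = [
    ("\u2212", "\u002D"),  # MINUS SIGN vs HYPHEN-MINUS
    ("\u2217", "\u002A"),  # ASTERISK OPERATOR vs ASTERISK
    ("\u223C", "\u007E"),  # TILDE OPERATOR vs TILDE
    ("\u2223", "\u007C"),  # DIVIDES vs VERTICAL LINE
]


def report(pairs):
    """For each pair, return both names, the operator's decomposition, and
    whether NFKC normalization maps the operator onto the ASCII character."""
    rows = []
    for operator, ascii_char in pairs:
        rows.append((
            unicodedata.name(operator),
            unicodedata.name(ascii_char),
            unicodedata.decomposition(operator) or "(none)",
            unicodedata.normalize("NFKC", operator) == ascii_char,
        ))
    return rows


for row in report(PAIRS):
    print(*row, sep=" | ")
```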

Conversely, there are also things that look different, but mean the same.
For example:
        U+2264 (less-than or equal to) - compare with U+2A7D (less-than or slanted equal to)

The last example is interesting (to me) because the difference between the
two seems like a font difference - like the difference between "g" with a
tail and "g" with a loop. In support of this argument, I point out that the
complementary relation, NEITHER less-than NOR equal to, has code point
U+2270, and this is drawn in the code charts with a slanted equals sign, so
it OUGHT to be the complement of U+2A7D. (Unless I've missed it, there
appears to be no "neither less-than nor equal to" character with a
horizontal equals sign.)

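The same kind of query works for the three less-than characters. This Python sketch (describe is a hypothetical helper; it relies only on the standard unicodedata module) returns each one's code point, name and decomposition mapping. In the Unicode data I know of, U+2270 has a canonical decomposition to U+2264 followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, while U+2A7D has none, so the database itself records which of the two the negated form is built on. Running it against your own Python's Unicode version is the way to confirm.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, character name, decomposition mapping) for one character."""
    decomposition = unicodedata.decomposition(ch) or "(none)"
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition)


# U+2264 LESS-THAN OR EQUAL TO, U+2A7D LESS-THAN OR SLANTED EQUAL TO,
# U+2270 NEITHER LESS-THAN NOR EQUAL TO
for ch in ("\u2264", "\u2A7D", "\u2270"):
    print(*describe(ch))
```
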
So, yes, I agree with Jim. Let's not have too many duplicates. But I still
have to ask: why are there so many already?




-----Original Message (1)-----
From: Doug Ewell [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:14 PM
To: Unicode mailing list
Cc: Pim Blokland
Subject: Re: Hexadecimal

Not exactly.  The character U+216E ROMAN NUMERAL FIVE HUNDRED came from
an East Asian double-byte character set, and was carried over into
Unicode for round-tripping reasons.  It is a compatibility equivalent of
U+0044.



AND...
-----Original Message (2)-----
From: Jim Allan [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:13 PM
To: [EMAIL PROTECTED]
Subject: Re: Hexadecimal

.... from an explanation as to why Unicode 
coded Roman numerals separately. See 14.3 at 
http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:

<< Number form characters are encoded solely for compatibility with 
existing standards. >>

Also

<< Roman Numerals. The Roman numerals can be composed of sequences of 
the appropriate Latin letters. Upper- and lowercase variants of the 
Roman numerals through 12, plus L, C, D, and M, have been encoded for 
compatibility with East Asian standards. >>



AND FINALLY...
-----Original Message (3)-----
From: Jim Allan [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:13 PM
To: [EMAIL PROTECTED]
Subject: Re: Hexadecimal

Anyone at any time in any discipline can assign a special meaning to a 
Latin letter without waiting for this meaning to be encoded in Unicode 
and should not expect that a clone of the character with that special 
meaning would ever be encoded in Unicode.

