All of this makes sense to me, apart from one or two tiny niggling points...
I confess, I hadn't read ch14.pdf, and I probably should have done. My fault. But I still believe that there should be something in the machine-readable code charts themselves that says, of the Roman numerals, "Don't use these characters - use the normal Latin letters instead". If they really are there _SOLELY_ for round trip compliance with East Asian standards, then, if I wish to put the year MMIII in a web page, I should _NOT_ use the Roman numeral characters. Furthermore, if I write software to interpret Roman numerals, I only need to interpret the Basic Latin letters, not the Roman numeral characters. My life as a webmaster and programmer is made so much SIMPLER by not having to use them.

I would really like it if these, and every single other character which is "only there for reasons of round trip compatibility" with something else, were explicitly marked in the machine-readable charts with something meaning "Don't introduce this character, at all, ever. Don't try to interpret it. Just preserve it, in case it ever gets turned back to its original character set".

Secondly, I believe that the code charts SHOULD provide machine-readable information about the hexadecimal values of the letters "A" to "F". Codepoint FF21, for example, has the property "Hex_Digit". Now, I _could_ parse the textual description in the rest of the line ("FULLWIDTH LATIN CAPITAL LETTER A"), deduce that this can be replaced by "A", and then use the ASCII algorithm to convert this to ten ... but it would be SO MUCH NICER if _every_ character (or range of characters) which had the "Hex_Digit" property ALSO had a simple, straightforward lookup table, which immediately told me that, when interpreted as hex, this symbol means ten.

Thirdly, as Jim pointed out, specialist disciplines should not expect characters to be cloned all over the place just because they have a different meaning in their particular discipline.
I do agree with this, but what confuses me is what APPEAR to be the large number of violations of this rule already present in Unicode. For example:

  U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who uses this?
  U+2217 (asterisk operator) - an equally obvious clone of U+002A (asterisk)
  U+223C (tilde operator) - a clone of U+007E (tilde)

and then there's:

  U+2223 (divides) - hell, that looks to me remarkably like U+007C (vertical line)

Conversely, there are also things that look different, but mean the same. For example:

  U+2264 (less than or equal to) - compare with U+2A7D (less than or slanted equal to)

The last example is interesting (to me) because the difference between the two seems like a font difference - like the difference between "g" with a tail and "g" with a loop. In defence of this argument, I point out that the complementary relation, NOT less than or equal to, has codepoint U+2270 (neither less than nor equal to), and this is represented in the code charts as having a slanted equal to, so it OUGHT to be the complement of U+2A7D. (Unless I've missed it, there appears to be no "not less than or equal to, with horizontal equals" character.)

So, yes, I agree with Jim. Let's not have too many duplicates. But I still have to ask why there are so many already?

-----Original Message (1)-----
From: Doug Ewell [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:14 PM
To: Unicode mailing list
Cc: Pim Blokland
Subject: Re: Hexadecimal

Not exactly. The character U+216E ROMAN NUMERAL FIVE HUNDRED came from an East Asian double-byte character set, and was carried over into Unicode for round-tripping reasons. It is a compatibility equivalent of U+0044.

AND...

-----Original Message (2)-----
From: Jim Allan [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:13 PM
To: [EMAIL PROTECTED]
Subject: Re: Hexadecimal

.... from an explanation as to why Unicode coded Roman numerals separately.

See 14.3 at http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:

<< Number form characters are encoded solely for compatibility with existing standards. >>

Also

<< Roman Numerals. The Roman numerals can be composed of sequences of the appropriate Latin letters. Upper- and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded for compatibility with East Asian standards. >>

AND FINALLY...

-----Original Message (3)-----
From: Jim Allan [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 16, 2003 9:13 PM
To: [EMAIL PROTECTED]
Subject: Re: Hexadecimal

Anyone at any time in any discipline can assign a special meaning to a Latin letter without waiting for this meaning to be encoded in Unicode and should not expect that a clone of the character with that special meaning would ever be encoded in Unicode.
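
(Illustration for the Hex_Digit and Roman numeral points at the top of this message, not part of the quoted mails.) A minimal sketch of the lookup table asked for, assuming Python and its standard unicodedata module, neither of which is mentioned in this thread. The names HEX_DIGIT_RANGES, build_hex_value_table and fold_roman are hypothetical, and the six ranges are typed in by hand from the Hex_Digit entries in PropList.txt rather than read from the file. It relies on compatibility normalization (NFKC) to do the "replace by the plain letter" step:

```python
import unicodedata

# Assumption: the code point ranges carrying the Hex_Digit property,
# copied by hand from PropList.txt (ASCII and fullwidth digits/letters).
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039),  # DIGIT ZERO..DIGIT NINE
    (0x0041, 0x0046),  # LATIN CAPITAL LETTER A..F
    (0x0061, 0x0066),  # LATIN SMALL LETTER A..F
    (0xFF10, 0xFF19),  # FULLWIDTH DIGIT ZERO..NINE
    (0xFF21, 0xFF26),  # FULLWIDTH LATIN CAPITAL LETTER A..F
    (0xFF41, 0xFF46),  # FULLWIDTH LATIN SMALL LETTER A..F
]


def build_hex_value_table():
    """Map every character in HEX_DIGIT_RANGES to its value 0..15."""
    table = {}
    for first, last in HEX_DIGIT_RANGES:
        for code_point in range(first, last + 1):
            ch = chr(code_point)
            # NFKC folds a fullwidth form to its ASCII counterpart;
            # ASCII characters are left unchanged.
            ascii_form = unicodedata.normalize("NFKC", ch)
            table[ch] = int(ascii_form, 16)
    return table


def fold_roman(text):
    """Replace compatibility characters (including the Roman numeral
    characters) by their compatibility equivalents, e.g. U+216E -> "D"."""
    return unicodedata.normalize("NFKC", text)


HEX_VALUES = build_hex_value_table()

if __name__ == "__main__":
    print(HEX_VALUES["\uFF21"])                # FULLWIDTH LATIN CAPITAL LETTER A
    print(fold_roman("\u216F\u216F\u2162"))    # ROMAN NUMERAL ONE THOUSAND x2, THREE
```

Note that NFKC folds many other compatibility characters as well, so fold_roman is only suitable where losing those distinctions is acceptable; it is a workaround, not the explicit "do not introduce" marking argued for above.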