John Cowan beat me to the punch with some of this, but anyway...

Pim Blokland <pblokland at planet dot nl> wrote:
>> Basically, thousands of implementations, for decades now,
>> have been using ASCII 0x30..0x39, 0x41..0x46, 0x61..0x66 to
>> implement hexadecimal numbers. That is also specified in
>> more than a few programming language standards and other
>> standards. Those characters map to Unicode U+0030..U+0039,
>> U+0041..U+0046, U+0061..U+0066.
>
> That's not a good reason for deciding to not implement something in
> the future.
> If everybody thought like that, there would never have been a
> Unicode.

If the founding designers of Unicode had tried to disunify the letters
A through F and a through f in this way, so that converters had to map
the letter D in "Delta" differently from the D in "U+200D", there would
not be a Unicode today.

> Besides, your example is proof that the implementation can change;
> has to change. Where applications could use 8-bit characters to
> store hex digits in the old days, they now have to use 16-bit
> characters to keep up with Unicode...

This has nothing to do with creating clones of the letters A-F and a-f
for use with hexadecimal numbers.

>> There is also a HUGE semantic difference between D meaning the
>> letter D and Roman numeral D meaning 500.
>
> and those have different code points! So you're saying Jill is
> right, right?

Not exactly. The character U+216E ROMAN NUMERAL FIVE HUNDRED came from
an East Asian double-byte character set, and was carried over into
Unicode for round-tripping reasons. It is a compatibility equivalent of
U+0044. If such a legacy standard had separate characters for the
hexadecimal digits 10 through 15, we'd probably see them in Unicode for
the same reason. But none did.

> You seem to define "meaning" differently than what we're talking
> about here.
> In the abbreviation "mm" the two m's have different meanings: the
> first is "milli" and the second is "meter". No one is asking to
> encode those two letters with different codepoints!
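For what it's worth, the compatibility relationship is easy to see with
Python's standard unicodedata module (my illustration, not part of the
original discussion): the Roman numeral carries a numeric value of 500
and folds to a plain D under NFKC, while the plain letter D has no
numeric value in the UCD and serves as hex 13 only by programming
convention.

```python
import unicodedata

# U+216E ROMAN NUMERAL FIVE HUNDRED: a compatibility equivalent of U+0044
roman_d = "\u216E"
print(unicodedata.name(roman_d))               # ROMAN NUMERAL FIVE HUNDRED
print(unicodedata.numeric(roman_d))            # 500.0
print(unicodedata.normalize("NFKC", roman_d))  # D

# The plain letter D has no numeric value in the UCD;
# its hexadecimal meaning (13) lives in programming-language convention.
print(unicodedata.numeric("D", None))          # None
print(int("D", 16))                            # 13
```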
> What we're talking about is different general categories, different
> numeric values and even, oddly enough, different BiDi categories.
> Doesn't that qualify for creating new characters?

You could make a case for proposing numeric values of 10 through 15 to
be added to U+0041 through U+0046 and U+0061 through U+0066, based on
their undeniably widespread use as hexadecimal digits. (No, I don't
want to get into a debate about the word "digit" implying "ten.") But
the differences in the other categories are less convincing. Latin
letters are L& (strong LTR) while the digits are EN (weak LTR), but you
may have a difficult time finding a non-pathological context in which
European numerals are legitimately used RTL.

John is right. Any proposal to disunify standard, common uses of the
characters in the Basic Latin block would require unimaginable volumes
of existing data to be recoded. (Thanks to UTF-8, even the move from
8-bit character sets to Unicode, which you cited earlier, didn't
require this.) See the "Decimal Separator" example in the ISO
"Principles and Procedures" document for how this burden can override
other, well-meaning motivations to disunify common characters:

http://www.dkuug.dk/JTC1/SC2/WG2/docs/n2352r.pdf

> On a related note, can anybody tell me why U+212A Kelvin sign was
> put in the Unicode character set?
> I have never seen any acknowledgement of this symbol anywhere in the
> real world. (That is, using U+212A instead of U+004B.)

Round-trip compatibility with East Asian legacy character sets, so
nobody could say that data converted to Unicode and back had been
"corrupted."

> And even the UCD calls it a letter rather than a symbol. I'd expect
> if it was put in for completeness, to complement the degrees
> Fahrenheit and degrees Celsius, it would have had the same category
> as those two?

The "degrees Celsius" and "degrees Fahrenheit" symbols (U+2103 and
U+2109) are imaged as a degree sign followed by a letter.
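The property differences at issue can all be read straight out of the
UCD with Python's unicodedata module (my sketch; the values shown are
simply what the UCD reports):

```python
import unicodedata

# General category and bidi class differ between letters and digits:
print(unicodedata.category("D"))         # Lu  (uppercase letter)
print(unicodedata.bidirectional("D"))    # L   (strong left-to-right)
print(unicodedata.category("5"))         # Nd  (decimal digit)
print(unicodedata.bidirectional("5"))    # EN  (weak LTR, European Number)

# U+212A KELVIN SIGN is itself classed as a letter,
# and folds to a plain K under NFKC normalization.
kelvin = "\u212A"
print(unicodedata.name(kelvin))                # KELVIN SIGN
print(unicodedata.category(kelvin))            # Lu
print(unicodedata.normalize("NFKC", kelvin))   # K
```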
Neither could be considered equivalent to a letter by itself, as U+212A
can.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/