If the semantic difference between (for example) uppercase D and mathematical bold uppercase D was considered great enough to require a new codepoint, then I am tempted to wonder whether the same might be considered true of hexadecimal digits.
 
What I mean is, it seems (to me) that there is a HUGE semantic difference between the hexadecimal digit thirteen and the letter D. The former is a numerical digit. The latter is a letter of the Basic Latin alphabet. The symbol in the middle of the word "hides" is semantically a letter. It forms part of a word. Whereas the symbol in the middle of the number "3AD29" (base 16) is semantically a digit, having the numerical value thirteen. It forms part of a number (the number 240937, in fact). So far as I can see, every single character in the "3AD29" string should be in general category N* (either Nd or Nl).
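 
For what it's worth, here is how that looks from Python (just a sketch using the standard unicodedata module; I'm not claiming this is how anyone should implement anything):

    import unicodedata

    # The characters of the hex numeral "3AD29" fall into two different
    # General_Category groups, even though every one of them is acting
    # as a digit here.
    for ch in "3AD29":
        print(ch, unicodedata.category(ch))
    # prints: 3 Nd / A Lu / D Lu / 2 Nd / 9 Nd

    # Parsed as base 16, the whole string is a single number.
    print(int("3AD29", 16))   # 240937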
 
Sure, you can tell them apart by context in most circumstances, just as you can tell a hyphen from a minus sign by context; but since the meanings are so clearly distinct, I wonder if there is a case for distinguishing hex digits from letters without requiring context.
 
A few years back, when I was programming in assembler, the particular assembler I was using (I can't remember which one, sorry) assumed that all numbers were hexadecimal - a reasonable assumption, given what it did. However, if the first digit was greater than nine, you had to supply a leading zero so that the assembler could distinguish the number from an identifier. If hex digits were characters distinct from letters, it wouldn't have needed that rule.
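 
Something like this toy classifier (a Python sketch, emphatically not that assembler's actual lexer) shows why the rule was needed - a token such as DEAD is made entirely of hex digits, but so are plenty of identifiers, so the first character has to settle it:

    import string

    HEX_DIGITS = set(string.hexdigits)   # 0-9, a-f, A-F

    def classify(token: str) -> str:
        # The leading-zero rule: a token made entirely of hex digits only
        # counts as a number if its first character is 0-9; otherwise it
        # is taken to be an identifier.
        if token and all(c in HEX_DIGITS for c in token):
            if token[0] in string.digits:
                return "number"
            return "identifier (write 0" + token + " to force a number)"
        return "identifier"

    print(classify("3AD29"))   # number
    print(classify("DEAD"))    # identifier (write 0DEAD to force a number)
    print(classify("0DEAD"))   # number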
 
I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit" which some Unicode characters possess. I may have missed it, but what I don't see in the charts is a mapping from the characters having these properties to the digit values that they represent. Is it assumed that the number of characters with the "Hex_Digit" property is so small that implementation is trivial? That everyone knows it? Or have I just missed the mapping by looking in the wrong place?
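 
In case it is useful, this is what I get from Python's unicodedata (again a sketch; I'm assuming the Hex_Digit set from PropList.txt, which as far as I can tell is just 0-9, A-F, a-f plus their fullwidth forms): the UCD numeric fields give a value for '3' but nothing at all for 'D', so the hex-digit-to-value mapping has to be hand-rolled, or delegated to a base-16 parser.

    import unicodedata

    print(unicodedata.digit('3'))        # 3
    try:
        print(unicodedata.digit('D'))
    except ValueError:
        print("the UCD assigns no digit value to 'D'")

    # A hand-rolled table for the ASCII_Hex_Digit characters:
    HEX_VALUE = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}
    print(HEX_VALUE['D'])                # 13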
 
And incidentally, from a mathematician's point of view - or indeed a programmer's point of view - there is really no semantic difference between uppercase hex digit thirteen and lowercase hex digit thirteen, any more than there is a semantic difference between uppercase hex digit three and lowercase hex digit three. It is only because we re-use the letters of the alphabet to fill this semantic void that the artificial distinction arises. (I think the Romans had this problem. Unicode does provide upper and lowercase variants of Roman numerals, but then again, all Roman numerals are cased (apart from the really big ones), so maybe that's irrelevant.)
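 
To put the same point in code (one more sketch): a base-16 parser already treats the two cases as a single digit, and the cased Roman numeral code points show that Unicode is quite prepared to give two cased characters one and the same numeric value.

    import unicodedata

    # Case makes no difference to the value of a hex numeral.
    print(int("3AD29", 16) == int("3ad29", 16))   # True

    # Two code points, two cases, one value: U+216E and U+217E.
    print(unicodedata.numeric('\u216E'))   # ROMAN NUMERAL FIVE HUNDRED -> 500.0
    print(unicodedata.numeric('\u217E'))   # SMALL ROMAN NUMERAL FIVE HUNDRED -> 500.0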
 
Thoughts anyone?
 
Jill
 
 
