If the semantic difference between (for example) uppercase D and mathematical bold uppercase D was considered sufficiently great to require a new codepoint, then I am tempted to wonder whether the same might be considered true of hexadecimal digits.
What I mean is, it
seems (to me) that there is a HUGE semantic difference between the hexadecimal
digit thirteen, and the letter D. The former is a numerical digit. The latter is
a letter of the Basic Latin alphabet. The symbol in the middle of the word
"hides" is semantically a letter. It forms part of a word. Whereas the
symbol in the middle of the number "3AD29" (base 16) is semantically a
digit, having the numerical value thirteen. It forms part of a number (the
number 240937, in fact). So far as I can see, every single character in the
"3AD29" string should be in general category N* (either Nd or
Nl).
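
Just to make that concrete, here is a quick sketch in Python (using the standard unicodedata module - my own illustration, nothing normative) showing the mismatch between what the characters currently are and what the string means:

    import unicodedata

    s = "3AD29"

    # What Unicode currently says each character is:
    for ch in s:
        print(ch, unicodedata.category(ch))   # '3', '2', '9' -> Nd; 'A', 'D' -> Lu

    # The value the string denotes when read as a base-16 number:
    print(int(s, 16))                         # 240937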
Sure, you
can tell them apart by context, in most circumstances, in the same way
that you can tell the difference between a hyphen and a minus sign by context,
but since the meanings are so clearly distinct, I wonder if there is a case for
distinguishing hex digits from letters without requiring
context.
A few years back, when I was programming in assembler, the particular assembler I was using (can't remember which one, sorry) assumed that all numbers were hexadecimal - a reasonable assumption, given what it did. However, if the first digit was greater than nine, you had to supply a leading zero so that the assembler could distinguish the number from an identifier (so you would write 0DEAD rather than DEAD, for example). If hex digits were characters distinct from letters, it wouldn't have needed to make that rule.
I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit" which some Unicode characters possess. I may have missed it, but what I don't see in the charts is a mapping from the characters having these properties to the digit values that they represent. Is it assumed that the number of characters having the "Hex_Digit" property is so small that implementation is trivial? That everyone knows it? Or have I just missed the mapping by looking in the wrong place?
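
For what it's worth, the set is certainly small; here is a sketch (in Python, with the ranges copied from PropList.txt - so treat it as my own illustration rather than anything normative) of the kind of mapping I mean:

    # Hypothetical mapping from every character with the Hex_Digit property
    # to the digit value it represents. Ranges per PropList.txt:
    # 0030..0039, 0041..0046, 0061..0066 (the ASCII_Hex_Digit set), plus the
    # fullwidth forms FF10..FF19, FF21..FF26, FF41..FF46.
    HEX_VALUE = {}

    for start in (0x0030, 0xFF10):                    # 0-9 and fullwidth 0-9
        for i in range(10):
            HEX_VALUE[chr(start + i)] = i

    for start in (0x0041, 0x0061, 0xFF21, 0xFF41):    # A-F, a-f and fullwidth forms
        for i in range(6):
            HEX_VALUE[chr(start + i)] = 10 + i

    print(HEX_VALUE['D'], HEX_VALUE['d'], HEX_VALUE['\uFF24'])   # 13 13 13

If I have counted the ranges right, that is forty-four characters in all, so perhaps the assumption really is that the mapping is trivial.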
And incidentally, from a mathematician's point of view - or indeed a programmer's point of view - there is really no semantic difference between uppercase hex digit thirteen and lowercase hex digit thirteen, any more than there is a semantic difference between uppercase hex digit three and lowercase hex digit three. It is only because we re-use the letters of the alphabet to fill this semantic void that the artificial distinction arises. (I think the Romans had this problem too. Unicode does provide upper- and lowercase variants of the Roman numerals, but then again, all Roman numerals are cased (apart from the really big ones), so maybe that's irrelevant.)
Thoughts, anyone?
Jill