(long argument deleted)
If you are suggesting that the natural sort algorithm won't work without separate codepoints for hex digits then you are of course correct, but that is an argument in favor of hex-digit-characters, not against them.
Ordering natural numbers (whole numbers >= 0) expressed as numerals, usually sequences of digits, can be made to work for any base as long as one can write the digits in a convenient way. (That does NOT mean digit clones of A-Z.)
If you like, you can lobby OS/UI makers (or sort order implementation providers in general) to supply a "hacker's option" where A-F and a-f are regarded as digits (possibly with some heuristic to determine which As are hex digits and which are not). I would have that "off" by default, though; most users would find hexadecimal very uncomfortable, and indeed surprising. They would be even more surprised to find that some As do not sort like other As (if there were such clones) while looking just the same. Note that all the existing clones of A-Z and a-z are ordered just like the ordinary letters in the default order of the UCA (and the CTT of 14651). Likewise, the Roman numeral compatibility characters are ordered as the letters that constitute them, not in any numeric order.
The natural sort algorithm works identically in all radices. There is nothing special about radix ten. Furthermore, the same sort order is guaranteed in all radices. An implementation of a natural sort algorithm does NOT need to "know" the radix. It does not need to guess. It does not need to assume. It does not need to infer. It does not even need to care. All it needs are the functions IsDigit(codepoint) and GetDigitValue(codepoint). The return value of the latter is only required to be defined if the return value of the former is true. That's ALL it needs.
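A minimal sketch of such a radix-agnostic comparison, in Python. The names is_digit, digit_value, split_runs, compare_numerals and natural_compare are hypothetical, and is_digit is assumed here to accept only ASCII 0-9; swapping in a different pair of digit functions is the only change another digit set would require. Text between numerals is compared by plain code point, which stands in for a real collation.

```python
from functools import cmp_to_key

def is_digit(ch):
    # Assumption for this sketch: only ASCII 0-9 are digits.
    return "0" <= ch <= "9"

def digit_value(ch):
    # Only meaningful when is_digit(ch) is true.
    return ord(ch) - ord("0")

def split_runs(s):
    """Split s into (is_numeral, text) runs of digits and non-digits."""
    runs = []
    for ch in s:
        d = is_digit(ch)
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs

def compare_numerals(a, b):
    """Compare two digit runs by value without knowing the radix."""
    va = [digit_value(c) for c in a]
    vb = [digit_value(c) for c in b]
    # Drop leading zero digits, keeping at least one digit.
    while len(va) > 1 and va[0] == 0:
        va.pop(0)
    while len(vb) > 1 and vb[0] == 0:
        vb.pop(0)
    # A numeral with more significant digits is larger in any radix;
    # at equal length, the digit values decide from left to right.
    if len(va) != len(vb):
        return -1 if len(va) < len(vb) else 1
    return (va > vb) - (va < vb)

def natural_compare(a, b):
    ra, rb = split_runs(a), split_runs(b)
    for (da, ta), (db, tb) in zip(ra, rb):
        if da and db:
            c = compare_numerals(ta, tb)
        else:
            c = (ta > tb) - (ta < tb)  # plain code point comparison
        if c:
            return c
    return (len(ra) > len(rb)) - (len(ra) < len(rb))

names = ["file10", "file9", "file2"]
print(sorted(names, key=cmp_to_key(natural_compare)))
```

Note that this sketch treats numerals that differ only in leading zeros (such as "a007" and "a7") as equal; a real implementation would probably want a tie-breaker.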
That's one way of doing it. Another is to prehandle the string, as explained in annex C.3 of ISO/IEC 14651, and use suitable weighting for the characters used in the numerals, and then just apply the ordinary collation key calculation (on demand or complete) and compare the strings as "usual" (for 14651 or UCA comparisons). Incidentally, that annex also considers negative numerals, and numerals with a fraction part. It only considers decimal base in the examples, but there is no problem in generalising to other integer bases >= 2, just as long as you have enough characters to express the digits (which could in principle be expressed with multiple characters each, even a varying number).
(If you use a base greater than decimal, then you're right that decimal numerals order in the expected way, having done the prehandling, as long as you stick to decimal digits in the actual strings.)
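To illustrate the prehandling idea, here is a rough sketch in Python. It is not the procedure given in the annex, just one simple possibility: every numeral is zero-padded to a fixed width so that an ordinary string comparison orders it by value. The function name prehandle, the digits and width parameters, and the HEX constant are all hypothetical, and plain code point comparison stands in for a real collation key. Numerals longer than width, negative numerals and fraction parts are not handled.

```python
import re

HEX = "0123456789ABCDEF"  # assumed digit set for the base-16 case

def prehandle(s, digits="0123456789", width=8):
    """Zero-pad every run of characters from `digits` to `width` digits."""
    pattern = "[" + re.escape(digits) + "]+"
    zero = digits[0]

    def pad(match):
        # Strip leading zeros, keep at least one digit, then pad.
        numeral = match.group(0).lstrip(zero) or zero
        return numeral.rjust(width, zero)

    return re.sub(pattern, pad, s)

names = ["item100", "item9", "item10"]

# Decimal prehandling: the usual natural order.
print(sorted(names, key=prehandle))

# Base-16 prehandling: strings that contain only decimal digits
# still come out in the same order.
print(sorted(names, key=lambda s: prehandle(s, HEX)))

# Strings that actually use A-F as digits.
print(sorted(["x1F", "xA", "x100"], key=lambda s: prehandle(s, HEX)))
```

The base-16 case only works with plain comparison here because 0-9 happen to precede A-F in code point order; with a real collation table the digit characters would need suitable weights, as described above.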
/kent k