RE: Hexadecimal digits?

Kent Karlsson Tue, 11 Nov 2003 05:54:20 -0800


(long argument deleted)

If you are suggesting that the natural sort algorithm won't work without separate codepoints for hex digits then you are of course correct, but that is an argument in favor of hex-digit-characters, not against them.

Ordering natural numbers (whole numbers >= 0) expressed as numerals,
usually sequences of digits, can be made to work for any base as long as
one can write the digits in a convenient way. (That does NOT mean digit
clones of A-Z.)

If you like you can lobby OS/UI makers (or sort order implementation
providers in general) to supply a "hacker's option" where A-F and a-f
are regarded as digits (possibly with some heuristic to determine which
As are hex digits and which are not). I would have that "off" by default
though; most users would find hexadecimal very uncomfortable, and
indeed surprising. They would be even more surprised to find some
As not sorting like other As (if there were such clones) while looking
just the same. Note that all the existing clones of A-Z and a-z are
ordered just like the ordinary letters in the default order of the UCA
(and the CTT of 14651). Likewise the Roman numeral compatibility
characters are ordered as the letters that constitute them, not in any
numeric order.

The natural sort algorithm works identically in all radices. There is nothing special about radix ten. Furthermore, the same sort order is guaranteed in all radices. An implementation of a natural sort algorithm does NOT need to "know" the radix. It does not need to guess. It does not need to assume. It does not need to infer. It does not even need to care. All it needs are the functions IsDigit(codepoint) and GetDigitValue(codepoint). The return value of the latter is only required to be defined if the return value of the former is true. That's ALL it needs.
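
A minimal Python sketch of the radix-agnostic approach described in the paragraph above. Only the two functions it names are taken from the text (here `is_digit` and `get_digit_value`); the helper `make_digit_functions`, the digit alphabets, and the comparison rule for digit runs (drop leading zeros, shorter run first, then digit by digit) are assumptions made for illustration, not something the thread specifies.

```python
DECIMAL = "0123456789"
HEX = "0123456789abcdef"  # the "hacker's option": a-f/A-F count as digits


def make_digit_functions(digits):
    """Build (is_digit, get_digit_value) for a digit alphabet (hypothetical helper)."""
    def is_digit(ch):
        return ch.lower() in digits

    def get_digit_value(ch):
        # Only defined when is_digit(ch) is true.
        return digits.index(ch.lower())

    return is_digit, get_digit_value


def natural_key(s, is_digit, get_digit_value):
    """Split s into digit runs and text runs; the radix is never consulted."""
    key = []
    i = 0
    while i < len(s):
        j = i
        if is_digit(s[i]):
            while j < len(s) and is_digit(s[j]):
                j += 1
            values = [get_digit_value(c) for c in s[i:j]]
            # Drop leading zeros, but keep a single zero for the numeral 0.
            k = 0
            while k < len(values) - 1 and values[k] == 0:
                k += 1
            values = values[k:]
            # A shorter run is a smaller number; equal lengths compare digitwise.
            key.append((0, len(values), tuple(values)))
        else:
            while j < len(s) and not is_digit(s[j]):
                j += 1
            key.append((1, s[i:j]))
        i = j
    return key


dec = make_digit_functions(DECIMAL)
hexa = make_digit_functions(HEX)

by_decimal = sorted(["file10", "file9", "file2"], key=lambda s: natural_key(s, *dec))
by_hex = sorted(["0x1F", "0xA", "0x2"], key=lambda s: natural_key(s, *hexa))
```

With the hexadecimal alphabet, a word such as "face" is read as a four-digit numeral, which is the ambiguity that the heuristic mentioned earlier would have to resolve.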

That's one way of doing it. Another is to prehandle the string, as
explained in annex C.3 of ISO/IEC 14651, and use suitable weighting
for the characters used in the numerals, and then just apply the ordinary
collation key calculation (on demand or complete) and compare the
strings as "usual" (for 14651 or UCA comparisons). Incidentally, that
annex also considers negative numerals, and numerals with a fraction
part. It only considers decimal base in the examples, but there is no
problem in generalising to other integer bases >= 2, just as long as you
have enough characters to express the digits (which could in principle be
expressed with multiple characters each, even a varying number).
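
One possible form of prehandling, sketched in Python. This is not taken from annex C.3; it is simply a common technique of the same kind, assumed here for illustration: each run of decimal digits is zero-padded to a fixed width so that an ordinary string comparison then orders the numerals numerically. The function name `prehandle` and the `width` parameter are hypothetical, and plain code point comparison stands in for a real 14651 or UCA collation key.

```python
import re


def prehandle(s, width=10):
    """Zero-pad every run of ASCII decimal digits to `width` characters.

    Assumes no numeral in the data has more than `width` digits;
    longer runs are left unchanged and would sort incorrectly.
    """
    return re.sub(r"[0-9]+", lambda m: m.group(0).zfill(width), s)


names = ["file10", "file9", "file2"]
# The prehandled string is used as the sort key and compared as "usual".
ordered = sorted(names, key=prehandle)
```

Negative numerals and fraction parts, which the annex is said to cover, are deliberately left out of this sketch.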

(If you use a base greater than ten, then you're right that decimal
numerals order in the expected way, having done the prehandling,
as long as you stick to decimal digits in the actual strings.)

/kent k
