Lots of useful and sensible opinions to which to reply, quoted below. I'll try to reply to all of them at once.

In summary then, suggestions which seem to cause considerably less objection than the Ricardo Cancho Niemietz proposal are:
(1) Invent a new DIGIT COMBINING LIGATURE character, which allows you to construct any digit short of infinity
(2) Use ZWJ for the same purpose
(3) Invent two new characters BEGIN NUMERIC and END NUMERIC which force reinterpretation of intervening letters as digits

I infer some confusion among contributors to this thread, some of whom are still talking to me as though I'm only interested in a sort algorithm and nothing else. I thought I'd made it clear that that was merely an insignificant example of a more general overall concept, so I'm going to ignore as irrelevant any suggestions as to how to make a sort work, and focus instead on how to make digits >9 work.

To address Peter's question, "why not just use ZWJ?", the answer is partly ignorance, and partly concern over how a high-digit-unaware renderer would handle things. It would of course be COMPLETELY DISASTROUS if the hex string "2F" were to be (correctly, in this scheme) represented as ('2' + '1' + ZWJ + '5') and then rendered as "215" by an unaware renderer. I would also be concerned about ambiguity: I'd want the combined character to be unambiguously a single digit with a computable value. Ignorance came into play because I just didn't realise you could do that with ZWJ, and I'm still not convinced that ('1' + ZWJ + '5') would be universally understood as the hex digit we normally write as F. I guess I see DIGIT COMBINING LIGATURE as maybe a bit like FRACTION SLASH, in that it makes clear that the thing you are composing is a number (a digit in the case of DIGIT COMBINING LIGATURE, a fraction in the case of FRACTION SLASH). The existence of DIGIT COMBINING LIGATURE would also give us a place in the code charts where its exact usage algorithm could be specified. For all of these reasons, I don't think that ZWJ fits the bill, though I'd be happy to be convinced otherwise if my reasoning is flawed.
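To make the rendering worry concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the functions `encode_digits`, `decode_digits` and `unaware_render` are hypothetical names of my own, and the rule "join the decimal digits of a value above 9 with ZWJ" is just one reading of the scheme being discussed, not anything specified by Unicode.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER (a real code point)


def encode_digits(values):
    """Hypothetical scheme: a digit value above 9 is written as its
    decimal digits joined by ZWJ, e.g. 15 -> '1' + ZWJ + '5'."""
    return "".join(ZWJ.join(str(v)) for v in values)


def decode_digits(text):
    """A 'high-digit-aware' reader: fold each ZWJ-joined run of decimal
    digits back into a single digit value."""
    values = []
    i = 0
    while i < len(text):
        value = int(text[i])
        i += 1
        # Keep absorbing <ZWJ, digit> pairs into the same digit value.
        while i + 1 < len(text) and text[i] == ZWJ:
            value = value * 10 + int(text[i + 1])
            i += 2
        values.append(value)
    return values


def unaware_render(text):
    """Assumption: an unaware renderer simply shows nothing for ZWJ."""
    return text.replace(ZWJ, "")


hex_2f = encode_digits([2, 15])      # the hex string we normally write "2F"
print(decode_digits(hex_2f))         # aware reader recovers two digit values
print(unaware_render(hex_2f))        # unaware renderer shows three decimal digits
```

The aware reader recovers the two digit values 2 and 15, while the unaware renderer displays what looks like the three-digit decimal number 215, which is exactly the failure described above.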

The option of BEGIN NUMERIC and END NUMERIC is also a pretty good one, and has the staggering backward compatibility property that if the hex string "2F" were to be (correctly, in this scheme) represented as (BEGIN NUMERIC + '2' + 'F' + END NUMERIC) it would be rendered as "2F" by an unaware renderer, which is, of course, perfect. It does have the disadvantage, however, that there appears to be no way to specify in the existing code charts what the numeric value of a given letter ought to be. For example, how should a hex-aware interpreter interpret (BEGIN NUMERIC + 'j' + END NUMERIC)? This is still a good option, of course, but it would need to be supplemented by an additional code chart, because everything between BEGIN NUMERIC and END NUMERIC would have different properties. However, there is another reason why I don't think this is the best solution: it's not stateless. From a random point in a string, you'd have to parse backwards and forwards to figure out how to interpret everything. It also creates problems for concatenation and substringing. What's more, it perpetuates the appallingly monstrous meme that the case of hex "2F" is somehow important, when in fact we should be clear that all digits are caseless, and that the apparent case of digits ten to fifteen is merely an artifact.
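The statefulness problem can be sketched in a few lines of Python. This is a hedged illustration only: BEGIN NUMERIC and END NUMERIC do not exist, so the two private-use code points below are arbitrary stand-ins, and `in_numeric_run` and `unaware_render` are hypothetical helper names.

```python
# Hypothetical stand-ins: private-use code points, since no such
# characters are actually encoded.
BEGIN_NUMERIC = "\ue000"
END_NUMERIC = "\ue001"


def in_numeric_run(text, index):
    """Decide whether text[index] lies inside a BEGIN/END NUMERIC pair.
    Note that this has to scan backwards: the answer depends on state
    established somewhere earlier in the string."""
    for ch in reversed(text[:index]):
        if ch == END_NUMERIC:
            return False
        if ch == BEGIN_NUMERIC:
            return True
    return False


def unaware_render(text):
    """Assumption: an unaware renderer shows nothing for the two controls."""
    return text.replace(BEGIN_NUMERIC, "").replace(END_NUMERIC, "")


marked = f"id {BEGIN_NUMERIC}2F{END_NUMERIC} ok"

print(unaware_render(marked))        # degrades gracefully for unaware renderers
print(in_numeric_run(marked, 5))     # 'F' is a digit inside the marked run
fragment = marked[4:6]               # naive substring: just the two characters "2F"
print(in_numeric_run(fragment, 1))   # the same 'F' is now indistinguishable from a letter
```

The naive slice drops the controls, so the 'F' that was a digit in `marked` can no longer be told apart from a letter in `fragment`. The sketch deliberately says nothing about what value a letter such as 'j' should have, since that is the open question raised above.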

Finally, there's Mark's observation that there may be some legitimate use for digits >15.

For all of these reasons, my preference is for DIGIT COMBINING LIGATURE.

So it would seem I now have the choice of either contacting Ricardo and suggesting this alternative to him, or arguing against him and then submitting a counter-proposal. I don't know which approach is likely to be more productive.

Jill



> -----Original Message-----
> From: Philippe Verdy [mailto:[EMAIL PROTECTED]]

> Another solution could be a formatting control that overrides the
> interpretation of a sequence of characters as digits rather than as
> letters

> Here I just suggested a few things for your problem of natural sort or
> semantic analysis, but I don't need it and I won't defend this idea.
> It's up to you to defend your opinion and make an alternate proposal
> for WG2. Clearly you take your distance from the other very
> problematic proposal to encode figure-width letters...

> -----Original Message-----
> From: Peter Kirk [mailto:[EMAIL PROTECTED]]

> So, Jill, could you get much of what you want by encoding your hex
> digits as ligatures between regular digits, e.g. <U+0031, ZWJ,
> U+0030...0035>? They would have the properties of digits, and could be
> tailored for collation, as contractions, where you need them. I'm not
> sure why you suggest a special DIGIT COMBINING LIGATURE, why not just
> use ZWJ?

> -----Original Message-----
> From: Mark E. Shoulson [mailto:[EMAIL PROTECTED]]

> If/when Tengwar gets coded, it will have digits for 10 and 11, as it
> uses base-12. I would say that to the extent that all this is a good
> idea, we shouldn't code lots of different ones (A,B for the computer
> crowd, X,E for the Dozenal crowd); let glyph-variants handle it.
> (as an oddball addition: if the maximum base we're really trying to
> support is 16, it might be handy to have a "16" digit as well,
>
> ~mark
