[email protected] wrote:
From: Kenneth Whistler
--------------------------------------------------------------------------------
C. E. Whitehead said:
I've not gone through many character charts, though, so I can't really speak as an expert the way you all can. Sorry I've not gotten to more; I will try to ...
For people who wish to pursue this issue further, the relevant information is neatly summarized in the extracted property data file: http://www.unicode.org/Public/UNIDATA/extracted/DerivedNumericType.txt That is what you should look at for efficiency, and it is basically what the UTC would be using for discussion of this matter.
--Ken
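For readers who want to process that file rather than eyeball it, here is a minimal Python sketch. It assumes the usual UCD data-line layout, "<code point or range> ; <value> # <comment>", and it does not download anything. The function name parse_ucd_ranges and the sample line are illustrative only.

```python
from typing import Iterable, Iterator, Tuple

def parse_ucd_ranges(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first_cp, last_cp, value) for each data line.

    Assumes the UCD extracted-file layout "<cp or cp..cp> ; <value> # <comment>".
    Blank lines and comment-only lines are skipped.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code_points, value = data.split(";", 1)
        first, _, last = code_points.strip().partition("..")
        # A single code point has no "..", so last is empty: reuse first.
        yield int(first, 16), int(last or first, 16), value.strip()

sample = [
    "# Numeric_Type=Decimal",
    "0030..0039    ; Decimal # Nd  [10] DIGIT ZERO..DIGIT NINE",
]
print(list(parse_ucd_ranges(sample)))  # [(48, 57, 'Decimal')]
```

Grouping the output by value (Decimal, Digit, Numeric) reproduces the file's own sections, which is the view relevant to this discussion.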

C.E.

Specifically, notice the New Tai Lue digits (U+19D0-U+19DA). We have a sequence of
eleven gc=Nd characters that absolutely cannot be arranged so that consecutive
code points have ascending numeric values. I doubt that, if Arabic were encoded
today, there would be a full set of Eastern digits; there would likely be only
4-7, with 0-3, 8, and 9 shared with the regular Arabic digits.
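The property a sequential-ordering policy would guarantee can be checked mechanically. Below is a minimal sketch using Python's standard unicodedata module. Its answers depend on the Unicode version bundled with the interpreter, and the function name out_of_sequence is hypothetical. The function lists every code point in a range whose digit value differs from its offset from the first code point in that range.

```python
import unicodedata
from typing import List, Optional, Tuple

def out_of_sequence(first: int, last: int) -> List[Tuple[str, Optional[int], int]]:
    """Return (code point, digit value, expected value) for each mismatch.

    The expected value is the offset from `first`. unicodedata.digit() covers
    both Numeric_Type=Decimal and Numeric_Type=Digit characters, and it returns
    None here for characters that have no digit value.
    """
    problems = []
    for cp in range(first, last + 1):
        value = unicodedata.digit(chr(cp), None)
        expected = cp - first
        if value != expected:
            problems.append((f"U+{cp:04X}", value, expected))
    return problems

print(out_of_sequence(0x19D0, 0x19DA))  # [('U+19DA', 1, 10)]
print(out_of_sequence(0x0660, 0x0669))  # []
```

Only the eleventh New Tai Lue character is flagged. Its value is 1, but it sits at offset 10, so no ordering of eleven code points can make "value == offset" hold for all of them.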

I don't understand what you mean here by the "Arabic" digits. Please give a code point number example.

This leads me to the conclusion that any formal policy invites definitionally insoluble problems in future encodings: a collision between encoding each character only once and having a mathematically pure digit sequence.

That having been said, I have absolutely no problem with reserving a code point
for zero, especially when a script is still in current use by a modern language
community. Even if the script has not used place-value notation before, adopting
it is a simple adaptation once its user community is exposed to global business,
scientific, and standards communities.

Even though I have no official say, as a script encoder my vote would be to
simply recommend that decimal digits be ordered sequentially 0-9, and to leave
a reserved code point for zero if the system is in modern use but does not yet
use place value (and hence has no digit zero). I would explicitly fight against
anything more formal, as it would unnecessarily encumber script encoders, who
have to balance many more interests than those of programmers who won't write
an exception branch for non-sequential digit arrangements. You've gotta do
it anyway for CJK and New Tai Lue. I would also question any programmer who
wouldn't allow for mixing the two blocks of Arabic digits. Just leave the
code open for future additions, just as you do for the sequential/ascending
digits.
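Here is a minimal Python sketch of what "leaving the code open" might look like, using the standard unicodedata module. The function name parse_decimal is hypothetical. Each digit's value comes from the character's Numeric_Value, not from an offset against a presumed "digit zero" code point. As a result, mixed Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9) digits parse without any special casing.

```python
import unicodedata

def parse_decimal(text: str) -> int:
    """Parse a string of decimal (gc=Nd) digits drawn from any block(s).

    Values are looked up with unicodedata.decimal(), so the code does not
    assume a particular block layout and does not require one script.
    """
    if not text:
        raise ValueError("empty digit string")
    result = 0
    for ch in text:
        value = unicodedata.decimal(ch, None)
        if value is None:
            raise ValueError(f"U+{ord(ch):04X} is not a decimal digit")
        result = result * 10 + value
    return result

# ARABIC-INDIC DIGIT ONE followed by EXTENDED ARABIC-INDIC DIGIT TWO
print(parse_decimal("\u0661\u06F2"))  # 12
```

This sketch uses decimal(), which accepts only Numeric_Type=Decimal characters. Characters such as CJK ideographic numerals, which carry only a Numeric value, would still need their own branch. unicodedata.numeric() exposes that value.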


The original proposal also called for leaving a couple of code points empty after '9', so that things like New Tai Lue's duplicate '1' digit could be adjacent to the block of 9 digits. Do you have a problem with that?
-Van Anderson



