[email protected] wrote:
From: Kenneth Whistler ([email protected])
--------------------------------------------------------------------------------
C. E. Whitehead said:
I've not gone through many character charts, though, so I can't
really speak as an expert as you all can. Sorry I've not gotten
to more; I will try to ...
For people who wish to pursue this issue further, the relevant
information is neatly summarized in the extracted property
data file:
http://www.unicode.org/Public/UNIDATA/extracted/DerivedNumericType.txt
That is what you should look at for efficiency, and
is basically what the UTC would be using for discussion
about this matter.
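Below is a rough Python sketch of how that file might be consumed. It reads lines in the file's `code point(s) ; value # comment` layout and flags Decimal runs whose length is not a multiple of ten. The function names are hypothetical. SAMPLE is a short hand-written excerpt in the same format, not a copy of the real file, so a real check would read the downloaded file instead.

```python
from collections import defaultdict

# Hand-written excerpt in the DerivedNumericType.txt layout (not the real file).
SAMPLE = """\
# Derived Property: Numeric_Type=Decimal
0030..0039    ; Decimal # Nd  [10] DIGIT ZERO..DIGIT NINE
0660..0669    ; Decimal # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9    ; Decimal # Nd  [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE

# Derived Property: Numeric_Type=Digit
00B2..00B3    ; Digit # No   [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
00B9          ; Digit # No       SUPERSCRIPT ONE
"""

def parse_numeric_type(text):
    """Map each Numeric_Type value to a list of (first, last) code point ranges."""
    ranges = defaultdict(list)
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        cps, value = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        ranges[value].append((int(first, 16), int(last or first, 16)))
    return dict(ranges)

def merge_adjacent(ranges):
    """Join ranges that touch, in case one run is listed over several lines."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))
    return merged

def irregular_decimal_runs(ranges):
    """Decimal runs whose length is not a multiple of ten."""
    return [(a, b) for a, b in merge_adjacent(ranges.get("Decimal", []))
            if (b - a + 1) % 10]
```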
--Ken
C.E.
Specifically, notice the New Tai Lue digits (U+19D0-U+19DA). We have a sequence of
eleven gc=Nd characters that absolutely cannot be arranged so that consecutive code
points have ascending numeric values. I doubt that, if Arabic were encoded today,
there would be a full set of Eastern digits; there would be only 4-7, with 0-3 and
8 and 9 shared with the regular Arabic digits.
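The property at issue can be checked mechanically. Here is a minimal Python sketch using the standard unicodedata module. It tests whether each code point in a run has a decimal value equal to its offset from the first one, which is what lets code compute a digit as cp - zero. The function name is hypothetical. Any run longer than ten characters must fail this test, because the offset of the eleventh character would be 10 and no single decimal digit has that value.

```python
import unicodedata

def is_offset_run(codepoints):
    """True if the code points are consecutive and each one's decimal value
    equals its offset from the first, so that value == cp - codepoints[0]."""
    if not codepoints:
        return False
    zero = codepoints[0]
    for offset, cp in enumerate(codepoints):
        if cp != zero + offset:
            return False  # gap or reordering in the code points
        if unicodedata.decimal(chr(cp), None) != offset:
            return False  # value does not match position, or not a decimal digit
    return True
```

For example, the ASCII digits U+0030..U+0039 and the Arabic-Indic digits U+0660..U+0669 both pass. Extending the ASCII run to U+003A fails, because U+003A COLON is not a digit.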
I don't understand what you mean here by the "Arabic" digits. Please
give a code point number example.
This leads me to the conclusion that any formal policy invites
definitionally insoluble problems in future encodings: a collision
between encoding each character only once and having a mathematically
pure digit sequence.
That having been said, I have absolutely no problem with reserving a code point
for zero, especially when a script is still in current use by a modern language
community. Even if the script has not used place-value notation before, adopting
it is a simple adaptation once its user community is exposed to global business,
scientific, and standards communities.
Even though I have no official say, my vote as a script encoder would be to
simply recommend that decimal digits be sequentially ordered 0-9, and to leave
a code point reserved for zero if the system is in modern use but does not
currently use place-value notation and hence has no digit zero. I would explicitly
fight against anything more formal, as it would unnecessarily encumber script
encoders, who have to balance many more interests than those of programmers who
won't provide an exception branch for non-sequential digit arrangements. You've
gotta do it anyway, for CJK and New Tai Lue. I would also question any programmer
who wouldn't allow for mixing of the two blocks of Arabic digits. Just leave the
code open for future additions, just as you do for the sequential, ascending
digits.
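A fast path plus an exception branch, as described above, might look like the following minimal Python sketch. The ZERO_BASED_BLOCKS table and the names digit_value and parse_digits are hypothetical. The fallback uses the standard unicodedata.numeric lookup, which in CPython also appears to cover CJK ideographs such as U+4E00 and U+3007. Digits from U+0660-U+0669 and U+06F0-U+06F9 may be mixed freely within one number.

```python
import unicodedata

# Hypothetical fast-path table: zero code points of blocks assumed to hold
# digits 0-9 in ascending order. Newly encoded sequential blocks go here.
ZERO_BASED_BLOCKS = (
    0x0030,  # DIGIT ZERO (ASCII)
    0x0660,  # ARABIC-INDIC DIGIT ZERO
    0x06F0,  # EXTENDED ARABIC-INDIC DIGIT ZERO
)

def digit_value(ch):
    """Return the 0-9 value of ch, or None if it is not a single digit."""
    cp = ord(ch)
    for zero in ZERO_BASED_BLOCKS:
        if zero <= cp <= zero + 9:
            return cp - zero  # fast path: value is the offset from zero
    # Exception branch: ask the character database. This handles
    # non-sequential layouts and ideographic digits.
    value = unicodedata.numeric(ch, None)
    if value is not None and value.is_integer() and 0 <= value <= 9:
        return int(value)
    return None

def parse_digits(text):
    """Read a digit string positionally; digits from different blocks may mix."""
    total = 0
    for ch in text:
        d = digit_value(ch)
        if d is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        total = total * 10 + d
    return total
```

With this design, adding a newly encoded sequential block only requires one new table entry. Anything else falls through to the database lookup.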
The original proposal also called for leaving empty a couple of code
points after '9' to allow things like New Tai Lue having duplicate '1'
digits to be adjacent to the block of 9 digits. Do you have a problem
with that?
-Van Anderson