Mark Davis ☕ wrote:
Mark
/— Il meglio è l’inimico del bene —/
On Thu, Jul 29, 2010 at 05:57, Philippe Verdy <verd...@wanadoo.fr> wrote:
"Martin J. Dürst" <due...@it.aoyama.ac.jp> wrote:
>
> On 2010/07/29 13:33, karl williamson wrote:
> > Asmus Freytag wrote:
> >> On 7/25/2010 6:05 PM, Martin J. Dürst wrote:
>
> >>> Well, there actually is such a script, namely Han. The digits (一、
> >>> 二、三、四、五、六、七、八、九、〇) are used both as letters and as
> >>> decimal place-value digits, and they are scattered widely, and of
> >>> course there is a lot of modern living practice.
>
> >> The situation is worse than you indicate, because the same characters
> >> are also used as elements in a system that doesn't use place-value,
> >> but uses special characters to show powers of 10.
>
> No. Sequences of numeric Kanji are also used in names and word-plays,
> and as sequences of individual small numbers.
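To make the two Han usages above concrete, here is a minimal Python sketch. The table and function names are hypothetical, only a handful of numerals are covered, and it is an illustration rather than a real Han number parser. The same characters are read once as place-value digits and once as digits scaled by explicit power-of-ten characters.

```python
# Illustrative subset only; real text also uses other numerals and larger powers.
HAN_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
              "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
HAN_POWERS = {"十": 10, "百": 100, "千": 1000}


def parse_place_value(text):
    """Read each character as one decimal digit (二〇一〇 is 2010)."""
    value = 0
    for ch in text:
        value = value * 10 + HAN_DIGITS[ch]
    return value


def parse_multiplicative(text):
    """Read digits scaled by power-of-ten characters (二千十 is 2010)."""
    total = 0
    pending = 0  # digit still waiting for its power-of-ten marker
    for ch in text:
        if ch in HAN_DIGITS:
            pending = HAN_DIGITS[ch]
        else:
            # A bare 十/百/千 with no preceding digit counts as 1 x power.
            total += (pending or 1) * HAN_POWERS[ch]
            pending = 0
    return total + pending
```

Neither function can tell from the characters alone which system a given string uses, which is the difficulty being described: the caller has to know the convention in advance.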
(1) Existing exception :
There's one example of a digit which has a numeric type = decimal, AND
is encoded in a "scattered" way:
19DA;6618;᧚;New Tai Lue Tham Digit One;Nd;0;L;...;1;1;1;N
The other nine decimal digits for the Tham variant of the New Tai Lue
digits are borrowed from another sequence of decimal digits, starting
at U+19D0 (for digit zero) with the exception of U+19D1 which is
replaced (for digit one). Both sets are assigned in the same
"New_Tai_Lue" script property value.
So the additional stability proposal will not be enforceable.
On the contrary. Were we to want such a policy, the implication would be
either to:
(a) change the type of 19DA from Nd to No (what I think would be the
right thing to do)
(b) grandfather in the character.
This discussion doesn't make sense to me. The original proposal to
encode 19DA says that there is one set of digits in New Tai Lue, but
there is an extra digit '1' (the one that got put at 19DA), used when
the other digit '1' is visually confusable with another character in the
script, which it resembles. That makes it sound like the two are
essentially used as glyph variants of each other, and are
interchangeable as far as the computer recognizing an input number.
Thus, it is appropriate to keep it as Nd, and it isn't scattered,
because it is adjacent to the block of 10 digits. My original proposal
accounted for this case, asking that the slot or two immediately above
the digit '9' be unassigned initially in a new script encoding, just in
case a situation like this one arises again.
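As a rough illustration of "interchangeable as far as the computer recognizing an input number", the Python sketch below maps both forms of digit one to ASCII "1" before parsing. The names TO_ASCII and parse_new_tai_lue are hypothetical, and the code assumes the input contains only New Tai Lue digits.

```python
# U+19D0..U+19D9 are the ten New Tai Lue digits; U+19DA (Tham digit one)
# sits in the slot immediately above digit nine.
TO_ASCII = {0x19D0 + i: ord("0") + i for i in range(10)}
TO_ASCII[0x19DA] = ord("1")  # treat the Tham form as a variant of digit one


def parse_new_tai_lue(text):
    """Translate New Tai Lue digits to ASCII, then parse as a decimal integer."""
    return int(text.translate(TO_ASCII))
```

With this table, the string U+19DA U+19D0 and the string U+19D1 U+19D0 both parse to 10.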
One thing that I should have brought up earlier in this discussion is
that, as an implementor, I can deal with existing exceptions. I may not
want to, and may choose not to if my subjective calculation of
benefit/cost indicates it's not worthwhile. Given the existing pattern
of code point assignments, I saw an efficient way to implement things.
And, if future Unicode versions retain this pattern, neither I nor my
successors will have to change our code to move to that new version.
Changing code takes a significant amount of time and effort. Keeping
new versions of Unicode using the same paradigms as previous versions
means that implementations of those new versions will be available
sooner than otherwise, and makes it likelier that they get adopted at all. I was
unaware of the subtleties in Han and Arabic. Those can be handled as
exceptions, but making new exceptions is really contrary to Unicode's
interests. So it really isn't about current counter examples; there's
nothing much that can be done about them. It's about adopting
guidelines to keep from unnecessarily creating new exceptions.
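The kind of implementation this pattern allows might look like the following Python sketch. It is an assumption about the general approach, not the actual code under discussion: DIGIT_ZEROS is a small hypothetical subset of scripts, and EXCEPTIONS holds one-off digits that fall outside a run of ten.

```python
from bisect import bisect_right

# Code point of digit zero for a few scripts, sorted (illustrative subset):
# ASCII, Arabic-Indic, Devanagari, New Tai Lue.
DIGIT_ZEROS = [0x0030, 0x0660, 0x0966, 0x19D0]

# Existing one-off digits that are not part of a contiguous run of ten.
EXCEPTIONS = {0x19DA: 1}  # NEW TAI LUE THAM DIGIT ONE


def digit_value(ch):
    """Return the decimal value of ch, or None if it is not a known digit."""
    cp = ord(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    # Find the closest digit-zero code point at or below cp.
    i = bisect_right(DIGIT_ZEROS, cp) - 1
    if i >= 0 and cp - DIGIT_ZEROS[i] <= 9:
        return cp - DIGIT_ZEROS[i]
    return None
```

As long as new scripts keep their ten digits contiguous and in order, supporting a new Unicode version only means adding one entry to the zero table. Every scattered digit instead needs its own entry in the exception map.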