karl williamson wrote:

Asmus Freytag wrote:
The situation is worse than you indicate, because the same characters
are also used as elements in a system that doesn't use place-value, but
uses special characters to show powers of 10.


I don't think I would support these numbers, since we couldn't be
sure what was intended.

Another issue that I brought up a while back on this list is Tamil
numbers, where western practice seems to have infiltrated enough that
Unicode gave them Gc=Nd, but IIRC from the responses I got back then,
they can appear in older style with other characters meaning 10, 100,
1000.  In implementing this, if any of the other characters were
encountered in parsing such a number, it would disqualify it.

I think you could treat the Han digits the same way. In some of the
Chinese news corpora I work with, the ten Han digits are frequently used
Western-style (that is, positionally), especially for years, phone
numbers, and other identifiers.
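
To make the "disqualify on any other numeric character" idea concrete, here is a rough Python sketch. It is an illustration, not anyone's actual implementation. The function name parse_place_value and the HAN_DIGITS table are made up for this example. The table is needed because the Han digits are not Gc=Nd, so they cannot be picked up by a general-category test the way the Tamil digits can. It assumes 〇 is the zero form in use; corpora that write zero as 零 would need that added.

```python
import unicodedata

# Hypothetical lookup table: the ten Han digits when used positionally.
# These characters are not Gc=Nd, so they must be listed explicitly.
HAN_DIGITS = {ch: i for i, ch in enumerate("〇一二三四五六七八九")}


def parse_place_value(text):
    """Return the integer value of text, or None if it is not purely positional.

    Any character that is neither a Gc=Nd digit nor one of the ten Han
    digits (for example Tamil TEN U+0BF0 or Han 十) disqualifies the string.
    """
    if not text:
        return None
    value = 0
    for ch in text:
        if ch in HAN_DIGITS:
            digit = HAN_DIGITS[ch]
        elif unicodedata.category(ch) == "Nd":
            digit = unicodedata.decimal(ch)
        else:
            # A power-of-ten sign or any non-digit: give up on the whole string.
            return None
        value = value * 10 + digit
    return value
```

With this sketch, "二〇〇八" parses as 2008, while "二十" is rejected because 十 is not in the table. Note that it does not check that all digits come from the same script, which a real implementation would probably want to do.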

- John D. Burger
  MITRE

