On Mon, Oct 8, 2012 at 4:11 PM, Prasad, Ramit <ramit.pra...@jpmorgan.com> wrote:
>
>> for ch in text:
>>     if '0' <= ch <= '9':
>>         doSomething(ch)
>
> I am not sure that will work very well with Unicode numbers. I would
> assume (you know what they say about assuming) that str.isdigit()
> works better with international characters/numbers.

In my tests below, isdigit() matches both decimal digits (category 'Nd')
and other digits (category 'No'). None of the 'No' category digits work
with int().

Python 2.7.3

    >>> import sys
    >>> from unicodedata import category
    >>> chars = [unichr(i) for i in xrange(sys.maxunicode + 1)]
    >>> digits = [c for c in chars if c.isdigit()]
    >>> digits_d = [d for d in digits if category(d) == 'Nd']
    >>> digits_o = [d for d in digits if category(d) == 'No']
    >>> len(digits), len(digits_d), len(digits_o)
    (529, 411, 118)

Decimal

    >>> nums = [int(d) for d in digits_d]
    >>> [nums.count(i) for i in range(10)]
    [41, 42, 41, 41, 41, 41, 41, 41, 41, 41]

Other

    >>> print u''.join(digits_o[:3] + digits_o[12:56])
    ²³¹⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐
    >>> print u''.join(digits_o[67:94])
    ❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒
    >>> print u''.join(digits_o[3:12])
    ፩፪፫፬፭፮፯፰፱

    >>> int(digits_o[67])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'decimal' codec can't encode character u'\u2776' in position 0: invalid decimal Unicode string

Python 3.2.3

    >>> import sys
    >>> from unicodedata import category
    >>> chars = [chr(i) for i in range(sys.maxunicode + 1)]
    >>> digits = [c for c in chars if c.isdigit()]
    >>> digits_d = [d for d in digits if category(d) == 'Nd']
    >>> digits_o = [d for d in digits if category(d) == 'No']
    >>> len(digits), len(digits_d), len(digits_o)
    (548, 420, 128)

Decimal

    >>> nums = [int(d) for d in digits_d]
    >>> [nums.count(i) for i in range(10)]
    [42, 42, 42, 42, 42, 42, 42, 42, 42, 42]

Other

    >>> print(*(digits_o[:3] + digits_o[13:57]), sep='')
    ²³¹⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐
    >>> print(*digits_o[68:95], sep='')
    ❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒
    >>> print(*digits_o[3:12], sep='')
    ፩፪፫፬፭፮፯፰፱

    >>> int(digits_o[68])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for int() with base 10: '❶'
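
Given the results above, one way to handle both kinds of digit in Python 3 is to check str.isdecimal() before calling int(), and fall back to unicodedata.digit() for the 'No' characters. This is only a minimal sketch: to_int is a hypothetical helper name of mine, and it assumes the argument is a single character.

```python
import unicodedata

def to_int(ch):
    """Hypothetical helper: return the digit value of one character, or None."""
    if ch.isdecimal():
        # Category 'Nd' (decimal digits): int() accepts these directly.
        return int(ch)
    # 'No' digits such as superscripts or circled digits have a digit value
    # in the Unicode database but are rejected by int(); the second argument
    # is returned for characters that have no digit value at all.
    return unicodedata.digit(ch, None)

# ASCII digit, Arabic-Indic digit, superscript two, dingbat one, a letter
for ch in "7\u0663\u00b2\u2776x":
    print(repr(ch), ch.isdigit(), ch.isdecimal(), to_int(ch))
```

If only characters that int() can parse are wanted, testing ch.isdecimal() instead of ch.isdigit() should be enough.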