New submission from Nguyen Quan Son:

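The re module handles inline flags (e.g. '(?iu)') inconsistently
when the pattern contains certain Unicode characters, for example
characters in the range '\u1ea0' to '\u1ef9'. In the attached script,
a pattern built as '(?iu)' plus the lowercase character fails to match
the uppercase character, although the other flag combinations tested
there are not flagged as failing.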

Please see the attached code for a demonstration of the problem.
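
As a compact restatement of the same check, here is a minimal sketch that
runs the three flag styles over every capital/small pair in the range
mentioned above. The names check_pair, failing_pairs, to_char and VARIANTS
are illustrative and not part of the re module, and the sketch assumes that
capitals sit on even code points with their small letters on the following
odd ones, which holds for U+1EA0..U+1EF9. It is written to run on both
Python 2 and Python 3.

```python
import re
import sys

# unichr exists only on Python 2; on Python 3, chr covers the same range.
to_char = unichr if sys.version_info[0] == 2 else chr

# Three ways of requesting IGNORECASE + UNICODE, as in the report.
# The labels are illustrative names, not part of the re module.
VARIANTS = [
    ("flags argument", lambda pat: re.compile(pat, re.I | re.U)),
    ("(?i) plus re.U", lambda pat: re.compile(u"(?i)" + pat, re.U)),
    ("(?iu) inline", lambda pat: re.compile(u"(?iu)" + pat)),
]


def check_pair(upper_cp, lower_cp):
    """Return {label: (upper pattern matches lower, lower pattern matches upper)}."""
    upper, lower = to_char(upper_cp), to_char(lower_cp)
    return dict(
        (label, (build(upper).match(lower) is not None,
                 build(lower).match(upper) is not None))
        for label, build in VARIANTS
    )


def failing_pairs(first=0x1EA0, last=0x1EF9):
    """List (capital code point, label) for every variant that fails to match.

    Assumes capitals are on even code points and small letters on the
    following odd ones, which holds for U+1EA0..U+1EF9.
    """
    failures = []
    for cp in range(first, last + 1, 2):
        for label, outcome in sorted(check_pair(cp, cp + 1).items()):
            if outcome != (True, True):
                failures.append((cp, label))
    return failures


if __name__ == "__main__":
    for cp, label in failing_pairs():
        print("U+%04X: %s" % (cp, label))
```

On an interpreter without the defect the script should print nothing; any
line it prints names a code point and flag style that behave differently
from the others.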

----------
components: Regular Expressions
files: re_unicode_flag.py
messages: 58993
nosy: sonnq
severity: normal
status: open
title: Regular Expression inline flags not handled correctly for some unicode characters
type: behavior
versions: Python 2.3, Python 2.4, Python 2.5
Added file: http://bugs.python.org/file9028/re_unicode_flag.py

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1700>
__________________________________
import re

upper_char = unichr(0x1ea0)  # LATIN CAPITAL LETTER A WITH DOT BELOW
lower_char = unichr(0x1ea1)  # LATIN SMALL LETTER A WITH DOT BELOW

# Case 1: both flags passed through the flags argument.
p = re.compile(upper_char, re.I | re.U)
print p.match(lower_char)

p = re.compile(lower_char, re.I | re.U)
print p.match(upper_char)

# Case 2: inline (?i), with re.U passed through the flags argument.
p = re.compile('(?i)' + upper_char, re.U)
print p.match(lower_char)

p = re.compile('(?i)' + lower_char, re.U)
print p.match(upper_char)

# Case 3: both flags inline.
p = re.compile('(?iu)' + upper_char)
print p.match(lower_char)

p = re.compile('(?iu)' + lower_char)
print p.match(upper_char)   # Bug: prints None; a match object is expected
