Re: regular expression, unicode

2009-04-30 Thread Simon Strobl
Thanks for your hints. Usually, all my files are utf-8. Obviously, I somehow managed to inadvertently switch the encoding when creating this specific file. I have no idea how this could happen. Simon
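A minimal sketch of the situation described in this reply, under one assumption: that the truncated traceback in the original question ended in a decoding error (the excerpts never show its last line). The sample text and all variable names are invented for illustration. It shows that the pattern itself compiles and matches in Python 3, and that the same text saved as Latin-1 can no longer be decoded as UTF-8.

```python
import re

# The pattern from the question, with its string quotes in place.
good = re.compile("^[A-ZÄÖÜ].*")

# Hypothetical sample input, not taken from the thread.
text = "Ärger\nöde\nZebra\n"
utf8_bytes = text.encode("utf-8")      # how the files usually are
latin1_bytes = text.encode("latin-1")  # the accidentally switched encoding

# Decoded with the right codec, the pattern works as expected.
lines = utf8_bytes.decode("utf-8").splitlines()
matches = [line for line in lines if good.match(line)]
print(matches)  # ['Ärger', 'Zebra']

# The Latin-1 bytes are not valid UTF-8, so decoding fails
# before any regular expression is applied.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

If the file cannot be re-saved, opening it with an explicit encoding argument, for example open(path, encoding="latin-1"), is one way to read it regardless of the locale default.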

regular expression, unicode

2009-04-29 Thread Simon Strobl
Hello, why can't I use this pattern good = re.compile("^[A-ZÄÖÜ].*") in python3? According to the documentation, patterns may be unicode strings. I get this error message: Traceback (most recent call last): File "./get.py", line 8, in <module> for line in sys.stdin: File

Re: regular expression, unicode

2009-04-29 Thread Rhodri James
On Wed, 29 Apr 2009 12:44:12 +0100, Simon Strobl <simon.str...@gmail.com> wrote: why can't I use this pattern good = re.compile("^[A-ZÄÖÜ].*") in python3? According to the documentation, patterns may be unicode strings. I get this error message: Traceback (most recent call last): File

Re: regular expression, unicode

2009-04-29 Thread MRAB
Simon Strobl wrote: Hello, why can't I use this pattern good = re.compile("^[A-ZÄÖÜ].*") in python3? According to the documentation, patterns may be unicode strings. I get this error message: Traceback (most recent call last): File "./get.py", line 8, in <module> for line in sys.stdin:

regular expression, unicode

2009-04-29 Thread Simon Strobl
Hello, why can't I use this statement in python3: good = re.compile("^[A-ZÄÖÜ].*") According to the documentation, patterns can be unicode strings. I get this error message: Traceback (most recent call last): File "./get.py", line 8, in <module> for line in sys.stdin: File

Re: regular expression unicode character class trouble

2005-09-05 Thread Diez B. Roggisch
Steven Bethard wrote: I'd use something like r"[^_\d\W]", that is, all things that are neither underscores, digits nor non-alphas. In action: py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC') ['badger', 'x', 'xxA', 'BC'] HTH, Seems so, great! Diez
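The negated class suggested above can be tried on non-ASCII input as well. This is a sketch for Python 3, where str patterns are Unicode-aware by default; in the Python 2 of this thread a unicode pattern compiled with re.UNICODE would be needed instead. The name letter_runs and the second sample string are made up for illustration.

```python
import re

# \w covers letters, digits and the underscore. The negated class excludes
# underscores (_), decimal digits (\d) and all non-word characters (\W),
# which leaves essentially the letters.
LETTERS = re.compile(r"[^_\d\W]+")


def letter_runs(text):
    """Return the maximal runs of characters matched by LETTERS."""
    return LETTERS.findall(text)


print(letter_runs("42badger100x__xxA1BC"))  # ['badger', 'x', 'xxA', 'BC']
print(letter_runs("Äpfel_123_Öl"))          # ['Äpfel', 'Öl']
```

One caveat: \d only covers decimal digits, so a few numeric characters that \w accepts but \d does not (vulgar fractions, for instance) may still slip through. Whether that matters depends on the input.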

regular expression unicode character class trouble

2005-09-04 Thread Diez B. Roggisch
Hi, in a unicode environment I need the character class set(\w) - set([0-9]), or alpha w/o num. Any ideas how to create that? And what performance implications do I have to fear? I mean, I guess that the character classes aren't implemented as sets, but as a comparison function that compares a

Re: regular expression unicode character class trouble

2005-09-04 Thread Steven Bethard
Diez B. Roggisch wrote: Hi, in a unicode environment I need the character class set(\w) - set([0-9]), or alpha w/o num. Any ideas how to create that? I'd use something like r"[^_\d\W]", that is, all things that are neither underscores, digits nor non-alphas. In action: py