Re: unicode "em space" in regex

2005-04-17 Thread "Martin v. Löwis"
Xah Lee wrote:
> Thanks. Is it true that any unicode chars can also be used inside regex
> literally?
>
> e.g.
> re.search(ur' +',mystring,re.U)
>
> I tested this case and apparently i can.

Yes. In fact, when you write u"\u2003" or u" " doesn't matter to re.search. Either way you get a Unicode
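A quick check of Martin's point, written for Python 3 rather than the Python 2.4 of this thread (so plain string literals are already Unicode and neither the u prefix nor re.U is needed). The \u escape, the \N{...} named escape, and a string built at run time with chr() all yield the same one-character string, so re.search cannot tell them apart. The sample text is made up for illustration.

```python
import re

escaped = "\u2003"       # \u escape, resolved by the Python compiler
named = "\N{EM SPACE}"   # named escape, also resolved by the compiler
built = chr(0x2003)      # built at run time from the code point

# All three are the same one-character string, so re sees identical patterns.
assert escaped == named == built

text = "a\u2003\u2003b"  # made-up sample: two em spaces between letters
for em in (escaped, named, built):
    match = re.search(em + "+", text)
    print(match.span())  # (1, 3) every time
```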

Re: unicode "em space" in regex

2005-04-17 Thread Fredrik Lundh
Xah Lee wrote:
> "Regular expression pattern strings may not contain null bytes, but can
> specify the null byte using the \number notation."
>
> What is meant by null bytes here? Unprintable chars??

no, null bytes. "\0". "\x00". ord(byte) == 0. chr(0).

> and the "\number" is meant to be decimal?

octal.
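A small Python 3 sketch of Fredrik's answer, using a made-up sample string. Inside raw-string patterns, \0 and \000 are octal escapes and \x00 is a hex escape, and all three denote the character with code 0. The sketch sticks to escapes and does not test whether a literal null character is accepted in a pattern. The last two assertions show that three octal digits after a backslash are read as octal, not decimal.

```python
import re

data = "key\x00value"  # made-up sample containing one null character (code 0)

# Raw strings keep the backslash, so re itself interprets these escapes:
# \0 and \000 are octal, \x00 is hex; all of them mean chr(0).
for pattern in (r"\0", r"\000", r"\x00"):
    m = re.search(pattern, data)
    print(pattern, "matches at index", m.start())

# Three octal digits are read as octal, not decimal:
# \101 is 0o101 == 65 == "A", not chr(101) == "e".
assert re.fullmatch(r"\101", "A")
assert not re.fullmatch(r"\101", "e")
```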

Re: unicode "em space" in regex

2005-04-17 Thread Reinhold Birkenfeld
Xah Lee wrote:
> "Regular expression pattern strings may not contain null bytes, but can
> specify the null byte using the \number notation."
>
> What is meant by null bytes here? Unprintable chars?? and the "\number"
> is meant to be decimal? and in what encoding?

The null byte is a byte with t

Re: unicode "em space" in regex

2005-04-17 Thread Xah Lee
Thanks. Is it true that any unicode chars can also be used inside regex literally?

e.g.
re.search(ur' +',mystring,re.U)

I tested this case and apparently i can. But is it true that any unicode char can be embedded in regex literally. (does this apply to the esoteric ones such as other non-printin
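A Python 3 sketch of the general rule, with made-up sample text: without flags such as re.VERBOSE, any character that is not a regex metacharacter matches itself, so invisible ones like U+200B ZERO WIDTH SPACE or U+00A0 NO-BREAK SPACE can go into a pattern like any other character. The exception is the metacharacters themselves (. ^ $ * + ? { } [ ] \ | ( )), which need escaping; re.escape does that for a whole string.

```python
import re

zwsp = "\u200b"  # U+200B ZERO WIDTH SPACE, invisible when printed
nbsp = "\u00a0"  # U+00A0 NO-BREAK SPACE

text = "foo\u200bbar\u00a0baz"  # made-up sample

# Neither character is a metacharacter, so each one matches itself.
print(re.split("[" + zwsp + nbsp + "]", text))  # ['foo', 'bar', 'baz']

# Metacharacters are the exception: unescaped, "+" acts as a quantifier.
literal = "1+1=2"
assert re.search(literal, "1+1=2") is None
# re.escape backslash-escapes the special characters, so this matches literally.
assert re.search(re.escape(literal), "1+1=2") is not None
```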

Re: unicode "em space" in regex

2005-04-16 Thread "Martin v. Löwis"
Xah Lee wrote:
> how to represent the unicode "em space" in regex?

You will have to pass a Unicode literal as the regular expression, e.g.

fracture=re.split(u'\u2003*\\|\u2003*',myline,re.U)

Notice that, in raw Unicode literals, you can still use \u to escape char
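The same split in Python 3, as a sketch with a hypothetical sample line. Two things differ from the Python 2 code quoted here. First, Python 3 raw strings no longer process \u themselves, but since Python 3.3 re understands \uXXXX in str patterns, so the raw form still works. Second, flags should be passed by keyword, because the third positional parameter of re.split is maxsplit. In the Python 2.4 of this thread re.split had no flags parameter at all (it was added in 2.7), so as far as I can tell the re.U in the quoted calls would have been taken as maxsplit.

```python
import re

line = "alpha\u2003|\u2003beta\u2003\u2003|gamma"  # hypothetical sample input

# Non-raw literal: Python turns \u2003 into the em space and \\| into \|.
fields = re.split("\u2003*\\|\u2003*", line)

# Raw literal: re itself understands \u2003 in str patterns (Python 3.3+).
same = re.split(r"\u2003*\|\u2003*", line)

print(fields)  # ['alpha', 'beta', 'gamma']
assert fields == same

# Flags go in the keyword argument; re.UNICODE is already the default
# for str patterns, so here it changes nothing.
assert re.split("\u2003*\\|\u2003*", line, flags=re.UNICODE) == fields
```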

Re: unicode "em space" in regex

2005-04-16 Thread Klaus Alexander Seistrup
Xah Lee:
> how to represent the unicode "em space" in regex?
>
> e.g. i want do something like this:
>
> fracture=re.split(r'\342371*\|\342371*',myline,re.U)

I'm not sure what you're trying to do, but would it help you to use it&

unicode "em space" in regex

2005-04-16 Thread Xah Lee
how to represent the unicode "em space" in regex?

e.g. i want do something like this:

fracture=re.split(r'\342371*\|\342371*',myline,re.U)

Xah
[EMAIL PROTECTED]
http://xahlee.org/
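For comparison, a Python 3 sketch of the byte-level route that octal escapes such as \342 point toward (the sample line is hypothetical). Encoded as UTF-8, the em space is three bytes, \342\200\203 in octal (E2 80 83 in hex), so a bytes pattern has to match all three and must group them before applying *, because * binds only to the single byte before it. Matching decoded str text with \u2003 avoids the issue, since the em space is then one code point.

```python
import re

em = "\N{EM SPACE}"  # U+2003
encoded = em.encode("utf-8")
print(encoded)                    # b'\xe2\x80\x83'
print([oct(b) for b in encoded])  # ['0o342', '0o200', '0o203']

line = "a\u2003|\u2003b"  # hypothetical sample input

# str pattern on str data: one code point, so * applies to the whole em space.
print(re.split("\u2003*\\|\u2003*", line))  # ['a', 'b']

# bytes pattern on UTF-8 data: the em space is three bytes, so wrap them in a
# non-capturing group; otherwise * would repeat only the last byte.
data = line.encode("utf-8")
print(re.split(rb"(?:\342\200\203)*\|(?:\342\200\203)*", data))  # [b'a', b'b']
```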