[Tutor] regex: matching unicode

Albert-Jan Roskam Sat, 22 Dec 2012 13:07:45 -0800

Hi,

Is the code below the only/shortest way to match Unicode letters? I would
like to match whatever is defined as a letter in the Unicode character
database. So letters in the broadest sense of the word, but not digits,
underscores or whitespace. Until just now, I was convinced that the re.UNICODE
flag generalized the [a-z] class to all Unicode letters, and that the absence
of re.U was an implicit 're.ASCII'. Apparently that mental model was *wrong*.
But [^\W\s\d_]+ is kind of hard to read/write.


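import re
s = unichr(956)  # U+03BC GREEK SMALL LETTER MU (unichr is Python 2 only)
# "not non-word, not whitespace, not digit, not underscore" leaves letters
m = re.match(ur"[^\W\s\d_]+", s, re.I | re.U)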
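
A minimal sketch of the same idea, written so that it should run on both
Python 2.7 and Python 3.3+ (hence u"..." literals with doubled backslashes
instead of ur"..."). The \s in the class above looks redundant, because
whitespace already falls under \W, so [^\W\d_] should describe the same set.
The names LETTERS and leading_letters are made up for illustration and are
not part of the re module.

```python
import re

# \w (with re.UNICODE) minus digits and underscore: roughly "any letter".
# Whitespace is already excluded because it is a non-word character (\W).
LETTERS = re.compile(u"[^\\W\\d_]+", re.UNICODE)


def leading_letters(text):
    """Return the run of letters at the start of text, or None."""
    m = LETTERS.match(text)
    return m.group(0) if m else None


mu = u"\u03bc"  # GREEK SMALL LETTER MU, same code point as unichr(956)

# The explicit range [a-z] stays an ASCII range, even with re.I | re.U.
ascii_range_match = re.match(u"[a-z]+", mu, re.I | re.U)

# The negated class does pick up the non-ASCII letter.
letter_match = leading_letters(mu)
```

Here ascii_range_match is None while letter_match is the mu itself. Compiling
the pattern once under a descriptive name also hides the hard-to-read
character class from the rest of the code.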

Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public
order, irrigation, roads, a fresh water system, and public health, what have
the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
