Hi, Is the code below the only/shortest way to match unicode characters? I would like to match whatever is defined as a character in the unicode reference database. So letters in the broadest sense of the word, but not digits, underscore or whitespace. Until just now, I was convinced that the re.UNICODE flag generalized the [a-z] class to all unicode letters, and that the absence of re.U was an implicit 're.ASCII'. Apparently that mental model was *wrong*. But [^\W\s\d_]+ is kind of hard to read/write.
import re s = unichr(956) # mu sign m = re.match(ur"[^\W\s\d_]+", s, re.I | re.U) Regards, Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor