2011/4/3 "Andrés Chandía" <and...@chandia.net>:
>
> I continue working with RegExp, but I have reached a point for which I can't
> find documentation. Maybe there is no possible way to do it; anyway, I'll
> throw out the question.
>
> This is my code:
>
>     contents = re.sub(r'Á', "A", contents)
>     contents = re.sub(r'á', "a", contents)
>     contents = re.sub(r'É', "E", contents)
>     contents = re.sub(r'é', "e", contents)
>     contents = re.sub(r'Í', "I", contents)
>     contents = re.sub(r'í', "i", contents)
>     contents = re.sub(r'Ó', "O", contents)
>     contents = re.sub(r'ó', "o", contents)
>     contents = re.sub(r'Ú', "U", contents)
>     contents = re.sub(r'ú', "u", contents)
>
> It is clear that I need to convert any accented vowel into the same
> non-accented vowel. The question is: is there a way to say that whenever you
> find an accented character, it has to change into a non-accented character?
> Not every character, though: it must be only these vowels, accented this way,
> because in the language I am working with there are letters like ü and ñ
> that should remain the same.
>
Okay, first thing: forget about regexes for this problem. They're too complicated and not suited to it.

Encoding issues make this a somewhat complicated problem. In Unicode, there are two ways to encode most accented characters. For example, the character "Ć" can be encoded either as U+0106, "LATIN CAPITAL LETTER C WITH ACUTE", or as the combination of U+0043 and U+0301, being simply 'C' followed by the 'COMBINING ACUTE ACCENT'. You must remove both forms to be sure every accented character is gone from your string.

Using unicode.translate (str.translate in Python 3), you can craft a translation table that maps the accented characters to their non-accented counterparts. The combining accent can simply be removed by mapping it to None. Map only U+0301 COMBINING ACUTE ACCENT this way, not every combining character; otherwise a decomposed ü (u + U+0308 COMBINING DIAERESIS) or ñ (n + U+0303 COMBINING TILDE) would lose its mark too.

HTH,
Hugo

_______________________________________________
Tutor maillist  -  Tutor@python.org
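The translation-table approach described in the reply can be sketched roughly as follows. This is a minimal Python 3 sketch, not Hugo's own code: it assumes the text is already a decoded str, and the names ACUTE_VOWELS, TABLE and strip_acute_vowels are hypothetical, chosen for illustration. Precomposed acute vowels map to plain vowels, and only U+0301 is deleted, so ü and ñ survive in either encoding.

```python
# Python 3 sketch; ACUTE_VOWELS, TABLE and strip_acute_vowels are hypothetical names.

# Precomposed acute-accented vowels and their plain counterparts.
# Only these ten letters are listed, so ü, ñ and everything else are untouched.
ACUTE_VOWELS = {
    "Á": "A", "á": "a",
    "É": "E", "é": "e",
    "Í": "I", "í": "i",
    "Ó": "O", "ó": "o",
    "Ú": "U", "ú": "u",
}

# str.maketrans converts the dict into a table keyed by code point.
# U+0301 COMBINING ACUTE ACCENT maps to None, so str.translate deletes it.
# Other combining marks (U+0308 diaeresis, U+0303 tilde) are not in the
# table, so a decomposed ü or ñ keeps its mark.
TABLE = str.maketrans({**ACUTE_VOWELS, "\u0301": None})


def strip_acute_vowels(text):
    """Replace acute-accented vowels (precomposed or decomposed) with plain ones."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(strip_acute_vowels("canción, pingüino, año"))  # prints: cancion, pingüino, año
```

One design note: deleting U+0301 strips a decomposed acute from any base letter, including a consonant. If that matters for your text, an alternative (a different technique from the one in the reply) is to call unicodedata.normalize("NFC", text) first, so decomposed vowels are composed into the precomposed forms, and then translate with only the ACUTE_VOWELS entries.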