Re: [Tutor] Regex question

Hugo Arts Sun, 03 Apr 2011 06:06:04 -0700

2011/4/3 "Andrés Chandía" <[email protected]>:
>
>
> I continue working with RegExp, but I have reached a point for which I
> can't find documentation. Maybe there is no possible way to do it;
> anyway, I'll throw out the question:
>
> This is my code:
>
>     contents = re.sub(r'Á', "A", contents)
>     contents = re.sub(r'á', "a", contents)
>     contents = re.sub(r'É', "E", contents)
>     contents = re.sub(r'é', "e", contents)
>     contents = re.sub(r'Í', "I", contents)
>     contents = re.sub(r'í', "i", contents)
>     contents = re.sub(r'Ó', "O", contents)
>     contents = re.sub(r'ó', "o", contents)
>     contents = re.sub(r'Ú', "U", contents)
>     contents = re.sub(r'ú', "u", contents)
>
> It is clear that I need to convert any accented vowel into the same
> unaccented vowel. The question is: is there a way to say that whenever
> you find an accented character, it has to change into an unaccented
> character? But not every character: it must be only these vowels,
> accented this way, because in the language I am working with there are
> letters like ü and ñ that should remain the same.
>


Okay, first thing: forget about regexes for this problem. They're too
complicated and not suited to it.

Encoding issues make this a somewhat complicated problem. In Unicode,
there are two ways to encode most accented characters. For example, the
character "Ć" can be encoded either as the single code point U+0106,
"LATIN CAPITAL LETTER C WITH ACUTE", or as the sequence U+0043 U+0301,
being simply 'C' and the 'COMBINING ACUTE ACCENT', respectively. You
must handle both forms to be sure every accented character is gone from
your string.
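
To see the two forms side by side, here is a minimal sketch using the
standard unicodedata module. It is written for Python 3, where every str
is Unicode; the variable names are only for illustration.

```python
import unicodedata

precomposed = "\u0106"   # LATIN CAPITAL LETTER C WITH ACUTE
decomposed = "C\u0301"   # 'C' followed by COMBINING ACUTE ACCENT

# Both usually render identically, but they are different strings.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 1 2

# Normalization converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

# The official names of the code points involved.
print(unicodedata.name(precomposed))   # LATIN CAPITAL LETTER C WITH ACUTE
print(unicodedata.name("\u0301"))      # COMBINING ACUTE ACCENT
```

Which form ends up in your data depends on where the text came from, so
it is safer not to assume either one.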

Using unicode.translate, you can craft a translation table to
translate the accented characters to their non-accented counterparts.
The combining characters can simply be removed by mapping them to
None.
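
A rough sketch of such a table, assuming Python 3, where the method is
str.translate and str.maketrans builds the table (in Python 2 you would
pass a dict of code points to unicode.translate instead). The names
TABLE and strip_acute are made up for this example, and the table covers
only the ten acute-accented vowels from your code.

```python
import unicodedata

# Precomposed acute-accented vowels and their plain counterparts.
# The NFC call guards against this literal having been saved in
# decomposed form by an editor or mail client.
ACCENTED = unicodedata.normalize("NFC", "ÁáÉéÍíÓóÚú")
PLAIN = "AaEeIiOoUu"
COMBINING_ACUTE = "\u0301"

# The third argument lists characters that map to None, i.e. are deleted.
TABLE = str.maketrans(ACCENTED, PLAIN, COMBINING_ACUTE)

def strip_acute(text):
    """Replace acute-accented vowels and drop combining acute accents."""
    return text.translate(TABLE)

print(strip_acute("\u00e1"))         # precomposed á
print(strip_acute("a\u0301"))        # 'a' + combining acute
print(strip_acute("\u00fc\u00f1"))   # ü and ñ are not in the table
```

Because ü and ñ are not in the table, and their decomposed forms use the
combining diaeresis (U+0308) and tilde (U+0303) rather than U+0301, they
pass through unchanged. One caveat: deleting U+0301 outright also strips
a combining acute from any other base letter it follows, which may or may
not matter for your language.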

HTH,
Hugo
