[web2py] problem with character encoding

peter Thu, 02 Feb 2012 14:24:57 -0800

I am reading some text from a web site, using f=urllib.urlopen(....),
and then s=f.read()


I then extract part of s as s1, which contains "Na Ponta Do Pé".

The é is encoded as a single byte, 0xE9.

If I do IS_SLUG.urlify(s1) it throws an error because 0xE9 is not a
valid character. I believe the encoding is ANSI (most likely Latin-1 or
Windows-1252, where 0xE9 is é). I have tried all manner of encoding and
decoding but cannot get anything to work. If I print s1 to the console
or a file, it works fine. But most Python string operations fail,
presumably because they expect UTF-8, which encodes é as two bytes.
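
To make the byte-level difference concrete, here is a minimal sketch. It
assumes the page really is Latin-1, which is only a guess based on the
0xE9 byte, and the sample value is typed in by hand rather than fetched.

```python
# Assumption: the raw bytes are Latin-1 (ISO-8859-1).
raw = b"Na Ponta Do P\xe9"

text = raw.decode("latin-1")   # unicode text containing a real é
utf8 = text.encode("utf-8")    # é becomes the two bytes 0xC3 0xA9

print(repr(raw))
print(repr(utf8))

# Decoding the raw bytes as UTF-8 fails, which would explain the errors.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc)
```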


If I do
s1="Na Ponta Do Pé"
IS_SLUG.urlify(s1)

There is no error.

Clearly I could check for 0xE9 and convert it as a special case, but I
wonder if anyone could suggest a conversion that would work for any ANSI
character. I have googled and experimented a lot on this with no
success.
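
The kind of general conversion I am after might look roughly like the
sketch below. The helper name to_utf8 and the fallback list are
hypothetical, and the choice of Windows-1252 followed by Latin-1 is an
assumption about what "ANSI" means for this site.

```python
def to_utf8(raw, fallbacks=("cp1252", "latin-1")):
    """Return UTF-8 encoded bytes for raw, guessing the source encoding.

    Hypothetical helper: bytes that are already valid UTF-8 are returned
    unchanged; otherwise each fallback encoding is tried in order.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    for name in fallbacks:
        try:
            return raw.decode(name).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input with any fallback encoding")


s1 = to_utf8(b"Na Ponta Do P\xe9")
print(repr(s1))
```

The result could then be passed to IS_SLUG.urlify(s1). This is a guess
at the encoding, not a detection: if the site sends a charset in its
Content-Type header, using that value would be more reliable.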

Thanks
Peter
