I am reading some text from a web site using f=urllib.urlopen(....) and then s=f.read().
I then extract a bit of s as s1. s1 contains "Na Ponta Do Pé", where the é is encoded as the single byte 0xE9. If I call IS_SLUG.urlify(s1), it throws an error because 0xE9 is not a valid character. I believe the encoding is ANSI. I have tried all manner of encoding and decoding but cannot get anything to work.

If I print s1 to the console or to a file, it works fine. But most Python character operations fail, presumably because they expect UTF-8, which encodes é as two bytes. If I assign the string literally instead:

    s1="Na Ponta Do Pé"
    IS_SLUG.urlify(s1)

there is no error.

Clearly I could check for 0xE9 and convert it as a special case, but I wonder if anyone could suggest a conversion that would work for any ANSI character. I have googled and experimented a lot on this with no success.

Thanks,
Peter
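
To make the byte-level situation concrete, here is a minimal sketch of the conversion being asked about. It rests on an assumption that the post does not confirm: that the page is encoded as Windows-1252 (the code page usually meant by "ANSI"; Latin-1 would give the same result for é). The names raw, text, utf8_bytes and ascii_text are illustrative only, and raw stands in for the slice of f.read().

```python
# -*- coding: utf-8 -*-
import unicodedata

# Stand-in for s1: raw bytes as read from the site, with é as the single byte 0xE9.
raw = b"Na Ponta Do P\xe9"

# ASSUMPTION: the source encoding is Windows-1252 ("ANSI").
# Decoding turns the bytes into a Unicode string, independent of any byte encoding.
text = raw.decode("cp1252")

# Re-encoding as UTF-8 gives the two-byte form of é (0xC3 0xA9).
utf8_bytes = text.encode("utf-8")

# Optional: strip accents entirely by decomposing é into e + combining accent
# and then dropping everything that is not ASCII.
ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

print(repr(utf8_bytes))
print(ascii_text)
```

If the assumption about the source encoding holds, passing either text or utf8_bytes to IS_SLUG.urlify instead of the raw bytes may avoid the error, but that call is not exercised here. If the page declares a different charset in its headers, that charset should replace "cp1252".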