William O'Higgins Witteman wrote:
> I have several programs which traverse a Windows filesystem with French
> characters in the filenames.
>
> I am having trouble dealing with these filenames when outputting these
> paths to an XML file - I get UnicodeDecodeError: 'ascii' codec can't
> decode byte 0xe9 ... etc. That happens when I try to convert to UTF-8.
>
> I know that os will give me UTF-8 if I give it UTF-8, and I am trying to
> do that, but somewhere down the line it seems like it reverts to ASCII,
> and then I get these errors.
>
> Has anyone found a silver bullet for ensuring that all the filenames
> encountered by os.walk are treated as UTF-8? Thanks.

Some code would help here, particularly the code that gives you the error. There are so many ways people get confused by UTF-8 and stumble over the subtleties of Python's use of Unicode.

The error you quote is a decode error, whereas converting to UTF-8 is encoding. That suggests a byte string is being implicitly decoded with the default ASCII codec somewhere before the encode happens.

It would also be helpful to figure out for sure what you are getting from os.walk(): is it a UTF-8 byte string or a Unicode string? The best way to find out is to print repr(filename) and see what you get on output.

Kent
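
For illustration, here is a minimal sketch of the two checks suggested above: looking at the repr() of what os.walk() returns, and keeping decoding and encoding apart. It uses Python 3 syntax, which is an assumption, since the thread does not say which version is in use. The helper name describe_names, the sample bytes, and the choice of cp1252 as the source encoding are also assumptions made for the example, not something taken from the original code.

```python
import os


def describe_names(top):
    """Return (type name, repr) for every filename os.walk finds under top.

    Hypothetical helper: it only reports what os.walk hands back, so you
    can see whether the names are text strings or byte strings.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append((type(name).__name__, repr(name)))
    return found


# Decoding turns bytes into text; encoding turns text into bytes.
# Made-up sample: "café.txt" stored as cp1252 bytes (assumed encoding).
raw = b"caf\xe9.txt"

# Decoding with the ASCII codec fails on byte 0xe9, which is the same
# kind of error quoted in the question.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decode with the encoding the bytes are really in, then encode to UTF-8.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")
```

In Python 3, passing a text path to os.walk() gives text filenames back and passing a bytes path gives bytes filenames back, so describe_names() shows directly which kind you are dealing with before anything is written to the XML file.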
