On Wed, Jul 04, 2007 at 02:47:45PM -0400, Kent Johnson wrote:
>encode() really wants a unicode string not a byte string. If you call
>encode() on a byte string, the string is first converted to unicode
>using the default encoding (usually ascii), then converted with the
>given encoding.

Aha! That helps.

Something else that helps is knowing that my Python code generates output that is consumed by several other tools. Interesting facts: Not all .NET XML parsers accept valid UTF-8 XML, and neither does IE6. I am indeed seeing filenames in cp1252, even though the Microsoft docs say that filenames are in UTF-8. Filenames in Arabic are in UTF-8.

What I have to do is check the encoding of each filename as received from os.walk (and thus os.listdir), convert it to Unicode, continue processing it, and then encode it as UTF-8 for output to XML.

While trying to work around bad third-party tools and inconsistent data, I introduced errors into my Python code. The problem was that I treated all filenames the same way, when the filesystem was not creating them the same way.

Thanks for all the help and suggestions.

-- 
yours,

William

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
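
P.S. For the archives, here is a rough sketch of the decode-then-encode approach described above. It is not my production code. It is written in Python 3 syntax and assumes the filenames arrive as byte strings that are either UTF-8 or cp1252. The helper names filename_to_unicode and to_utf8 are made up for illustration. UTF-8 is tried first because cp1252 will decode almost any byte sequence without complaint, so trying it first would silently misread the UTF-8 names.

```python
def filename_to_unicode(raw_name, fallback="cp1252"):
    """Decode a byte-string filename to text.

    Assumption: names are either valid UTF-8 or in the fallback
    code page (cp1252 here). Anything else is decoded with
    replacement characters rather than raising.
    """
    try:
        return raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return raw_name.decode(fallback, "replace")


def to_utf8(text):
    """Encode already-decoded text as UTF-8 bytes for the XML output."""
    return text.encode("utf-8")


# Sample byte strings standing in for what the directory walk returns.
samples = [
    b"caf\xe9.txt",                             # cp1252 bytes
    "\u0645\u0644\u0641.txt".encode("utf-8"),   # Arabic name in UTF-8
]

for raw in samples:
    name = filename_to_unicode(raw)   # work with text from here on
    xml_bytes = to_utf8(name)         # encode once, at output time
```

The point is to decode each name exactly once on the way in, handle only Unicode text in the middle, and encode exactly once on the way out, so that encode() is never called on a byte string.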