On Wed, Sep 5, 2012 at 5:42 AM, Ray Jones <crawlz...@gmail.com> wrote:
> I have directory names that contain Russian characters, Romanian
> characters, French characters, et al. When I search for a file using
> glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the
> directory names. I thought simply identifying them as Unicode would
> clear that up. Nope. Now I have stuff like \u0456\u0439\u043e.

This is just an FYI in case you were manually decoding. Since glob calls
os.listdir(dirname), you can get Unicode output if you call it with a
Unicode arg:

    >>> t = u"\u0456\u0439\u043e"
    >>> open(t, 'w').close()
    >>> import glob
    >>> glob.glob('*')  # UTF-8 output
    ['\xd1\x96\xd0\xb9\xd0\xbe']
    >>> glob.glob(u'*')
    [u'\u0456\u0439\u043e']

Regarding subprocess.Popen, just use Unicode, at least on a POSIX system.
Popen calls an exec function, such as posix.execv, which handles encoding
Unicode arguments to the file system encoding.

On Windows, the _subprocess C extension in 2.x is limited to calling
CreateProcessA with char* 8-bit strings, so Unicode arguments are first
encoded with the default encoding (ASCII). Any character beyond ASCII
therefore triggers an encoding error.
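
If you do end up converting by hand, here is a rough sketch of both directions. The helper names decode_names and encode_args are made up for illustration; they are not part of glob or subprocess. It assumes the byte strings were produced with the file system encoding (UTF-8 in the session above). Whether encoding the arguments yourself helps on Windows 2.x depends on the names being representable in the ANSI code page, so treat that half as a possible workaround and not a guaranteed fix.

```python
# -*- coding: utf-8 -*-
import sys


def decode_names(names, encoding=None):
    """Return names as Unicode text, decoding any byte strings."""
    if encoding is None:
        # Assumption: the names were encoded with the file system encoding.
        encoding = sys.getfilesystemencoding() or "utf-8"
    decoded = []
    for name in names:
        # On Python 2.6+, `bytes` is an alias of the 8-bit `str` type.
        if isinstance(name, bytes):
            name = name.decode(encoding)
        decoded.append(name)
    return decoded


def encode_args(args, encoding):
    """Encode Unicode arguments to byte strings; leave byte strings alone."""
    text_type = type(u"")  # `unicode` on 2.x, `str` on 3.x
    return [a.encode(encoding) if isinstance(a, text_type) else a
            for a in args]


# What glob.glob('*') returned in the session above (UTF-8 bytes).
raw = [b"\xd1\x96\xd0\xb9\xd0\xbe"]
names = decode_names(raw, "utf-8")

# Going the other way, e.g. before handing arguments to Popen.
argv = encode_args([u"ls", names[0]], "utf-8")
```

Here names holds the same text that glob.glob(u'*') returned above, and argv holds only byte strings. On Windows you would pass an encoding such as 'mbcs' instead of 'utf-8'.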