Tony Cappellini wrote:
> Using Windows XP, SP2 and Python 2.3
>
> I've written a script which walks through a bunch of directories and
> replaces characters which are typically illegal in filenames with an
> '_' character.
[...]
> When my script encounters a directory with the unwanted characters,
> it's easy to detect them and filter them out. The next step is to
> rename the file to get rid of the problem characters.
[...]
> However, recently when I called os.rename(oldname, newname) an OS
> exception was thrown with "Illegal filename". I was able to narrow it
> down to oldname being the cause of the problem.
> Some of the characters showed up as ? in the Python strings.
>
> Oddly enough, os.rename() cannot perform the renaming of the
> directories, but I can do this manually in File Explorer or even in a
> CMD console using "rename"
>
> So what is os.rename() actually calling on a Windows system, that
> won't allow me to rename dirs with illegal characters?

Well, the simple answer to that is (cut-and-pasted and snipped a bit)
from the posixmodule.c source:

  if (unicode_file_names()) {
  ...
      result = MoveFileW(PyUnicode_AsUnicode(o1),
                         PyUnicode_AsUnicode(o2));
  ...
  result = MoveFileA(p1, p2);

so it's using MoveFileW with two unicode filenames, or
MoveFileA with two non-unicode filenames.

So... are you calling os.rename with unicode or non-unicode filenames?
If you pass non-unicode names, the "?" characters you're seeing are most
likely substitutes for characters which can't be represented in your
code page, so oldname no longer matches the real file. If you're using,
say, os.walk or os.listdir to walk your tree, pass it a unicode path to
start with, and the filenames coming back will also be unicode. Try
this, for example:

<code>
import os, sys

#
# filename with random non-ascii char
#
filename = u"abc\u0123.txt"
open (filename, "w").close ()

# a unicode path in means unicode filenames out
for i in os.listdir (u"."):
  print i.encode (sys.stdout.encoding, "replace")

# "replace" turns the non-ascii char into "?", which then becomes "_"
new_filename = unicode (filename.encode ("ascii", "replace").replace ("?", "_"))
os.rename (filename, new_filename)
for i in os.listdir (u"."):
  print i
</code>

The filename with the random unicode char is shown (with the
fill-in question-mark) in the initial list. It's then renamed
with the non-ascii char replaced by "_" and can be printed
without any explicit encoding in the final list. I think this
is what you're after.
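The same idea can be sketched for a whole directory. This is only an
illustration, and it assumes Python 3 (not the Python 2.3 in this
thread), where every str is already unicode and os.listdir hands back
unicode names when given a str path. The names ascii_safe and
rename_non_ascii are made up for the example; they don't come from
your script or from the standard library.

```python
import os
import tempfile


def ascii_safe(name, replacement="_"):
    """Return name with every non-ASCII character replaced."""
    # Work character by character, so that a literal "?" in the
    # name is left alone (unlike the encode/replace trick).
    return "".join(ch if ord(ch) < 128 else replacement for ch in name)


def rename_non_ascii(directory):
    """Rename entries in directory whose names contain non-ASCII chars.

    Returns a list of (old_name, new_name) pairs which were renamed.
    Hypothetical helper: it only looks at one directory level.
    """
    renamed = []
    for old_name in os.listdir(directory):
        new_name = ascii_safe(old_name)
        if new_name == old_name:
            continue  # nothing to fix
        src = os.path.join(directory, old_name)
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            continue  # skip rather than clobber an existing entry
        os.rename(src, dst)
        renamed.append((old_name, new_name))
    return renamed


if __name__ == "__main__":
    # Try it out in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "abc\u0123.txt"), "w").close()
        print(rename_non_ascii(tmp))
        print(os.listdir(tmp))
```

Skipping a rename when the target name already exists is a deliberate
choice here: two different names can collapse to the same ASCII-only
name, and silently overwriting one of them seemed the worse option.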
TJG