Tony Cappellini wrote:
> Using Windows XP, SP2 and Python 2.3
> I've written a script which walks through a bunch of directories and
> replaces characters which are typically illegal in filenames with an
> '_' character.


> When my script encounters a directory with the unwanted characters,
> it's easy to detect them and filter them out. The next step is to
> rename the file to get rid of the problem characters.


> However, recently when I called os.rename(oldname, newname) an OS
> exception was thrown with "Illegal filename". I was able to narrow it
> down to oldname being the cause of the problem.
> Some of the characters showed up as ? in the Python strings.
> Oddly enough, os.rename() cannot perform the renaming of the
> directories, but I can do this manually in File Explorer or even in a
> CMD console using "rename".
> So what is os.rename() actually calling on a Windows system, that
> won't allow me to rename dirs with illegal characters?

Well, the simple answer to that is (cut-and-pasted and snipped a bit)
from the posixmodule.c source:

        if (unicode_file_names()) {
            ...
            result = MoveFileW(PyUnicode_AsUnicode(o1),
                               PyUnicode_AsUnicode(o2));
            ...
        }
        ...
        result = MoveFileA(p1, p2);

So it's using MoveFileW when given two unicode filenames, or
MoveFileA when given two non-unicode filenames. So... are you
calling os.rename with unicode or non-unicode filenames?

If you're using, say, os.walk or os.listdir to walk your tree,
pass it a unicode path to start with, and the filenames coming
back will also be unicode. Try this, for example:

import os, sys

# filename with random non-ascii char
filename = u"abc\u0123.txt"
open (filename, "w").close ()

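# first listing: encode for the console, so any character
# it can't display comes out as "?"
for i in os.listdir (u"."):
   print i.encode (sys.stdout.encoding, "replace")

# swap each non-ascii char for "_" and rename the file
new_filename = unicode (filename.encode ("ascii", "replace").replace ("?", "_"))
os.rename (filename, new_filename)

# second listing: the new name is plain ascii
for i in os.listdir (u"."):
   print i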


The file with the non-ascii char shows up in the
first listing with a question mark standing in for
that char. It's then renamed with the non-ascii
char replaced by "_", so the new name is plain ascii
and prints in the final listing without needing to
be encoded.
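
If you want to apply the same idea to a whole tree, something
along these lines might do as a starting point. It's only a
sketch: ascii_safe and rename_tree are names made up for
illustration, it treats every non-ascii char as unwanted (your
own filter will differ), and it skips a rename when the target
name already exists. It walks bottom-up (topdown=False) so that
a directory is only renamed after everything inside it has been
dealt with.

```python
import os

def ascii_safe(name, fill=u"_"):
    # Hypothetical helper: replace every non-ascii character
    # in a unicode name with `fill`.
    chars = []
    for ch in name:
        if ord(ch) < 128:
            chars.append(ch)
        else:
            chars.append(fill)
    return u"".join(chars)

def rename_tree(top):
    # `top` is assumed to be a unicode string, so that os.walk
    # hands back unicode names and os.rename gets unicode paths.
    renamed = []
    # Bottom-up: subdirectories are visited before their parent,
    # so renaming a directory never invalidates a path still to come.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in dirnames + filenames:
            new_name = ascii_safe(name)
            if new_name == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            # Don't clobber an existing file or directory.
            if not os.path.exists(new_path):
                os.rename(old_path, new_path)
                renamed.append((old_path, new_path))
    return renamed
```

Call it with a unicode start path, e.g. rename_tree(u"."), and
it returns a list of (old, new) path pairs for whatever it
actually renamed.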

I think this is what you're after.

Tutor maillist
