Re: LC_ALL and os.listdir()

2005-02-24, Martin v. Löwis
Duncan Booth wrote:
> Windows (when using NTFS) stores all the filenames in unicode, and Python uses the unicode API to implement listdir (when given a unicode path). This means that the filename never gets encoded to a byte string either by the OS or Python. If you use a byte string path then the…
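Duncan's point is about Windows and Python 2.x. In current Python 3 the same principle applies on every platform: the type of the argument passed to os.listdir() decides the type of the names it returns (str in, str out; bytes in, bytes out). The following minimal sketch uses a throwaway temporary directory. `listing_types` is a hypothetical helper.

```python
import os
import tempfile


def listing_types(directory):
    """Return the set of element types os.listdir() yields for a str path and for a bytes path."""
    # Same directory, two argument types: the result type follows the argument.
    str_types = {type(name) for name in os.listdir(directory)}
    bytes_types = {type(name) for name in os.listdir(os.fsencode(directory))}
    return str_types, bytes_types


with tempfile.TemporaryDirectory() as tmp:
    # One ASCII file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    result = listing_types(tmp)

print(result)  # ({<class 'str'>}, {<class 'bytes'>})
```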

Re: LC_ALL and os.listdir()

2005-02-24, Serge Orlov
Duncan Booth wrote:
> Martin v. Löwis wrote:
>
> > Serge Orlov wrote:
> >> Shouldn't os.path.join do that? If you pass a unicode string
> >> and a byte string it currently tries to convert bytes to
> >> characters
> >> but it makes more sense to convert the unicode string to bytes
> >> and return t…

Re: LC_ALL and os.listdir()

2005-02-24, Duncan Booth
Martin v. Löwis wrote:
> Serge Orlov wrote:
>> Shouldn't os.path.join do that? If you pass a unicode string
>> and a byte string it currently tries to convert bytes to characters
>> but it makes more sense to convert the unicode string to bytes
>> and return two byte strings concatenated.
>
> Sou…

Re: LC_ALL and os.listdir()

2005-02-23, Kenneth Pronovici
On Wed, Feb 23, 2005 at 10:07:19PM +0100, "Martin v. Löwis" wrote:
> So we have three options:
> 1. skip this string, only return the ones that can be
>    converted to Unicode. Give the user the impression
>    the file does not exist.
> 2. return the string as a byte string
> 3. refuse to listdir…
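The three options can be made concrete with a small Python 3 sketch. This is not the Python 2.3 implementation under discussion. `list_names` and its policy labels are hypothetical. Listing through a bytes path and decoding with ASCII stands in for a locale, such as the C locale, that cannot represent the name.

```python
import os
import tempfile

POLICIES = ("skip", "bytes", "raise")


def list_names(directory, encoding, policy):
    """List a bytes `directory` and decode each entry, handling failures per `policy`.

    The policy names are hypothetical labels for the three options quoted above:
      "skip"  -> option 1: silently drop names that cannot be decoded
      "bytes" -> option 2: return undecodable names as raw byte strings
      "raise" -> option 3: refuse the whole listing with UnicodeDecodeError
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    names = []
    for raw in os.listdir(directory):  # bytes path -> bytes entries
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if policy == "skip":
                continue
            if policy == "bytes":
                names.append(raw)
            else:  # policy == "raise"
                raise
    return names


with tempfile.TemporaryDirectory() as tmp:
    base = os.fsencode(tmp)
    # "plain.txt" is ASCII; the second name is UTF-8 for "café.txt", which ASCII cannot decode.
    for name in (b"plain.txt", b"caf\xc3\xa9.txt"):
        open(os.path.join(base, name), "wb").close()

    skipped = list_names(base, "ascii", "skip")  # ['plain.txt']
    kept = list_names(base, "ascii", "bytes")    # 'plain.txt' plus the raw bytes name
    try:
        list_names(base, "ascii", "raise")
        refused = False
    except UnicodeDecodeError:
        refused = True
```

Option 2 is the only one that loses no information, but it hands the caller a list of mixed types.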

Re: LC_ALL and os.listdir()

2005-02-23, Martin v. Löwis
Serge Orlov wrote:
> Shouldn't os.path.join do that? If you pass a unicode string and a byte string it currently tries to convert bytes to characters but it makes more sense to convert the unicode string to bytes and return two byte strings concatenated.

Sounds reasonable. OTOH, this would be the onl…
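Python 3 no longer converts in either direction: os.path.join() raises TypeError when str and bytes parts are mixed. Serge's proposal can still be written as a small wrapper. The following sketch assumes POSIX paths (hence `posixpath`). `join_as_bytes`, the directory `/srv/data`, and the sample entry are hypothetical.

```python
import os
import posixpath


def join_as_bytes(*parts):
    """Hypothetical helper: join POSIX path parts after encoding every str part to bytes.

    This follows the proposal quoted above: rather than decoding the byte string,
    encode the unicode string (here with os.fsencode(), i.e. Python's file system
    encoding) and concatenate byte strings.
    """
    return posixpath.join(*(os.fsencode(p) if isinstance(p, str) else p for p in parts))


directory = "/srv/data"       # hypothetical configured directory (a str)
entry = b"caf\xc3\xa9.txt"    # a name as returned by os.listdir() on a bytes path

try:
    posixpath.join(directory, entry)  # Python 3 refuses to mix str and bytes
    mixed_join_failed = False
except TypeError:
    mixed_join_failed = True

joined = join_as_bytes(directory, entry)
print(mixed_join_failed, joined)  # True b'/srv/data/caf\xc3\xa9.txt'
```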

Re: LC_ALL and os.listdir()

2005-02-23, Serge Orlov
"Martin v. Löwis" wrote:
>> My goal is to build generalized code that consistently works with all
>> kinds of filenames.
>
> Then it is best to drop the notion that file names are
> character strings (because some file names aren't). You
> do so by converting your path variable into a byte
> string…
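In Python 3, following this advice means passing a bytes path to the file APIs and decoding only for display. In the sketch below, `walk_bytes` is a hypothetical helper, and UTF-8 is assumed only for the display step.

```python
import os
import tempfile


def walk_bytes(top):
    """Yield the path of every file under `top` as a byte string.

    `top` may be str or bytes; os.fsencode() converts a str once (and returns
    bytes unchanged), after which os.walk() reports every path as bytes and
    no file name is ever decoded.
    """
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(top)):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("top.txt", os.path.join("sub", "nested.txt")):
        open(os.path.join(tmp, rel), "w").close()
    found = list(walk_bytes(tmp))

# Decode only for display (assuming UTF-8; invalid bytes become U+FFFD), never for file operations.
for path in found:
    print(path.decode("utf-8", errors="replace"))
```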

Re: LC_ALL and os.listdir()

2005-02-23, Martin v. Löwis
Kenneth Pronovici wrote:
> 1) Why does LC_ALL have any effect on the os.listdir() result?

The operating system (POSIX) does not have the inherent notion that file names are character strings. Instead, in POSIX, file names are primarily byte strings. There are some bytes which are interpreted as charact…
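Martin's explanation can be checked without touching the file system. The bytes stored on disk never change. What changes with the locale is the encoding used to turn those bytes into characters. In the sketch below, `decode_under` is a hypothetical helper, and the three encodings stand in for three possible locale settings (for instance a UTF-8 locale, a Latin-1 locale, and the plain C locale).

```python
# The same on-disk name under three interpretations: these bytes are the
# UTF-8 encoding of "é" followed by ".txt".
raw_name = b"\xc3\xa9.txt"


def decode_under(raw, encoding):
    """Decode `raw` as a locale using `encoding` would, or return None if it cannot."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


results = {enc: decode_under(raw_name, enc) for enc in ("utf-8", "latin-1", "ascii")}
# results == {'utf-8': 'é.txt', 'latin-1': 'Ã©.txt', 'ascii': None}
```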

Re: LC_ALL and os.listdir()

2005-02-23, Martin v. Löwis
Kenneth Pronovici wrote:
> I think that I can solve my problem by just converting any unicode strings from configuration into utf-8 simple strings using encode(). Using this solution, all of my existing regression tests still pass, and my code seems to make it past the unusual directory.

See my other…
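Here is Kenneth's workaround sketched in Python 3: encode the configured path once, then let os.listdir() return byte strings. `to_byte_path` is a hypothetical helper. UTF-8 is an assumption that only holds if the names on disk are UTF-8 too; os.fsencode() would use Python's file system encoding instead.

```python
import os
import tempfile


def to_byte_path(value, encoding="utf-8"):
    """Hypothetical helper: return a configured path as a byte string.

    Unicode values are encoded with `encoding` (UTF-8 by default, which only
    matches the names on disk if they are UTF-8 too); byte strings pass through.
    os.fsencode(value) is an alternative that uses Python's file system encoding.
    """
    return value.encode(encoding) if isinstance(value, str) else value


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()
    configured = tmp  # stands in for a directory read from a configuration file
    entries = os.listdir(to_byte_path(configured))  # bytes path -> bytes entries

print(entries)  # [b'notes.txt']
```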

Re: LC_ALL and os.listdir()

2005-02-23, Kenneth Pronovici
On Wed, Feb 23, 2005 at 01:03:56AM -0600, Kenneth Pronovici wrote:
[snip]
> Today, I accidentally ran across a directory containing three "normal"
> files (with ASCII filenames) and one file with a two-character unicode
> filename. My code, which was doing something like this:
>
>    for entry…
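This thread concerns Python 2.3. For comparison, Python 3 on POSIX (PEP 383) takes a different route. When a name cannot be decoded with the file system encoding, os.listdir() on a str path still returns a str, with each undecodable byte mapped to a lone surrogate. os.fsencode() then restores the original bytes, so a loop like the one above never sees mixed types. The sketch below checks the round trip without creating any files. It assumes a POSIX system; Windows handles file names differently. The directory `/srv/data` is hypothetical.

```python
import os
import sys

# The encoding and error handler that os.fsdecode()/os.fsencode() use for file names.
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())

raw = b"caf\xe9"          # "café" encoded as Latin-1: not valid UTF-8
name = os.fsdecode(raw)   # undecodable byte 0xE9 becomes the lone surrogate U+DCE9
print(ascii(name))        # 'caf\udce9'
print(os.fsencode(name) == raw)  # True: the original bytes come back unchanged

# Joining with a str directory therefore never mixes str and bytes.
path = os.path.join("/srv/data", name)  # "/srv/data" is a hypothetical directory
```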

LC_ALL and os.listdir()

2005-02-22, Kenneth Pronovici
I have some confusion regarding the relationship between locale, os.listdir() and unicode pathnames. I'm running Python 2.3.5 on a Debian system. If it matters, all of the files I'm dealing with are on an ext3 filesystem. The real code this problem comes from takes a configured set of directorie…