Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: fake Unicode for filename hacks. Exactly. Seems like the best solution to me, despite your polemics. Regards, Martin ___ Python-Dev mailing

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl [EMAIL PROTECTED] wrote: Victor Stinner wrote: On Monday 29 September 2008 18:45:28, Georg Brandl wrote: If I had to choose, I'd still argue for the modified UTF-8 as filesystem encoding (if it were UTF-8 otherwise), despite possible

Re: [Python-Dev] Status of MS Windows CE port

2008-09-30 Thread Ulrich Eckhardt
On Tuesday 30 September 2008, Martin v. Löwis wrote: Ulrich Eckhardt wrote: Well, currently it does make a difference. Simple example: CreateFile(). It's not so simple: Python doesn't actually call CreateFile Martin, CreateFile() was just used as an example. You can substitute it

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Nick Coghlan
Adam Olsen wrote: Lossy conversion just moves around what gets treated as garbage. As all valid unicode scalars can be round tripped, there's no way to create a valid unicode file name without being lossy. The alternative is not be valid unicode, but since we can't use such objects with
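The point about lossy conversion can be illustrated with a minimal sketch (not code from the thread). It assumes two made-up Latin-1 file names, `name_a` and `name_b`, that are not valid UTF-8. Decoding them with the `replace` error handler maps both to the same string, so the original bytes can no longer be recovered.

```python
# Two distinct byte file names, both invalid as UTF-8 (they are Latin-1).
name_a = b"caf\xe9"   # hypothetical name, "e acute" in Latin-1
name_b = b"caf\xe8"   # hypothetical name, "e grave" in Latin-1

# Lossy decoding: each undecodable byte becomes U+FFFD.
text_a = name_a.decode("utf-8", errors="replace")
text_b = name_b.decode("utf-8", errors="replace")

# Both names collapse to the same string...
print(text_a == text_b)
# ...and encoding that string back yields neither original name.
print(text_a.encode("utf-8") in (name_a, name_b))
```

A program that only sees `text_a` cannot tell which of the two files it refers to, which is the "garbage" that a lossy scheme merely moves around.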

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread M.-A. Lemburg
On 2008-09-30 08:00, Martin v. Löwis wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: fake Unicode for filename hacks. Exactly. Seems like the best solution to me, despite your polemics. Not a bad idea... have os.listdir()

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Stephen J. Turnbull
Adam Olsen writes: [1] You could argue that Unicode should add new scalars to handle all currently invalid UTF-8 sequences. AFAIK there are about 2^31 of these, though!

Re: [Python-Dev] Python security team

2008-09-30 Thread Steve Holden
Jan Mate wrote: Guido van Rossum wrote: [...] know you personally -- but perhaps other current members of the PSRT do and that could be enough to secure an invitation. No, i don't think that i'm known well enough to earn the invitation (yet), this was more of a so how the hell does it

[Python-Dev] when is path==NULL?

2008-09-30 Thread Ulrich Eckhardt
Hi! I'm looking at trunk/Python/sysmodule.c, function PySys_SetArgv(). In that function, there is code like this: PyObject* path = PySys_GetObject("path"); ... if (path != NULL) { ... } My intuition says that if path==NULL, something is very wrong. At least I would expect to get

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 5:24 AM, Stephen J. Turnbull [EMAIL PROTECTED] wrote: Adam Olsen writes: [1] You could argue that Unicode should add new scalars to handle all currently invalid UTF-8 sequences. AFAIK there are about 2^31 of these, though! They've promised to never allocate above

Re: [Python-Dev] when is path==NULL?

2008-09-30 Thread Christian Heimes
Ulrich Eckhardt wrote: Hi! I'm looking at trunk/Python/sysmodule.c, function PySys_SetArgv(). In that function, there is code like this: PyObject* path = PySys_GetObject("path"); ... if (path != NULL) { ... } My intuition says that if path==NULL, something is very wrong. At least

Re: [Python-Dev] when is path==NULL?

2008-09-30 Thread Thomas Lee
Ulrich Eckhardt wrote: Hi! I'm looking at trunk/Python/sysmodule.c, function PySys_SetArgv(). In that function, there is code like this: PyObject* path = PySys_GetObject("path"); ... if (path != NULL) { ... } My intuition says that if path==NULL, something is very wrong. At least

Re: [Python-Dev] when is path==NULL?

2008-09-30 Thread Thomas Lee
Ulrich Eckhardt wrote: Hi! I'm looking at trunk/Python/sysmodule.c, function PySys_SetArgv(). In that function, there is code like this: PyObject* path = PySys_GetObject("path"); ... if (path != NULL) { ... } My intuition says that if path==NULL, something is very wrong. At least

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread glyph
On 12:47 am, [EMAIL PROTECTED] wrote: This is the most sane contribution I've seen so far :). See attached patch: python3_bytes_filename.patch Using the patch, you will get: - open() support bytes - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) Forgive me

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy [EMAIL PROTECTED] wrote: On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: I know I keep flipflopping on this one, but the more I think about it the more I believe it is better to drop those names than to raise an exception.

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 11:00 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: fake Unicode for filename hacks. Exactly. Seems like the best solution to me, despite your polemics. Martin, I

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Victor Stinner
Hi, This is the most sane contribution I've seen so far :). Oh thanks. Do I understand properly that (listdir(bytes) -> bytes)? Yes, os.listdir(bytes) -> bytes. It's already the current behaviour. But with Python3 trunk, os.listdir(str) -> str ... or bytes (if unicode conversion fails). If

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 2:45 AM, Nick Coghlan [EMAIL PROTECTED] wrote: Adam Olsen wrote: Lossy conversion just moves around what gets treated as garbage. As all valid unicode scalars can be round tripped, there's no way to create a valid unicode file name without being lossy. The alternative

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 3:52 AM, Hrvoje Nikšić [EMAIL PROTECTED] wrote: On Tue, 2008-09-30 at 19:45 +1000, Nick Coghlan wrote: To my mind, there are two kinds of app in the world when it comes to file paths: 1) Normal apps (e.g. a word processor), that are only interested in files with sane,

Re: [Python-Dev] when is path==NULL?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 5:48 AM, Christian Heimes [EMAIL PROTECTED] wrote: Ulrich Eckhardt wrote: Hi! I'm looking at trunk/Python/sysmodule.c, function PySys_SetArgv(). In that function, there is code like this: PyObject* path = PySys_GetObject("path"); ... if (path != NULL) { ...

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 6:21 AM, [EMAIL PROTECTED] wrote: On 12:47 am, [EMAIL PROTECTED] wrote: This is the most sane contribution I've seen so far :). Thanks. I'll review it later today (after coffee+breakfast :) and will apply it assuming the code is reasonably sane, otherwise I'll go

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Hrvoje Nikšić
On Tue, 2008-09-30 at 07:26 -0700, Guido van Rossum wrote: I am not convinced that a word processor can just ignore files with (what it thinks are) undecodable file names. In countries with a history of incompatible national encodings, such file names crop up very often, sometimes as a

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Steven D'Aprano
On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote: To avoid silent skipping, is it possible to drop 'unreadable' names, issue a warning (instead of exception), and continue to completion? Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames That would be annoying as

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Bill Janssen
Victor Stinner [EMAIL PROTECTED] wrote: - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) Is there an option listdir(bytes) which will return *all* filenames (as byte sequences)? Otherwise, this seems troubling to me; *something* should be returned for

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 7:53 AM, Steven D'Aprano [EMAIL PROTECTED] wrote: On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote: To avoid silent skipping, is it possible to drop 'unreadable' names, issue a warning (instead of exception), and continue to completion? Warning: unreadable

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg [EMAIL PROTECTED] wrote: In the end, I think it's better not to be clever and just return the filenames that cannot be decoded as bytes objects in os.listdir(). Unfortunately that's going to break most code that is using os.listdir(), so it's

Re: [Python-Dev] [Python-3000] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen [EMAIL PROTECTED] wrote: Victor Stinner [EMAIL PROTECTED] wrote: - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) Is there an option listdir(bytes) which will return *all* filenames (as byte sequences)?

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Georg Brandl
Guido van Rossum wrote: With the filenames decoded by UTF-8, your files named têste, ô, dossié will be displayed and handled correctly. The others are *invalid* in the filesystem encoding UTF-8 and therefore would be represented by something like u'dir\uXXffname' where XX is some private

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Georg Brandl
Steven D'Aprano wrote: On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote: To avoid silent skipping, is it possible to drop 'unreadable' names, issue a warning (instead of exception), and continue to completion? Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames

Re: [Python-Dev] [Python-3000] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Bill Janssen
Guido van Rossum [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen [EMAIL PROTECTED] wrote: Victor Stinner [EMAIL PROTECTED] wrote: - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) Is there an option listdir(bytes) which will

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread M.-A. Lemburg
On 2008-09-30 18:46, Guido van Rossum wrote: On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg [EMAIL PROTECTED] wrote: In the end, I think it's better not to be clever and just return the filenames that cannot be decoded as bytes objects in os.listdir(). Unfortunately that's going to break

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Georg Brandl
Guido van Rossum wrote: On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl [EMAIL PROTECTED] wrote: How can it *regularly* drive you crazy when the majority of file names [...] encoded correctly (as you assert above)? Because Office files are a) often named with long, seemingly descriptive

Re: [Python-Dev] [Python-3000] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 10:41 AM, Bill Janssen [EMAIL PROTECTED] wrote: Guido van Rossum [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen [EMAIL PROTECTED] wrote: Victor Stinner [EMAIL PROTECTED] wrote: - listdir(unicode) -> only unicode, *skip* invalid filenames

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread glyph
On 02:32 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 6:21 AM, [EMAIL PROTECTED] wrote: On 12:47 am, [EMAIL PROTECTED] wrote: It sounds like maybe there should be some 2to3 fixers in here somewhere, too? Not necessarily as part of this patch, but somewhere related? I don't know

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 10:59 AM, [EMAIL PROTECTED] wrote: On 02:32 pm, [EMAIL PROTECTED] wrote: If 2.6 weren't pretty much released already I'd ask to add os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone
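For reference, a small sketch of how this split looks in released Python 3, where `os.getcwd()` returns `str` and `os.getcwdb()` returns `bytes`. It shows the API as it exists in current Python 3, not the state of the patch under discussion; the variable names are only illustrative.

```python
import os

cwd_text = os.getcwd()    # text (str) form of the current directory
cwd_bytes = os.getcwdb()  # bytes form of the same directory

# Print the two result types side by side.
print(type(cwd_text).__name__, type(cwd_bytes).__name__)
```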

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread glyph
On 02:39 pm, [EMAIL PROTECTED] wrote: For example, implementing os.listdir to return the file names as Unicode subclasses with ability to access the underlying bytes (automatically recognized by open and friends) sounds like a good compromise that allows the word processor to both have the cake

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:12 AM, [EMAIL PROTECTED] wrote: The one thing it doesn't do is expose the decoding rules for the higher- level applications to deal with. I am pretty sure I don't understand how the interaction between filesystem encoding and user locale works in that case,

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl [EMAIL PROTECTED] wrote: Victor Stinner wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no.

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread glyph
On 06:16 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 11:12 AM, [EMAIL PROTECTED] wrote: The one thing it doesn't do is expose the decoding rules for the higher- level applications to deal with. I am pretty sure I don't understand how the interaction between filesystem encoding and

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread glyph
On 05:56 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 10:59 AM, [EMAIL PROTECTED] wrote: On 02:32 pm, [EMAIL PROTECTED] wrote: In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the benefit of the doubt case? It could always be added to 2.7, and the parity release

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Simon Cross
On Tue, Sep 30, 2008 at 7:56 PM, Guido van Rossum [EMAIL PROTECTED] wrote: (since os.getcwdb() is a Unix-only thing). I would be happier if all the Unix byte functions existed on Windows and fell back to something like encoding the filenames to/from UTF-8. Then at least it would be possible for

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Guido van Rossum wrote: On Mon, Sep 29, 2008 at 11:00 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: fake Unicode for filename hacks. Exactly. Seems like the best solution to me, despite your

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
I didn't get an answer to my question: what is the result bytes (fake characters) stored in unicode + real unicode? I guess that the result is mixed bytes and characters in unicode instead of raising an error (invalid types). So again: why introducing a new type instead of reusing

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Guido van Rossum wrote: However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin operations. My concern still is that it brings the bytes type into the status
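The proposed polymorphism (bytes in, bytes out; str in, str out) can be sketched as follows. This is a minimal illustration run against current Python 3, using a temporary directory and a made-up file name `example.txt`; it is not taken from the patch discussed in the thread.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a plain ASCII name (hypothetical example file).
    open(os.path.join(tmp, "example.txt"), "w").close()

    names_text = os.listdir(tmp)                # str argument -> str names
    names_bytes = os.listdir(os.fsencode(tmp))  # bytes argument -> bytes names

print(names_text)
print(names_bytes)
```

The type of the argument alone selects the type of the result, so code that never passes bytes never sees bytes.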

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
I'm not sure either way. I've heard it claimed that Windows filesystem APIs use Unicode natively. Does Python 3.0 on Windows currently support filenames expressed as bytes? Yes, it does (at least, os.open, os.stat support them, builtin open doesn't). Are they encoded first before passing to

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy [EMAIL PROTECTED] wrote: Guido van Rossum wrote: On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl [EMAIL PROTECTED] wrote: Victor Stinner wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(),

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:04 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Guido van Rossum wrote: On Mon, Sep 29, 2008 at 11:00 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: fake Unicode

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:12 PM, Terry Reedy [EMAIL PROTECTED] wrote: Terry Reedy wrote: Guido van Rossum wrote: I'm not sure either way. I've heard it claimed that Windows filesystem APIs use Unicode natively. Does Python 3.0 on Windows currently support filenames expressed as bytes? Are

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:29 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Guido van Rossum wrote: However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:42 AM, [EMAIL PROTECTED] wrote: There are other ways to glean this knowledge; for example, looking at the 'iocharset' or 'nls' mount options supplied to mount various filesystems. I thought maybe Python (or some C library call) might be invoking some logic that did

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no. If we offer a lower-level bytes filename API, it should work for all platforms.

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Oh, ok. I had assumed Windows just uses a fixed encoding without the problem of misencoded filenames. It's the other way 'round: On Windows, Unicode file names are the natural choice, and byte strings have limitations. In a sense, Windows got it right - but then, they started later. Unix missed

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Nick Coghlan
Guido van Rossum wrote: The callback would either be an extra argument to all system calls (bad, ugly etc., and why not go with the existing unicode encoding and error flags if we're adding extra args?) or would be global, where I'd be worried that it might interfere with the proper operation

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Marcin 'Qrczak' Kowalczyk
2008/9/30 Glenn Linderman [EMAIL PROTECTED]: So the problem is that a Unicode file system interface can't deal with non-UTF-8 byte streams as file names. So it seems there are four suggested approaches, all of which have aspects that are inconvenient. Let's not forget what happens when a

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no. If we offer a lower-level bytes filename

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:47 AM, [EMAIL PROTECTED] wrote: On 05:56 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 10:59 AM, [EMAIL PROTECTED] wrote: On 02:32 pm, [EMAIL PROTECTED] wrote: In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the benefit of the doubt

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 3:43 PM, Nick Coghlan [EMAIL PROTECTED] wrote: Guido van Rossum wrote: The callback would either be an extra argument to all system calls (bad, ugly etc., and why not go with the existing unicode encoding and error flags if we're adding extra args?) or would be global,

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Jack Jansen
On 30-Sep-2008, at 23:42 , Martin v. Löwis wrote: It's the other way 'round: On Windows, Unicode file names are the natural choice, and byte strings have limitations. In a sense, Windows got it right - but then, they started later. Unix missed the opportunity of declaring that all file APIs

Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 12:07 PM, Simon Cross [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 7:56 PM, Guido van Rossum [EMAIL PROTECTED] wrote: (since os.getcwdb() is a Unix-only thing). I would be happier if all the Unix byte functions existed on Windows and fell back to something like

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Nick Coghlan
Adam Olsen wrote: On Tue, Sep 30, 2008 at 3:43 PM, Nick Coghlan [EMAIL PROTECTED] wrote: Of the suggestions I've seen so far, I like Marcin's Mono-inspired NULL-escape codec idea the best. Since these strings all come from parts of the environment where NULLs are not permitted, a simple '\0'
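The preview above does not spell out the escaping scheme, so the following is only one hypothetical reading of a NUL-escape codec, not Marcin's or Mono's actual design. It assumes that each undecodable byte is stored as `'\0'` followed by the character with the same code point; the error-handler name `nulescape` and the helper functions are invented for this sketch. The `'\0' in text` check is the part mentioned in the message.

```python
import codecs

def _nul_escape(exc):
    # Hypothetical error handler: prefix each undecodable byte with NUL.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\0" + chr(b) for b in bad), exc.end

codecs.register_error("nulescape", _nul_escape)

def decode_name(raw: bytes) -> str:
    """Decode a file name, escaping bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="nulescape")

def encode_name(text: str) -> bytes:
    """Reverse decode_name(), restoring escaped bytes verbatim."""
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == "\0":
            out.append(ord(next(chars)))  # the escaped raw byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)

def is_escaped(text: str) -> bool:
    """The simple check from the message: does the name contain NUL?"""
    return "\0" in text

good = decode_name(b"caf\xc3\xa9")  # valid UTF-8
bad = decode_name(b"caf\xe9")       # Latin-1 byte, invalid as UTF-8
print(is_escaped(good), is_escaped(bad))
print(encode_name(bad) == b"caf\xe9")
```

Because real file names cannot contain NUL, the escape marker cannot collide with a legitimately decoded name, which is what makes the membership test sufficient.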

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 2:43 PM, Nick Coghlan [EMAIL PROTECTED] wrote: Of the suggestions I've seen so far, I like Marcin's Mono-inspired NULL-escape codec idea the best. Since these strings all come from parts of the environment where NULLs are not permitted, a simple '\0' in text check will

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
My concern still is that it brings the bytes type into the status of another character string type, which is really bad, and will require further modifications to Python for the lifetime of 3.x. I'd like to understand why this is really bad. I thought it was by design that the str and bytes

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 3:18 PM, Nick Coghlan [EMAIL PROTECTED] wrote: That said, I don't think this is something we (or, more to the point, Guido) need to make a decision on right now - for 3.0, having bytes-level APIs that can see everything, and Unicode APIs that ignore badly encoded

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Yes! If there is a byte-string access method for Windows, pretty please make it decode from UTF-8 internally and call the Unicode version of the Windows APIs. The non-unicode windows APIs are pretty much just broken -- Ideally, Python should never be calling those. I don't think we will

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
How does windows (and Python on windows) handle NFC versus NFD issues? That's left to the application. Can I have two files called ümlaut.txt, one in NFD and one NFC form? Yes, you can. It sounds confusing, but only in a theoretical way. You never have combining characters on Windows (at
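The NFC/NFD distinction can be made concrete with a short sketch using the standard `unicodedata` module. It only compares the two strings in memory and does not touch a filesystem; the name `ümlaut.txt` is the one from the question.

```python
import unicodedata

name = "\u00fcmlaut.txt"  # "ümlaut.txt" written with a precomposed u-umlaut
nfc = unicodedata.normalize("NFC", name)  # keeps U+00FC as one code point
nfd = unicodedata.normalize("NFD", name)  # "u" followed by combining U+0308

# The two forms look the same when rendered but are different strings.
print(nfc == nfd)
print(len(nfc), len(nfd))
# Normalizing both to one form makes them compare equal again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

A filesystem that compares names code unit by code unit would treat `nfc` and `nfd` as two distinct file names.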

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 3:21 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: My concern still is that it brings the bytes type into the status of another character string type, which is really bad, and will require further modifications to Python for the lifetime of 3.x. I'd like to understand

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 6:21 PM, Martin v. Löwis wrote: IOW, Java hasn't solved the problem in the last 10 years. Java is already really bad at being a small little language to write cooperating tools in. I'd never even attempt to write a little pipeline filter in Java -- I've already pretty

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Jack Jansen
On 1-Oct-2008, at 00:32 , Martin v. Löwis wrote: How does windows (and Python on windows) handle NFC versus NFD issues? That's left to the application. Can I have two files called ümlaut.txt, one in NFD and one NFC form? Yes, you can. It sounds confusing, but only in a theoretical

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Steven D'Aprano
On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no. If we offer a lower-level bytes

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano [EMAIL PROTECTED] wrote: On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Steven D'Aprano
On Wed, 1 Oct 2008 09:21:37 am you wrote: On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano [EMAIL PROTECTED] wrote: On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Michael Urman
On Tue, Sep 30, 2008 at 7:04 PM, Steven D'Aprano [EMAIL PROTECTED] wrote: I believe on disk it uses UTF-16. Which is made up of bytes. There may be byte sequences that are illegal UTF-16, but that's not what Martin said. I don't understand how there can be UTF-16 sequences which don't
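As a sketch of what "illegal UTF-16" can mean, the snippet below builds a byte string containing an unpaired high surrogate and checks it with Python's strict `utf-16-le` codec. The helper `is_valid_utf16le` is invented for the example, and using the codec as the validity test is an assumption of this sketch rather than something stated in the thread.

```python
# A lone high surrogate (0xD800) followed by "A", as little-endian 16-bit units.
lone = b"\x00\xd8" + "A".encode("utf-16-le")
# A well-formed surrogate pair, here encoding U+1F600.
pair = "\U0001F600".encode("utf-16-le")

def is_valid_utf16le(raw: bytes) -> bool:
    """Return True if raw decodes as strict little-endian UTF-16."""
    try:
        raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf16le(pair))
print(is_valid_utf16le(lone))
```

Both inputs are sequences of 16-bit units, but only the paired form is well-formed UTF-16.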

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Victor Stinner
On Wednesday 01 October 2008 00:28:22, Martin v. Löwis wrote: I don't think we will manage to release Python 3.0 this year if that change is to be implemented. And then, I don't think the release manager will agree to such a delay. The minimum change is to disallow bytes/str mix: -

Re: [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread glyph
On 30 Sep, 09:37 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 11:42 AM, [EMAIL PROTECTED] wrote: There are other ways to glean this knowledge; for example, looking at the 'iocharset' or 'nls' mount options supplied to mount various filesystems. I know we could do a better job, but

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread glyph
On 30 Sep, 09:22 pm, [EMAIL PROTECTED] wrote: On Tue, Sep 30, 2008 at 1:04 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Guido van Rossum wrote: On Mon, Sep 29, 2008 at 11:00 PM, Martin v. Löwis [EMAIL PROTECTED] wrote: Martin, I don't understand why you are in favor of storing raw bytes

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 8:06 PM, [EMAIL PROTECTED] wrote: The proposal of using U+ seems like it would have been almost the same from such a wrapper's perspective, except (A) people using the filesystem APIs without the benefit of such a wrapper would have been even more screwed, and (B)

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote: However, Martin, I can promise you that I will _never_ ask for any convenience functions related to bytes as a result of this decision. I want bytes to come back from filesystem APIs because I intend to have a wrapper layer which

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Terry Reedy
Guido van Rossum wrote: No, that's because bytes is missing from the explicit list of allowable types in io.open. Victor has a one-line trivial patch for this. Could you try this though? import _fileio _fileio._FileIO(b'tem') import _fileio _fileio._FileIO(b'tem') _fileio._FileIO(3, 'r')

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread glyph
On 03:32 am, [EMAIL PROTECTED] wrote: On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote: Can you clarify what proposal you are supporting for Python: Sure. Neither of your descriptions is terribly accurate, but I'll try to explain. 1) Two sets of APIs, one returning unicode strings,

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Sorry, maybe I'm just being thick here, but I don't understand how that is possible. On the physical disk, each Windows file name must be represented by a byte string, yes? So how is it possible that there are Windows files with names that can't be represented as a byte string? What have

Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
However, Martin, I can promise you that I will _never_ ask for any convenience functions related to bytes as a result of this decision. :-) Regards, Martin