On Mon, Sep 29, 2008 at 6:07 AM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> The default behaviour should be to use unicode and raise an error if
> conversion to unicode fails. It should also be possible to use bytes using
> bytes arguments and optional arguments (for getcwd).
>
> - listdir(unicod
On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
> > - getcwd() -> unicode
> > - getcwd(bytes=True) -> bytes
>
> Please let's not introduce boolean flags like this. How about
> ``getcwdb`` in parallel with the old ``getcwdu``?
Yeah, you're right. So I wrote a new patch: os_
> Victor Stinner wrote:
(Thanks Victor for moving this to the list. Having a discussion in the
tracker is really painful, I find.)
>> POSIX OS
>>
>>
>> The default behaviour should be to use unicode and raise an error if
>> conversion to unicode fails. It should also be possible to use
On Mon, Sep 29, 2008 at 10:00 AM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
>> > - getcwd() -> unicode
>> > - getcwd(bytes=True) -> bytes
>>
>> Please let's not introduce boolean flags like this. How about
>> ``getcwdb`` in
On Mon, Sep 29, 2008 at 11:06 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
>
>> This approach (changing all path-handling functions to accept either bytes
>> or string, but not both) is doomed in my eyes. First, there are
On Sep 29, 2008, at 6:17 PM, Adam Olsen wrote:
I suspect linux will eventually take this route as well. If ext3 had
an option for UTF-8 validation I know I'd want it on. That'd move the
error to the program creating bogus file names, rather than those
trying to read, display, and manage them.
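The UTF-8 validation mentioned in the message above can be illustrated with a small check. This is only a sketch: the helper name is_valid_utf8 is hypothetical and is not part of any patch discussed in this thread.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8.

    Hypothetical helper, for illustration only.
    """
    try:
        name.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True
```

A filesystem that enforced such a check at creation time would reject a Latin-1 name like b"r\xe9port.doc" up front, instead of leaving the failure to whichever program later lists the directory.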
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>
> I know I keep flipflopping on this one, but the more I think about it
> the more I believe it is better to drop those names than to raise an
> exce
On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about
On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.
If I understand correctly this sol
> import os
> import os.path
> import sys
> if os.path.supports_unicode_filenames:
>     cwd = getcwd()
> else:
>     cwd = getcwdb()
> encoding = sys.getfilesystemencoding()
> for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>         text = str(fil
On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about
Guido van Rossum writes:
> On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
> <[EMAIL PROTECTED]> wrote:
> > It would be hard for a newbie programmer to understand why he's
> > unable to find his very important file ("important r?port.doc")
> > using os.listdir().
> *Every* failure in this s
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
I know I keep flipflopping on this one, but the more I think about it
the more I believe it is better to drop those names than to raise an
exception. Otherwise a "naive" program that happens to use
os.listdir() can be re
> Change the default file system encoding to store bytes in Unicode is like
> introducing a new Python type: .
Exactly. Seems like the best solution to me, despite your polemics.
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http:/
On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Victor Stinner wrote:
>> On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
>>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>>> encoding (if it were UTF-8 otherwise), despite
On 2008-09-30 08:00, Martin v. Löwis wrote:
>> Change the default file system encoding to store bytes in Unicode is like
>> introducing a new Python type: .
>
> Exactly. Seems like the best solution to me, despite your polemics.
Not a bad idea... have os.listdir() return Unicode subclasses that
Adam Olsen writes:
> [1] You could argue that Unicode should add new scalars to handle all
> currently invalid UTF-8 sequences.
AFAIK there are about 2^31 of these, though!
On Tue, Sep 30, 2008 at 5:24 AM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Adam Olsen writes:
>
> > [1] You could argue that Unicode should add new scalars to handle all
> > currently invalid UTF-8 sequences.
>
> AFAIK there are about 2^31 of these, though!
They've promised to never alloc
On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>
>> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>
>>> I know I keep flipflopping on this one, but the more I think about it
>>> the more I believe it is better to drop those names than to raise an
On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Change the default file system encoding to store bytes in Unicode is like
>> introducing a new Python type: .
>
> Exactly. Seems like the best solution to me, despite your polemics.
Martin, I don't understand why you
On Mon, Sep 29, 2008 at 11:22 PM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> No, that was not what I meant (although it is another possibility). As I
> wrote,
> Martin's proposal that I support here is using the modified UTF-8 codec that
> successfully roundtrips otherwise invalid UTF-8 data.
I th
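The round-tripping property discussed above (undecodable bytes survive a trip through str and back) can be demonstrated with a facility that later Python 3 releases provide, the "surrogateescape" error handler. Note that this is a different mechanism from the modified UTF-8 codec proposed in this thread; it is shown here only as an assumption-laden illustration of the same goal, and the helper names to_text and to_bytes are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode filename bytes, mapping each undecodable byte to a lone
    surrogate code point (U+DC80..U+DCFF) instead of raising."""
    return raw.decode("utf-8", "surrogateescape")


def to_bytes(text: str) -> bytes:
    """Reverse of to_text: lone surrogates become the original bytes again."""
    return text.encode("utf-8", "surrogateescape")
```

With these helpers, a misencoded name such as b"r\xe9port.doc" becomes a str that can be passed around and converted back to exactly the same bytes, at the cost that the str contains a code point that strict encoders will refuse.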
On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-09-30 08:00, Martin v. Löwis wrote:
>>> Change the default file system encoding to store bytes in Unicode is like
>>> introducing a new Python type: .
>>
>> Exactly. Seems like the best solution to me, despite your
On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:
> >> Change the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: .
> >
> > Exactly. Seems lik
On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote:
> > To avoid silent skipping, is it possible to drop 'unreadable'
> > names, issue a warning (instead of exception), and continue to
> > completion? "Warning: unreadable filename skipped; see
> > PyWiki/UnreadableFilenames"
>
> That would be
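The "drop the name but warn" behaviour suggested in the quoted text can be sketched as follows. This is a minimal illustration, not the behaviour of any real os.listdir(); the function name decode_names and the warning wording are assumptions.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode raw filename bytes, skipping undecodable ones with a warning.

    Hypothetical helper: it models the proposal in this thread only.
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Warn instead of raising, so the listing runs to completion.
            warnings.warn("unreadable filename skipped: %r" % (raw,))
    return decoded
```

A naive caller still gets a usable list of str names, while the warning leaves a visible trace that something was omitted.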
On 2008-09-30 16:05, Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> On 2008-09-30 08:00, Martin v. Löwis wrote:
Change the default file system encoding to store bytes in Unicode is like
introducing a new Python type: .
>>> Exactly. S
On Tue, Sep 30, 2008 at 7:53 AM, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote:
>
>> > To avoid silent skipping, is it possible to drop 'unreadable'
>> > names, issue a warning (instead of exception), and continue to
>> > completion? "Warning: u
On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> In the end, I think it's better not to be clever and just return
> the filenames that cannot be decoded as bytes objects in os.listdir().
Unfortunately that's going to break most code that is using
os.listdir(), so it's ha
Guido van Rossum wrote:
>> With the filenames decoded by UTF-8, your files named têste, ô, dossié will
>> be displayed and handled correctly. The others are *invalid* in the
>> filesystem
>> encoding UTF-8 and therefore would be represented by something like
>>
>> u'dir\uXXffname' where XX is s
Steven D'Aprano wrote:
> On Tue, 30 Sep 2008 11:50:10 pm Guido van Rossum wrote:
>
>> > To avoid silent skipping, is it possible to drop 'unreadable'
>> > names, issue a warning (instead of exception), and continue to
>> > completion? "Warning: unreadable filename skipped; see
>> > PyWiki/Unread
On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
>> How can it *regularly* drive you crazy when "the majority of file names
>> [...] encoded correctly" (as you assert above)?
>
> Because Office files are a) often named with long, seemingly descriptive
> filenames, which inva
On 2008-09-30 18:46, Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> In the end, I think it's better not to be clever and just return
>> the filenames that cannot be decoded as bytes objects in os.listdir().
>
> Unfortunately that's going to b
Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
>>> How can it *regularly* drive you crazy when "the majority of file names
>>> [...] encoded correctly" (as you assert above)?
>>
>> Because Office files are a) often named with long, seemingly des
On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Victor Stinner wrote:
>> On Windows, we might reject bytes filenames for all file operations: open(),
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>
> Since I've seen no objections to this yet: pl
Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>>> Change the default file system encoding to store bytes in Unicode is like
>>> introducing a new Python type: .
>> Exactly. Seems like the best solution to me, despite your polemics.
>
> Mar
> I didn't get an answer to my question: what is the result characters) stored in unicode> + ? I guess that the result is
> instead of raising an error
> (invalid types). So again: why introducing a new type instead of reusing
> existing Python types?
I didn't mean to introduce a new data typ
Guido van Rossum wrote:
> However
> the *proposed* behavior (returns bytes if the arg was bytes, and
> returns str when the arg was str) is IMO sane, and no different than
> the polymorphism found in len() or many builtin operations.
My concern still is that it brings the bytes type into the statu
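The polymorphism described in the quoted text (bytes in, bytes out; str in, str out) is how os.listdir() behaves in current Python 3 releases. A short sketch follows; the helper name list_both_ways and the temporary file name are illustrative assumptions.

```python
import os
import tempfile


def list_both_ways(directory: str):
    """List one directory twice: once with a str path, once with bytes."""
    as_str = os.listdir(directory)                 # str argument -> str names
    as_bytes = os.listdir(os.fsencode(directory))  # bytes argument -> bytes names
    return as_str, as_bytes


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    str_names, bytes_names = list_both_ways(demo_dir)
```

The caller chooses the return type by choosing the argument type, so no boolean flag or second function name is needed for listdir().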
> I'm not sure either way. I've heard it claimed that Windows filesystem
> APIs use Unicode natively. Does Python 3.0 on Windows currently
> support filenames expressed as bytes?
Yes, it does (at least, os.open, os.stat support them, builtin open
doesn't).
> Are they encoded first before
> passing
On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>>
>> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
>>>
>>> Victor Stinner wrote:
On Windows, we might reject bytes filenames for all file operations:
open
On Tue, Sep 30, 2008 at 1:04 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]>
>> wrote:
Change the default file system encoding to store bytes in Unicode is like
introducing a new Python
On Tue, Sep 30, 2008 at 1:12 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Terry Reedy wrote:
>>
>> Guido van Rossum wrote:
>
>>> I'm not sure either way. I've heard it claimed that Windows filesystem
>>> APIs use Unicode natively. Does Python 3.0 on Windows currently
>>> support filenames expressed a
On Tue, Sep 30, 2008 at 1:29 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>> However
>> the *proposed* behavior (returns bytes if the arg was bytes, and
>> returns str when the arg was str) is IMO sane, and no different than
>> the polymorphism found in len() or many b
>> On Windows, we might reject bytes filenames for all file operations: open(),
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>
> Since I've seen no objections to this yet: please no. If we offer a
> "lower-level" bytes filename API, it should work for all platforms.
Unfo
> Oh, ok. I had assumed Windows just uses a fixed encoding without the problem
> of misencoded filenames.
It's the other way 'round: On Windows, Unicode file names are the
natural choice, and byte strings have limitations. In a sense, Windows
got it right - but then, they started later. Unix misse
2008/9/30 Glenn Linderman <[EMAIL PROTECTED]>:
> So the problem is that a Unicode file system interface can't deal with
> non-UTF-8 byte streams as file names.
>
> So it seems there are four suggested approaches, all of which have aspects
> that are inconvenient.
Let's not forget what happens whe
On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote:
On Windows, we might reject bytes filenames for all file
operations: open(),
unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
Since I've seen no objections to this yet: please no. If we offer a
"lower-level" bytes filename
On 30-Sep-2008, at 23:42 , Martin v. Löwis wrote:
It's the other way 'round: On Windows, Unicode file names are the
natural choice, and byte strings have limitations. In a sense, Windows
got it right - but then, they started later. Unix missed the
opportunity
of declaring that all file APIs
>> My concern still is that it brings the bytes type into the status of
>> another character string type, which is really bad, and will require
>> further modifications to Python for the lifetime of 3.x.
>
> I'd like to understand why this is "really bad". I thought it was by
> design that the str
> Yes! If there is a byte-string access method for Windows, pretty please
> make it decode from UTF-8 internally and call the Unicode version of the
> Windows APIs. The non-unicode windows APIs are pretty much just broken
> -- Ideally, Python should never be calling those.
I don't think we will ma
> How does windows (and Python on windows) handle NFC versus NFD issues?
That's left to the application.
> Can I have two files called "ümlaut.txt", one in NFD and one NFC form?
Yes, you can. It sounds confusing, but only in a theoretical way. You
never have combining characters on Windows (at
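The NFC versus NFD question above can be made concrete with the standard unicodedata module. This is a small sketch; the helper name same_name is hypothetical, and comparing names after normalization is one possible application-level policy, not something the filesystem does.

```python
import unicodedata

# The same visible name in two normalization forms.
nfc_name = unicodedata.normalize("NFC", "\u00fcmlaut.txt")  # precomposed u-umlaut
nfd_name = unicodedata.normalize("NFD", "\u00fcmlaut.txt")  # 'u' + combining diaeresis


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings differ code point by code point, which is why a filesystem that stores names verbatim can hold both files side by side; an application that wants to treat them as equal has to normalize before comparing.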
On Tue, Sep 30, 2008 at 3:21 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>>> My concern still is that it brings the bytes type into the status of
>>> another character string type, which is really bad, and will require
>>> further modifications to Python for the lifetime of 3.x.
>>
>> I'd like
On Sep 30, 2008, at 6:21 PM, Martin v. Löwis wrote:
IOW, Java hasn't solved the problem in the last 10 years.
Java is already really bad at being a small little language to write
cooperating tools in. I'd never even attempt to write a little
pipeline filter in Java -- I've already pretty mu
On 1-Oct-2008, at 00:32 , Martin v. Löwis wrote:
How does windows (and Python on windows) handle NFC versus NFD
issues?
That's left to the application.
Can I have two files called "ümlaut.txt", one in NFD and one NFC
form?
Yes, you can. It sounds confusing, but only in a theoretical
On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote:
> >> On Windows, we might reject bytes filenames for all file
> >> operations: open(), unlink(), os.path.join(), etc. (raise a
> >> TypeError or UnicodeError)
> >
> > Since I've seen no objections to this yet: please no. If we offer a
> > "lower
On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote:
>> >> On Windows, we might reject bytes filenames for all file
>> >> operations: open(), unlink(), os.path.join(), etc. (raise a
>> >> TypeError or UnicodeError)
>> >
On Wed, 1 Oct 2008 09:21:37 am you wrote:
> On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano <[EMAIL PROTECTED]>
wrote:
> > On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote:
> >> >> On Windows, we might reject bytes filenames for all file
> >> >> operations: open(), unlink(), os.path.join(), e
On Tue, Sep 30, 2008 at 7:04 PM, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
>> I believe on disk it uses UTF-16.
>
> Which is made up of bytes. There may be byte sequences that are illegal
> UTF-16, but that's not what Martin said. I don't understand how there
> can be UTF-16 sequences which don't
On Wednesday 01 October 2008 00:28:22, Martin v. Löwis wrote:
> I don't think we will manage to release Python 3.0 this year if that
> change is to be implemented. And then, I don't think the release manager
> will agree to such a delay.
The minimum change is to disallow bytes/str mix:
M.-A. Lemburg wrote:
In the end, I think it's better not to be clever and just return
the filenames that cannot be decoded as bytes objects in os.listdir().
But since it's a rare occurrence, most applications are
just going to ignore the issue, and then fail unexpectedly
one day on some unsuspe
On 30 Sep, 09:22 pm, [EMAIL PROTECTED] wrote:
On Tue, Sep 30, 2008 at 1:04 PM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:
Guido van Rossum wrote:
On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
Martin, I don't understand why you are in favor of storing raw by
On Tue, Sep 30, 2008 at 8:06 PM, <[EMAIL PROTECTED]> wrote:
> The proposal of using U+ seems like it would have been almost the same
> from such a wrapper's perspective, except (A) people using the filesystem
> APIs without the benefit of such a wrapper would have been even more
> screwed, and
On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote:
However, Martin, I can promise you that I will _never_ ask for any
convenience functions related to bytes as a result of this
decision. I want bytes to come back from filesystem APIs because I
intend to have a wrapper layer which knows
Guido van Rossum wrote:
No, that's because bytes is missing from the explicit list of
allowable types in io.open. Victor has a one-line trivial patch for
this. Could you try this though?
import _fileio
_fileio._FileIO(b'tem')
>>> import _fileio
>>> _fileio._FileIO(b'tem')
_fileio._FileIO(3,
On 03:32 am, [EMAIL PROTECTED] wrote:
On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote:
Can you clarify what proposal you are supporting for Python:
Sure. Neither of your descriptions is terribly accurate, but I'll try
to explain.
1) Two sets of APIs, one returning unicode strings, an
> Sorry, maybe I'm just being thick here, but I don't understand how that
> is possible. On the physical disk, each Windows file name must be
> represented by a byte string, yes? So how is it possible that there are
> Windows files with names that can't be represented as a byte string?
> What h
> However, Martin, I can promise you that I will _never_ ask for any
> convenience functions related to bytes as a result of this decision.
:-)
Regards,
Martin
On Tuesday 30 September 2008, M.-A. Lemburg wrote:
> On 2008-09-30 08:00, Martin v. Löwis wrote:
> >> Change the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: .
> >
> > Exactly. Seems like the best solution to me, despite your polemics.
>
> Not a
On 2008-10-01 09:54, Ulrich Eckhardt wrote:
> On Tuesday 30 September 2008, M.-A. Lemburg wrote:
>> On 2008-09-30 08:00, Martin v. Löwis wrote:
Change the default file system encoding to store bytes in Unicode is
like introducing a new Python type: .
>>> Exactly. Seems like the best solut
[EMAIL PROTECTED] wrote:
> The reasoning is that a lot of software doesn't care if it's wrong for
> edge cases, it's really hard to come up with something that's correct
> with respect to all of those edge cases (absurdly difficult, if you need
> to stay in the straightjacket of string / bytes type
M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-10-01 09:54, Ulrich Eckhardt wrote:
> > On Tuesday 30 September 2008, M.-A. Lemburg wrote:
> >> On 2008-09-30 08:00, Martin v. Löwis wrote:
> Change the default file system encoding to store bytes in Unicode is
> like introducing a new Py
On 03:54 pm, [EMAIL PROTECTED] wrote:
I'm actually sort of liking this idea. A Pathname class, for
convenience
a subtype of String, but containing the underlying binary
representation
used by the OS. Even non-unicode pathnames could be represented.
On the one hand, I agree with you - excep
[EMAIL PROTECTED] wrote:
> > I'm actually sort of liking this idea. A Pathname class, for
> > convenience
> > a subtype of String, but containing the underlying binary
> > representation
> >used by the OS. Even non-unicode pathnames could be represented.
>
> On the one hand, I agree with you -
>> SQLite has a similar problem with NULLs, and I'm definitely sticking
>> paths in there, too.
>
> I think that you can say "all C libraries".
Just for the sake of nit-picking: the socket library, and the regular
POSIX stream IO library (as well as C standard "unformatted" IO) deal
just fine wit
Bill Janssen wrote:
> Perhaps PEP 355 just went too far.
That was certainly one of the major objections to it. A filesystem path
object which didn't try to combine a half-dozen different modules into
methods on a single object, but instead focused on solving a few
specific problems with using raw