> I've taken the liberty of explicitly CCing Martin just in case he missed
> the thread with all the noise regarding PEP383.
>
> If there are no objections from Martin
It's fine with me - I just won't have time to look into the details of
that change.
Regards,
Martin
_
Folks:
My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
binary names from the filesystem and store them so that I can regenerate
the same byte string later, but it also requires that I *know* whether
what I got was a valid string in the expected encoding (which might be
utf
On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote:
Ronald Oussoren (RO) wrote:
RO> For what it's worth, the OSX API's seem to behave as follows:
RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
RO> system automatically encodes the name.
RO> That is, open(chr(255
On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote:
> You can get the same error on Linux:
>
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
>
> >>> f=open(chr(255),'w')
>
> Traceb
Larry Hastings wrote:
Counting the votes for http://bugs.python.org/issue5799 :
+1 from Mark Hammond (via private mail)
+1 from Paul Moore (via the tracker)
+1 from Tim Golden (in Python-ideas, though what he literally said
was "I'm up for it")
+1 from Michael Foord
+1 from E
On Tue, Apr 28, 2009 at 8:03 PM, Larry Hastings wrote:
>
> EXECUTIVE SUMMARY
>
> I've written a patch against py3k trunk creating a new function-based
> API for creating extension types in C. This allows PyTypeObject to
> become a (mostly) private structure.
>
>
> THE PROBLEM
>
> Here's how you
Benjamin Peterson wrote:
Hi everyone!
In the interest of letting Martin implement PEP 383 for 3.1, I am
deferring the release of the 3.1 beta until next Wednesday, May 6th.
That might also give time for Larry Hastings' UNC path patch.
(and anything else essentially ready ;-)
___
Thomas Breuel wrote:
> Not for me (I am using Python 2.6.2).
>
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>
>
>
> You can get the same error on Linux:
>
> $ p
James Y Knight wrote:
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who br
Hi everyone!
In the interest of letting Martin implement PEP 383 for 3.1, I am
deferring the release of the 3.1 beta until next Wednesday, May 6th.
Thank you,
Benjamin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listi
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by pr
>
> Not for me (I am using Python 2.6.2).
>
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>
You can get the same error on Linux:
$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:5
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who brokenly use
"ja_JP.SJI
Barry Scott wrote:
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
>>> How do I get a printable unicode version of these path strings if they
>>> contain non-unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be va
Piet van Oostrum wrote:
> > Ronald Oussoren (RO) wrote:
> >RO> For what it's worth, the OSX API's seem to behave as follows:
> >RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
> >RO> system automatically encodes the name.
>
> >RO> That is, open(chr(255)
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by prin
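The regular-expression approach Martin suggests could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes undecodable bytes have been mapped to lone surrogates in the range U+DC80..U+DCFF as the PEP proposes, it uses the "surrogateescape" error handler name that Python 3.1 and later ship, and the helper name printable() is made up for the example.

```python
import re

# Assumption: undecodable bytes were mapped to lone surrogates in the
# range U+DC80..U+DCFF, as PEP 383 proposes.
_ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def printable(name):
    """Return a copy of name with every escaped byte replaced by '?'.

    Hypothetical helper, named here only for illustration.
    """
    return _ESCAPED_BYTE.sub("?", name)


# A Latin-1 encoded file name, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decode the way PEP 383 proposes; the byte 0xE9 becomes U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

print(printable(name))  # caf?.txt
```

The replacement is lossy by design: the result is safe to display, but only the original string can be turned back into the original bytes.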
> Ronald Oussoren (RO) wrote:
>RO> For what it's worth, the OSX API's seem to behave as follows:
>RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
>RO> system automatically encodes the name.
>RO> That is, open(chr(255), 'w') will silently create a file named '%FF'
>RO
Counting the votes for http://bugs.python.org/issue5799 :
+1 from Mark Hammond (via private mail)
+1 from Paul Moore (via the tracker)
+1 from Tim Golden (in Python-ideas, though what he literally said
was "I'm up for it")
+1 from Michael Foord
+1 from Eric Smith
There have b
On 30-Apr-09, at 7:39 AM, Guido van Rossum wrote:
FWIW, I'm in agreement with this PEP (i.e. its status is now
Accepted). Martin, you can update the PEP and start the
implementation.
+1
Kudos to Martin for seeing this through with (imo) considerable
patience and dignity.
-Mike
__
MRAB wrote:
> One further question: should the encoder accept a string like
> u'\uDCC2\uDC80'? That would encode to b'\xC2\x80'
Indeed so.
> which, when decoded, would give u'\x80'.
Assuming the encoding is UTF-8, yes.
> Does the PEP only guarantee that strings decoded
> from the filesystem are
Jared Grubb wrote:
> Ok, so if I understand, the situation is:
> * python points to 2.x version
> * python3 points to 3.x version
> * need to be able to run certain 3k scripts from cmdline (since we're
>talking about shebangs) using Python3k even though "python"
>points to 2.x
> So, if I
One further question: should the encoder accept a string like
u'\uDCC2\uDC80'? That would encode to b'\xC2\x80', which, when decoded,
would give u'\x80'. Does the PEP only guarantee that strings decoded
from the filesystem are reversible, but not check what might be de novo
strings?
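
The asymmetry this question describes can be reproduced with the error handler that later Python 3 releases expose as "surrogateescape". The sketch below is an illustration rather than anything posted in the thread, and it assumes UTF-8 as the encoding; the variable names are made up for the example.

```python
# "surrogateescape" is the handler name used by Python 3.1 and later;
# the thread itself refers to the scheme as utf-8b or half surrogates.
HANDLER = "surrogateescape"

# A de novo string: two escape code points that were never produced
# by decoding a file name.
de_novo = "\udcc2\udc80"

encoded = de_novo.encode("utf-8", HANDLER)
decoded = encoded.decode("utf-8", HANDLER)

print(encoded)             # b'\xc2\x80'
print(decoded == "\x80")   # True
print(decoded == de_novo)  # False: str -> bytes -> str does not round-trip

# The direction the PEP is concerned with, bytes -> str -> bytes,
# does round-trip, even for bytes that are not valid UTF-8.
raw = b"\xc2\x80\xff"
round_tripped = raw.decode("utf-8", HANDLER).encode("utf-8", HANDLER)
print(round_tripped == raw)  # True
```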
__
On 2009.04.30 18:21:03 +0200, "Martin v. Löwis" wrote:
> Perhaps - the entire PEP is about Python 3 only. I don't know whether
> PyGTK already works with 3.x.
It does not. There is a bug in the Gnome tracker for it, and I believe
some work has been done to start porting PyGObject, but it appears
Cameron Simpson writes:
> On 29Apr2009 22:14, Stephen J. Turnbull wrote:
> | Baptiste Carvello writes:
> | > By contrast, if the new utf-8b codec would *supercede* the old one,
> | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
> | > surrogates are unused). Thus ambi
>> If I pass a string with an embedded U+0000 to gtk, gtk will truncate
>> the string, and stop rendering it at this character. This is worse than
>> what it does for invalid UTF-8 sequences. Chances are fairly high that
>> other C libraries will fail in the same way, in particular if they
>> expec
On 04:07 pm, mar...@v.loewis.de wrote:
Martin, if you're going to stick with the half-surrogate trick, would
you mind adding a section to the PEP on "alternate encoding strategies",
explaining why the NULL method was not selected?
In the PEP process, it isn't my job to criticize competing pr
On 03:35 pm, mar...@v.loewis.de wrote:
So, why do you prefer half surrogate coding to U+0000 quoting?
If I pass a string with an embedded U+0000 to gtk, gtk will truncate
the string, and stop rendering it at this character. This is worse than
what it does for invalid UTF-8 sequences. Chances ar
> Martin, if you're going to stick with the half-surrogate trick, would
> you mind adding a section to the PEP on "alternate encoding strategies",
> explaining why the NULL method was not selected?
In the PEP process, it isn't my job to criticize competing proposals.
Instead, proponents of competi
On Thu, Apr 30, 2009 at 09:42, Thomas Breuel wrote:
> So, I don't see any reason to prefer your half surrogate quoting to the Mono
> U+0000-based quoting. Both seem to achieve the same goal with respect to
> round tripping file names, displaying them, etc., but Mono quoting actually
> results in
> What's an analogous failure? Or, rather, why would a failure analogous
> to the one I got when using System.IO.DirectoryInfo ever exist in
> Python?
>
>
> Mono.Unix uses an encoder and a decoder that knows about special quoting
> rules. System.IO uses a different encoder and decode
2009/4/30 Thomas Breuel:
> The analogous phenomenon will exist in Python with PEP 383. Let's say I
> have a C library with wide character interfaces and I pass it a unicode
> string from Python.(*)
[...]
> (*) There's actually a second, subtle issue. PEP 383 intends utf-8b only to
> be used for
On 02:42 pm, tmb...@gmail.com wrote:
So, why do you prefer half surrogate coding to U+0000 quoting?
I have also been eagerly waiting for an answer to this question. I am
afraid I have lost it somewhere in the storm of this thread :).
Martin, if you're going to stick with the half-surrogate
>
> What's an analogous failure? Or, rather, why would a failure analogous
> to the one I got when using System.IO.DirectoryInfo ever exist in
> Python?
Mono.Unix uses an encoder and a decoder that knows about special quoting
rules. System.IO uses a different encoder and decoder because it's a
r
FWIW, I'm in agreement with this PEP (i.e. its status is now
Accepted). Martin, you can update the PEP and start the
implementation.
On Thu, Apr 30, 2009 at 2:12 AM, "Martin v. Löwis" wrote:
>> Did you use a name with other characters? Were they displayed? Both
>> before and after the surrogate
>
> You can't even print them without getting an error from Python. In fact,
> you also can't print strings containing the proposed half-surrogate
> encodings either: in both cases, the output encoder rejects them with a
> UnicodeEncodeError. (If not even Python, with its generally lenient
> att
Martin v. Löwis wrote:
OK, so why not adopt the Mono solution in CPython? It seems to produce
valid unicode strings, removing at least one issue with PEP 383. It
also means that IronPython and CPython actually would be compatible.
See my other message. The Mono solution may not be what you ex
[top-posting for once to preserve full quoting]
Glenn,
Could you please reduce your suggestions into sample text for the PEP?
We seem to be now at the stage where nobody is objecting to the PEP, so
the focus should be on making the PEP clearer.
If you still want to create an alternative PEP impl
> This has nothing to do with how Mono quotes. The reason for this is
> that Mono quotes at all and that the Mono developers decided not to
> change System.IO to understand UNIX quoting.
>
> If Mono used PEP 383 quoting, this would fail the same way.
>
> And analogous failures will exist with
>
>
> > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list,
> > access, and open all files on your filesystem, regardless of encoding."
>
> I think this is misleading. With Mono 2.0.1, I get
This has nothing to do with how Mono quotes. The reason for this is that
Mono quotes
> OK, so why not adopt the Mono solution in CPython? It seems to produce
> valid unicode strings, removing at least one issue with PEP 383. It
> also means that IronPython and CPython actually would be compatible.
See my other message. The Mono solution may not be what you expect it to be.
Rega
>> Because in Python, we want to be able to access all files on disk.
>> Neither Java nor Mono are capable of doing that.
>
> Java is not capable of doing that. Mono, as I keep pointing out, is. It
> uses NULLs to escape invalid UNIX filenames. Please see:
>
> http://go-mono.com/docs/index.aspx
>
> And then it goes on to say: "You won't be able to pass non-Unicode
> filenames as command-line arguments."(*) Not only that, but you can't
> reliably use such files with System.IO (whatever that is, but it
> sounds pretty basic). This support is only available "within the
> Mono.Unix and Mono
> Why didn't you point to that discussion from the PEP 383? And why
> didn't you point to Kowalczyk's message on encodings in Mono, Java, etc.
> from the PEP?
Because I assumed that readers of the PEP would know (and I'm sure
many of them do - this has been *really* discussed over and over aga
On Thu, 30 Apr 2009 at 11:26, gl...@divmod.com wrote:
On 08:25 am, mar...@v.loewis.de wrote:
> Why did you choose an incompatible approach for PEP 383?
Because in Python, we want to be able to access all files on disk.
Neither Java nor Mono are capable of doing that.
Java is not capable of do
>
> Java is not capable of doing that. Mono, as I keep pointing out, is. It
> uses NULLs to escape invalid UNIX filenames. Please see:
>
> http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
>
> "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list,
> access, and
On 08:25 am, mar...@v.loewis.de wrote:
Why did you choose an incompatible approach for PEP 383?
Because in Python, we want to be able to access all files on disk.
Neither Java nor Mono are capable of doing that.
Java is not capable of doing that. Mono, as I keep pointing out, is.
It uses N
2009/4/30 "Martin v. Löwis":
>> OK, so what's wrong with os.listdir() and similar functions returning a
>> unicode string for strings that correctly encode/decode, and with byte
>> strings for strings that are not valid unicode?
>
> See http://bugs.python.org/issue3187
> in particular msg71655
Ca
On Thu, Apr 30, 2009 at 12:32, "Martin v. Löwis" wrote:
> > OK, so what's wrong with os.listdir() and similar functions returning a
> > unicode string for strings that correctly encode/decode, and with byte
> > strings for strings that are not valid unicode?
>
> See http://bugs.python.org/issue31
Thomas Breuel writes:
>
> So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device,
> plugged them into my Windows Vista machine, and fired up Python 3.0. First,
> os.listdir("f:") returns a list of strings for those file names... but those
> unicode strings are illegal.
So
> OK, so what's wrong with os.listdir() and similar functions returning a
> unicode string for strings that correctly encode/decode, and with byte
> strings for strings that are not valid unicode?
See http://bugs.python.org/issue3187
in particular msg71655
Regards,
Martin
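
For readers who want to see what the mixed-type behaviour asked about above would look like in practice, here is a rough sketch. It is an illustration only and makes no claim about what the linked tracker message says; the function name decode_or_keep_bytes() and the default encoding are assumptions made for the example.

```python
def decode_or_keep_bytes(raw_names, encoding="utf-8"):
    """Decode each name strictly; keep undecodable names as bytes.

    Hypothetical helper that models the mixed str/bytes result
    discussed in the question above.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            result.append(raw)
    return result


names = decode_or_keep_bytes([b"readme.txt", b"caf\xe9.txt"])
print(names)  # ['readme.txt', b'caf\xe9.txt']

# One practical consequence in Python 3: str and bytes do not compare,
# so even sorting the listing needs special handling by the caller.
try:
    sorted(names)
except TypeError:
    sortable = False
else:
    sortable = True
print(sortable)  # False
```

The type of each entry does tell the caller whether the name was valid in the expected encoding, at the cost of every caller having to handle two types.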
__
>
> > Since both have had to deal with this, have you looked at what they
> > actually do before proposing PEP 383? What did you find?
>
> See
>
> http://mail.python.org/pipermail/python-3000/2007-September/010450.html
>
Thanks, that's very useful.
> > Why did you choose an incompatible approac
> I think it has to be excluded from mapping in order to not introduce
> security issues.
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Regards,
Martin
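
The effect of excluding ASCII bytes can be checked against the "surrogateescape" handler in Python 3.1 and later, which is assumed here to reflect the rule described above. The sketch is an illustration and not code from the thread; the choice of U+DC2F (which would correspond to the byte for "/") is only an example of why mapping ASCII could matter.

```python
HANDLER = "surrogateescape"

# Bytes 0x80..0xFF that fail to decode are mapped to U+DC80..U+DCFF.
high_byte = b"A\xff".decode("ascii", HANDLER)
print(high_byte == "A\udcff")  # True

# U+DC2F would correspond to 0x2F, the ASCII byte for "/". It lies
# outside the mapped range, so the encoder refuses it instead of
# silently producing a path separator.
try:
    "\udc2f".encode("utf-8", HANDLER)
except UnicodeEncodeError:
    ascii_escape_rejected = True
else:
    ascii_escape_rejected = False
print(ascii_escape_rejected)  # True
```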
_
> There are several different ways I tried it. The easiest was to mount a
> vfat file system with various encodings on Linux and use the Python byte
> interface to write file names, then plug that flash drive into Windows.
So can you share precisely what you have done, to allow others to
reproduc
> Assuming people agree that this is an accurate summary, it should be
> incorporated into the PEP.
Done!
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.pyth
On Thu, Apr 30, 2009 at 10:21, "Martin v. Löwis" wrote:
> Thomas Breuel wrote:
> > Given the stated rationale of PEP 383, I was wondering what Windows
> > actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file
> > names on a device, plugged them into my Windows Vista machine, an
> Did you use a name with other characters? Were they displayed? Both
> before and after the surrogates?
Yes, yes, and yes (IOW, I put the surrogate in the middle).
> Did you use one or three half surrogates, to produce the three crossed
> boxes?
Only one, and it produced three boxes - probabl
Guido found out that I had misunderstood the existing
pkg mechanism: If a "zope" package is imported, and
it uses pkgutil.extend_path, then it won't glob for files
ending in .pkg, but instead searches the path for
files named zope.pkg.
IOW, this is unsuitable as a foundation of PEP 382. I have
now
On approximately 4/30/2009 1:48 AM, came the following characters from
the keyboard of Martin v. Löwis:
I checked how GUI libraries deal with half surrogates.
In pygtk, a warning gets issued to the console
/tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to
pango_layout_set_text(
I checked how GUI libraries deal with half surrogates.
In pygtk, a warning gets issued to the console
/tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to
pango_layout_set_text()
self.window.show()
and then the widget contains three crossed boxes.
wxpython (in its wxgtk version)
> CPython and IronPython are incompatible. And they will stay
> incompatible if the PEP is adopted.
>
> They would become compatible if CPython adopted Mono and/or Java
> semantics.
Which one should it adopt? Mono semantics, or Java semantics?
> Since both have had to deal with this, have you
Thomas Breuel wrote:
> Given the stated rationale of PEP 383, I was wondering what Windows
> actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file
> names on a device, plugged them into my Windows Vista machine, and fired
> up Python 3.0.
How did you do that, and what were the s
On approximately 4/29/2009 7:50 PM, came the following characters from
the keyboard of Aahz:
On Thu, Apr 30, 2009, Cameron Simpson wrote:
The lengthy discussion mostly revolves around:
- Glenn points out that strings that came _not_ from listdir, and that are
_not_ well-formed unicode (==
On approximately 4/29/2009 10:17 PM, came the following characters from
the keyboard of Martin v. Löwis:
I don't understand the proposal and issues. I see a lot of people
claiming that they do, and then spending all their time either
talking past each other, or disagreeing. If everyone who claim
On approximately 4/29/2009 8:46 PM, came the following characters from
the keyboard of Terry Reedy:
Glenn Linderman wrote:
On approximately 4/29/2009 1:28 PM, came the following characters from
So where is the ambiguity here?
None. But not everyone can read all the Python source code to tr
> > Yes. Now think about the implications. This means that adopting PEP
> > 383 will make IronPython and Jython running on UNIX intrinsically
> > incompatible with CPython running on UNIX, and there's no way to fix that.
>
> *Not* adopting the PEP will also make CPython and IronPython
> incompa
Given the stated rationale of PEP 383, I was wondering what Windows actually
does. So, I created some ISO8859-15 and ISO8859-8 encoded file names on a
device, plugged them into my Windows Vista machine, and fired up Python 3.0.
First, os.listdir("f:") returns a list of strings for those file name