On approximately 4/25/2009 5:35 AM, came the following characters from
the keyboard of Martin v. Löwis:
Because the encoding is not reliably reversible.
Why do you say that? The encoding is completely reversible
(unless we disagree on what reversible means).
I'm +1 on the concept, -1 on the
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or
On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems might communicate file names between
each other, which is not at all clear from the PEP. If this is an
On approximately 4/27/2009 12:55 AM, came the following characters from
the keyboard of Cameron Simpson:
On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems
Stephen J. Turnbull stephen at xemacs.org writes:
If
you see a broken encoding once, you're likely to see it a million times
(spammers have the most broken software) or maybe have it raise an
unhandled Exception a dozen times (in rate of using busted software,
the spammers are closely
Hello,
Antoine Pitrou solip...@pitrou.net writes:
Hello,
We're in the process of forward-porting the recent (massive) json
updates to 3.1, and we are also thinking of dropping remnants of
support of the bytes type in the json library (in 3.1, again). This
bytes support almost didn't work
I couldn't figure out a way to get rid of it short of multi-#including
templates and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by the maintainers.
Not sure if this is exactly what you mean, but look at Objects/stringlib.
On Mon, Apr 27, 2009 at 7:25 AM, Damien Diederen d...@crosstwine.com wrote:
Antoine Pitrou solip...@pitrou.net writes:
Hello,
We're in the process of forward-porting the recent (massive) json
updates to 3.1, and we are also thinking of dropping remnants of
support of the bytes type in the
Damien Diederen dd at crosstwine.com writes:
I couldn't figure out a way to get rid of it short of multi-#including
templates and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by the maintainers.
There is a precedent with
Antoine Pitrou writes:
I'm not sure how mail being stuck in a pipeline has anything to do
with Martin's proposal (which deals with file paths, not with
SMTP...).
I hate to break it to you, but most stages of mail processing have
very little to do with SMTP. In particular, processing MIME
2009/4/27 Stephen J. Turnbull step...@xemacs.org:
I believe there are solutions that don't have that problem.
Specifically, if the return values were bytes, or (better for 2.x,
where bytes are strings as far as most programmers are concerned) as a
new data type, to indicate that they're not
On Mon, Apr 27, 2009, Antoine Pitrou wrote:
Stephen J. Turnbull stephen at xemacs.org writes:
If
you see a broken encoding once, you're likely to see it a million times
(spammers have the most broken software) or maybe have it raise an
unhandled Exception a dozen times (in rate of using
Stephen J. Turnbull stephen at xemacs.org writes:
I hate to break it to you, but most stages of mail processing have
very little to do with SMTP. In particular, processing MIME
attachments often requires dealing with file names.
AFAIK, the file name is only there as an indication for the
Hi Antoine,
Antoine Pitrou solip...@pitrou.net writes:
Damien Diederen dd at crosstwine.com writes:
I couldn't figure out a way to get rid of it short of multi-#including
templates and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by
I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755
with the message "system cannot open the device or file".
I uninstalled 2.6.1, removing all residual files also, and got the error
message again.
When I ran msiexec as follows to get a log, it magically worked:
msiexec
Paul Moore writes:
2009/4/27 Stephen J. Turnbull step...@xemacs.org:
I believe there are solutions that don't have that problem.
Specifically, if the return values were bytes, or (better for 2.x,
where bytes are strings as far as most programmers are concerned) as a
new data type, to
Antoine Pitrou writes:
or (better for 2.x, where bytes are strings as far as most
programmers are concerned) as a new data type,
I'm -1 on any new string-like type (for file paths or whatever
else) with custom encoding/decoding semantics. It's the best way to
ruin the clean
At 23:39 -0700 04/26/2009, Glenn Linderman wrote:
On approximately 4/25/2009 5:35 AM, came the following characters from
the keyboard of Martin v. Löwis:
Because the encoding is not reliably reversible.
Why do you say that? The encoding is completely reversible
(unless we disagree on what
At 16:09 + 04/27/2009, Antoine Pitrou wrote:
Stephen J. Turnbull stephen at xemacs.org writes:
I hate to break it to you, but most stages of mail processing have
very little to do with SMTP. In particular, processing MIME
attachments often requires dealing with file names.
AFAIK, the
Stephen J. Turnbull stephen at xemacs.org writes:
Excuse me, but I can't see a scheme that encodes bytes as Unicodes but
only sometimes as a clean separation.
Yet it is. Filenames are all unicode, without exception, and there's no implicit
conversion to bytes. That's a clean separation.
So
-On [20090414 16:43], Antoine Pitrou (solip...@pitrou.net) wrote:
If you have some time on your hands, you could try benchmarking it against
Python 3.1's (py3k) decoder. There are two cases to consider:
Bjoern actually did it himself already:
Jeroen Ruigrok van der Werven asmodai at in-nomine.org writes:
So on medium and large datasets the decoder of Bjoern is very interesting,
but the tiny case (just Bjoern's name) is quite a tad bit slower. The other
cases seem more typical of what the average use in Python would be.
Keep in
It's a private use area. It will never carry an official character
assignment.
I know that U+F - U+F is a private use area. I don't find a
definition of U+F01xx to know what the notation means. Are you picking
a particular character within the private use area, or a particular
There are still issues regarding how Windows and POSIX programs that
are sharing cross-mounted file systems might communicate file names
between each other, which is not at all clear from the PEP. If this
is an insoluble or un-addressed issue, it should be stated. (It is
probably
Jim Kleckner wrote:
I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755
with the message "system cannot open the device or file".
I uninstalled 2.6.1, removing all residual files also, and got the error
message again.
When I ran msiexec as follows to get a log, it
On 27Apr2009 00:07, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining
On Mon, Apr 27, 2009 at 9:48 PM, Martin v. Löwis mar...@v.loewis.de wrote:
As Cameron says: it's out of the scope of the PEP. It really depends how
the operating system deals with them. Most likely, the files are not
accessible - not only not from Python, but also not accessible from
any other
$ touch $'\xFF\xAA\xFF'
$ vi $'\xFF\xAA\xFF'
$ egrep foo $'\xFF\xAA\xFF'
All worked fine from my Bash shell with locale encoding set to UTF-8.
I can also open the created file from the GNOME editor file dialog (it
even tells me the filename is not valid in my locale's encoding). The
Nedit
Simon Cross hodgestar+pythondev at gmail.com writes:
$ touch $'\xFF\xAA\xFF'
$ vi $'\xFF\xAA\xFF'
$ egrep foo $'\xFF\xAA\xFF'
All worked fine from my Bash shell with locale encoding set to UTF-8.
The PEP is precisely about making py3k able to better handle these files (right
now
Stephen J. Turnbull wrote:
Antoine Pitrou writes:
or (better for 2.x, where bytes are strings as far as most
programmers are concerned) as a new data type,
I'm -1 on any new string-like type (for file paths or whatever
else) with custom encoding/decoding semantics. It's the best
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official character
assignment.
I know that U+F - U+F is a private use area. I don't find a
definition of U+F01xx to know what the
On approximately 4/27/2009 12:48 PM, came the following characters from
the keyboard of Martin v. Löwis:
There are still issues regarding how Windows and POSIX programs that
are sharing cross-mounted file systems might communicate file names
between each other, which is not at all clear from
On Tue, 28 Apr 2009 04:13:47 am Antoine Pitrou wrote:
Stephen J. Turnbull stephen at xemacs.org writes:
...
So what you'll get here, AFAICS, is a new situation where many
Windows-centric programmers will produce code that's incapable of
dealing with non-Unicode input because they don't have
On 27Apr2009 23:27, Simon Cross hodgestar+python...@gmail.com wrote:
| On Mon, Apr 27, 2009 at 9:48 PM, Martin v. Löwis mar...@v.loewis.de wrote:
| As Cameron says: it's out of the scope of the PEP. It really depends how
| the operating system deals with them. Most likely, the files are not
|
2009/4/27 Cameron Simpson c...@zip.com.au:
I think that, almost independent of this PEP, there should be an
os.fsencode() function that takes a byte string (as a POSIX OS call
will take) and performs the _same_ byte-string encoding that listdir()
and friends are doing under the hood. And a
On approximately 4/27/2009 2:14 PM, came the following characters from
the keyboard of Cameron Simpson:
On 27Apr2009 00:07, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem
On 27Apr2009 21:48, Martin v. Löwis mar...@v.loewis.de wrote:
| There are still issues regarding how Windows and POSIX programs that
| are sharing cross-mounted file systems might communicate file names
| between each other, which is not at all clear from the PEP. If this
| is an insoluble
On approximately 4/27/2009 5:42 PM, came the following characters from
the keyboard of Cameron Simpson:
I think that, almost independent of this PEP, there should be an
os.fsencode() function that takes a byte string (as a POSIX OS call
will take) and performs the _same_ byte-string encoding
On 27Apr2009 18:15, Glenn Linderman v+pyt...@g.nevcal.com wrote:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or was funny-
decoded from a bytes API...
2009/4/27 Cameron Simpson c...@zip.com.au:
PROPOSAL: add to the PEP the following functions:
os.fsdecode(bytes) -> funny-encoded Unicode
This is what os.listdir() does to produce the strings it hands out.
os.fsencode(funny-string) -> bytes
This is what open(filename,..) does to turn
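The proposed pair of functions can be illustrated with a minimal Python 3 sketch. It assumes the `surrogateescape` error handler that Python 3.1 and later provide, and a UTF-8 file system encoding. The helper names `funny_decode` and `funny_encode` are hypothetical stand-ins, not the proposed `os` functions themselves, and the scheme shown is not necessarily the exact one under discussion in this thread.

```python
# Hypothetical helpers illustrating the proposal; not the real os functions.
FS_ENCODING = "utf-8"  # assumption: the file system encoding is UTF-8


def funny_decode(raw: bytes) -> str:
    """bytes -> str; each undecodable byte 0xNN becomes the lone surrogate U+DCNN."""
    return raw.decode(FS_ENCODING, errors="surrogateescape")


def funny_encode(name: str) -> bytes:
    """str -> bytes; lone surrogates U+DCNN are turned back into the byte 0xNN."""
    return name.encode(FS_ENCODING, errors="surrogateescape")


raw_name = b"caf\xe9-\xff.txt"      # not valid UTF-8
decoded = funny_decode(raw_name)    # a str that carries the bad bytes as surrogates
restored = funny_encode(decoded)    # the original bytes again

print(ascii(decoded))
print(restored == raw_name)
```

Python 3.2 and later ship `os.fsencode()` and `os.fsdecode()`, which apply the interpreter's own file system encoding and error handler; the explicit codec calls above are used only to keep the sketch independent of the platform.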
Glenn Linderman wrote:
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official character
assignment.
I know that U+F - U+F is a private use area. I don't find a
definition of
I'm not suggesting the PEP should solve the problem of mounting foreign
file systems, although if it doesn't it should probably point that out.
I'm just suggesting that if the people that write software to solve the
problem of mounting foreign file systems have already solved the naming
On 27Apr2009 21:58, Benjamin Peterson benja...@python.org wrote:
| 2009/4/27 Cameron Simpson c...@zip.com.au:
| PROPOSAL: add to the PEP the following functions:
[...]
| and for me, I would like to see:
| os.setfilesystemencoding(coding)
|
| Currently os.getfilesystemencoding() returns you
I don't understand what you're saying. py3k filenames are all
unicode, even on POSIX systems,
How is that possible on POSIX systems where the underlying file system
uses bytes for filenames?
If I write a piece of Python code:
filename = 'some path/some name'
I might call it
Michael Foord writes:
The problem you don't address, which is still the reality for most
programmers (especially Mac OS X where filesystem encoding is UTF 8), is
that programmers *are* going to treat filenames as strings.
The proposed PEP allows that to work for them - whatever
Tony Nelson writes:
At 16:09 + 04/27/2009, Antoine Pitrou wrote:
Stephen J. Turnbull stephen at xemacs.org writes:
I hate to break it to you, but most stages of mail processing have
very little to do with SMTP. In particular, processing MIME
attachments often requires dealing
On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote:
No. You seem to assume that all bytes < 128 decode successfully
always.
I believe this assumption is wrong, in general:
py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
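The same point can be checked in Python 3 syntax. The following is a small sketch, not part of the original message; the helper name `decodes_cleanly` is hypothetical. Every byte in the sample is below 128, yet the ISO-2022-JP codec still rejects it, because the escape sequence switches to a two-byte character set and the bytes that follow do not form a valid pair.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data decodes under encoding without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# ESC $ B selects a two-byte Japanese character set, ESC ( B returns to ASCII.
sample = b"\x1b$B' \x1b(B"

print(all(byte < 128 for byte in sample))      # every byte is 7-bit
print(decodes_cleanly(sample, "iso-2022-jp"))  # yet decoding fails
```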
On approximately 4/27/2009 8:35 PM, came the following characters from
the keyboard of Martin v. Löwis:
Glenn Linderman wrote:
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official
On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote:
Indeed, that was the missing piece. I'd forgotten about the encodings
that use escape sequences, rather than UTF-8, and DBCS. I don't think
those encodings are permitted by POSIX file systems, but I suppose they
could sneak in
On approximately 4/27/2009 8:39 PM, came the following characters from
the keyboard of Martin v. Löwis:
I'm not suggesting the PEP should solve the problem of mounting foreign
file systems, although if it doesn't it should probably point that out.
I'm just suggesting that if the people that
50 matches