> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode
> strings in a reversible way.
That isn't really true; it is not, inherently, about UTF-8.
Instead, it tries to represent non-filesystem-encoding byte sequences
in Unicode strings in a reversible way.
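To make the reversibility concrete, here is a minimal sketch using the error handler PEP 383 added to Python 3.1 under the name surrogateescape. The helper `roundtrip` and the sample bytes are illustrative only. Undecodable bytes become lone surrogates in U+DC80..U+DCFF (the same range as the U+DCAF mentioned in this thread), and encoding with the same handler restores them. Which bytes count as undecodable depends on the encoding passed in, not on UTF-8 specifically.

```python
def roundtrip(raw: bytes, encoding: str):
    """Decode raw bytes PEP 383 style, then encode them back (illustrative helper)."""
    text = raw.decode(encoding, "surrogateescape")   # bad bytes -> U+DC80..U+DCFF
    back = text.encode(encoding, "surrogateescape")  # lone surrogates -> original bytes
    return text, back

# b"\xff\xaa\xff" is not valid UTF-8, so every byte is escaped.
name, restored = roundtrip(b"\xff\xaa\xff", "utf-8")
print(ascii(name))                  # '\udcff\udcaa\udcff'
print(restored == b"\xff\xaa\xff")  # True

# In Latin-1 every byte decodes, so nothing is escaped at all.
name, _ = roundtrip(b"\xff\xaa\xff", "latin-1")
print(ascii(name))                  # '\xff\xaa\xff'
```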
> Quietly escaping a bad UTF-
> Does the PEP take into consideration the normalising behaviour of Mac
> OS X? We've had some ongoing challenges related to this in bzr.
No, that's completely out of scope, AFAICT. I don't even know what the
issues are, so I'm not able to propose a solution, at the moment.
Regards,
Mart
On approximately 4/27/2009 7:11 PM, came the following characters from
the keyboard of Cameron Simpson:
On 27Apr2009 18:15, Glenn Linderman wrote:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular
James Y Knight wrote:
> Hopefully it can be assumed that your locale encoding really is a
> non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
such overlapping is forbidden?
> I'm a bit scared at the prospect that U+DCAF
I thought PEP-383 was a fairly neat approach, but after thinking about it, I
now think that it is wrong.
PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in
a reversible way. But how do those non-UTF-8 byte sequences get into those
path names in the first place? Most lik
On approximately 4/27/2009 8:39 PM, came the following characters from
the keyboard of Martin v. Löwis:
I'm not suggesting the PEP should solve the problem of mounting foreign
file systems, although if it doesn't it should probably point that out.
I'm just suggesting that if the people that writ
On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote:
>
> Indeed, that was the missing piece. I'd forgotten about the encodings
> that use escape sequences, rather than UTF-8, and DBCS. I don't think
> those encodings are permitted by POSIX file systems, but I suppose they
> could s
On approximately 4/27/2009 8:35 PM, came the following characters from
the keyboard of Martin v. Löwis:
Glenn Linderman wrote:
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official charac
On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote:
No. You seem to assume that all bytes < 128 decode successfully
always.
I believe this assumption is wrong, in general:
py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Unicode
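Martin's point that bytes below 128 need not decode to ASCII can be checked with a well-formed ISO-2022-JP string (a sketch in Python 3 syntax, not a reconstruction of the truncated example above). ESC $ B switches the decoder to JIS X 0208, where the byte pair 0x30 0x21 is a single kanji, and ESC ( B switches back to ASCII.

```python
# Every byte is below 128, yet ISO-2022-JP does not read them as ASCII.
data = b"\x1b$B\x30\x21\x1b(B"   # ESC $ B, then 0x30 0x21, then ESC ( B
assert all(b < 128 for b in data)

as_ascii = data.decode("ascii")       # 8 code points, escape bytes included
as_jis = data.decode("iso-2022-jp")   # escapes consumed; one kanji remains
print(len(as_ascii), len(as_jis))     # 8 1
print(as_jis.isascii())               # False
```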
Tony Nelson writes:
> At 16:09 + 04/27/2009, Antoine Pitrou wrote:
> >Stephen J. Turnbull writes:
> >>
> >> I hate to break it to you, but most stages of mail processing have
> >> very little to do with SMTP. In particular, processing MIME
> >> attachments often requires dea
Michael Foord writes:
> The problem you don't address, which is still the reality for most
> programmers (especially Mac OS X where filesystem encoding is UTF 8), is
> that programmers *are* going to treat filenames as strings.
> The proposed PEP allows that to work for them - whatever plat
>> I don't understand what you're saying. py3k filenames are all
>> unicode, even on POSIX systems,
>
>
> How is that possible on POSIX systems where the underlying file system
> uses bytes for filenames?
>
> If I write a piece of Python code:
>
> filename = 'some path/some name'
>
> I m
On 27Apr2009 21:58, Benjamin Peterson wrote:
| 2009/4/27 Cameron Simpson :
| > PROPOSAL: add to the PEP the following functions:
[...]
| > and for me, I would like to see:
| > os.setfilesystemencoding(coding)
| >
| > Currently os.getfilesystemencoding() returns you the encoding based on
| > the c
> I'm not suggesting the PEP should solve the problem of mounting foreign
> file systems, although if it doesn't it should probably point that out.
> I'm just suggesting that if the people that write software to solve the
> problem of mounting foreign file systems have already solved the naming
>
Glenn Linderman wrote:
> On approximately 4/27/2009 12:42 PM, came the following characters from
> the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official character
assignment.
>>>
>>> I know that U+F - U+F is a private use area. I don't find a
>
2009/4/27 Cameron Simpson :
>
> PROPOSAL: add to the PEP the following functions:
>
> os.fsdecode(bytes) -> funny-encoded Unicode
> This is what os.listdir() does to produce the strings it hands out.
> os.fsencode(funny-string) -> bytes
> This is what open(filename,..) does to turn the file
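A minimal sketch of the proposed pair, assuming the encoding reported by `sys.getfilesystemencoding()` combined with the surrogateescape handler. The function bodies are illustrative and are not the os module's implementation (Python 3.2 later added `os.fsencode()` and `os.fsdecode()` to the standard library).

```python
import sys

# Hypothetical helpers mirroring the proposal; not the os module's code.
def fsdecode(raw: bytes) -> str:
    """bytes -> str, escaping undecodable bytes as lone surrogates."""
    return raw.decode(sys.getfilesystemencoding(), "surrogateescape")

def fsencode(name: str) -> bytes:
    """str -> bytes, turning escaped surrogates back into the original bytes."""
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")

raw = b"caf\xe9.txt"            # Latin-1 bytes; not valid UTF-8
name = fsdecode(raw)
print(ascii(name))              # exact form depends on the filesystem encoding
print(fsencode(name) == raw)    # the original bytes come back unchanged
```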
On 27Apr2009 18:15, Glenn Linderman wrote:
> The problem with this, and other preceding schemes that have been
> discussed here, is that there is no means of ascertaining whether a
> particular file name str was obtained from a str API, or was funny-
> decoded from a bytes API... a
On approximately 4/27/2009 5:42 PM, came the following characters from
the keyboard of Cameron Simpson:
I think that, almost independent of this PEP, there should be an
os.fsencode() function that takes a byte string (as a POSIX OS call
will take) and performs the _same_ byte->string encoding tha
On approximately 4/27/2009 2:14 PM, came the following characters from
the keyboard of Cameron Simpson:
On 27Apr2009 00:07, Glenn Linderman wrote:
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem with this, and other p
2009/4/27 Cameron Simpson :
> I think that, almost independent of this PEP, there should be an
> os.fsencode() function that takes a byte string (as a POSIX OS call
> will take) and performs the _same_ byte->string encoding that listdir()
> and friends are doing under the hood. And a partner os.fsd
On 27Apr2009 23:27, Simon Cross wrote:
| On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. Löwis" wrote:
| > As Cameron says: it's out of the scope of the PEP. It really depends how
| > the operating system deals with them. Most likely, the files are not
| > accessible - not only not from Python, but a
On 27Apr2009 21:48, Martin v. Löwis wrote:
| >>> There are still issues regarding how Windows and POSIX programs that
| >>> are sharing cross-mounted file systems might communicate file names
| >>> between each other, which is not at all clear from the PEP. If this
| >>> is an insoluble or un-
On Tue, 28 Apr 2009 04:13:47 am Antoine Pitrou wrote:
> Stephen J. Turnbull writes:
...
> > So what you'll get here, AFAICS, is a new situation where many
> > Windows-centric programmers will produce code that's incapable of
> > dealing with non-Unicode input because they don't have to
On approximately 4/27/2009 12:48 PM, came the following characters from
the keyboard of Martin v. Löwis:
There are still issues regarding how Windows and POSIX programs that
are sharing cross-mounted file systems might communicate file names
between each other, which is not at all clear from th
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official character
assignment.
I know that U+F - U+F is a private use area. I don't find a
definition of U+F01xx to know what the not
Stephen J. Turnbull wrote:
Antoine Pitrou writes:
> > or (better for 2.x, where bytes are strings as far as most
> > programmers are concerned) as a new data type,
>
> I'm -1 on any new string-like type (for file paths or whatever
> else) with custom encoding/decoding semantics. It's the
Simon Cross writes:
>
> $ touch $'\xFF\xAA\xFF'
> $ vi $'\xFF\xAA\xFF'
> $ egrep foo $'\xFF\xAA\xFF'
>
> All worked fine from my Bash shell with locale encoding set to UTF-8.
The PEP is precisely about making py3k able to better handle these files (right
now os.listdir() doesn't retu
> $ touch $'\xFF\xAA\xFF'
> $ vi $'\xFF\xAA\xFF'
> $ egrep foo $'\xFF\xAA\xFF'
>
> All worked fine from my Bash shell with locale encoding set to UTF-8.
> I can also open the created file from the GNOME editor file dialog (it
> even tells me the filename is not valid in my locale's encoding). The
On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. Löwis" wrote:
> As Cameron says: it's out of the scope of the PEP. It really depends how
> the operating system deals with them. Most likely, the files are not
> accessible - not only not from Python, but also not accessible from
> any other Unix program
On 27Apr2009 00:07, Glenn Linderman wrote:
> On approximately 4/25/2009 5:22 AM, came the following characters from
> the keyboard of Martin v. Löwis:
>>> The problem with this, and other preceding schemes that have been
>>> discussed here, is that there is no means of ascertaining whether a
>>>
Jim Kleckner wrote:
> I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755
> with the message "system cannot open the device or file".
>
> I uninstalled 2.6.1, removing all residual files also, and got the error
> message again.
>
> When I ran msiexec as follows to get a log,
>>> There are still issues regarding how Windows and POSIX programs that
>>> are sharing cross-mounted file systems might communicate file names
>>> between each other, which is not at all clear from the PEP. If this
>>> is an insoluble or un-addressed issue, it should be stated. (It is
>>> pr
>> It's a private use area. It will never carry an official character
>> assignment.
>
>
> I know that U+F - U+F is a private use area. I don't find a
> definition of U+F01xx to know what the notation means. Are you picking
> a particular character within the private use area, or a part
Jeroen Ruigrok van der Werven writes:
>
> So on medium and large datasets the decoder of Bjoern is very interesting,
> but the tiny case (just Bjoern's name) is a tad slower. The other
> cases seem more typical of what the average use in Python would be.
Keep in mind wh
-On [20090414 16:43], Antoine Pitrou (solip...@pitrou.net) wrote:
>If you have some time on your hands, you could try benchmarking it against
>Python 3.1's (py3k) decoder. There are two cases to consider:
Bjoern actually did it himself already:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#perfor
Stephen J. Turnbull writes:
>
> Excuse me, but I can't see a scheme that encodes bytes as Unicodes but
> only sometimes as a "clean separation".
Yet it is. Filenames are all unicode, without exception, and there's no implicit
conversion to bytes. That's a clean separation.
> So what
At 16:09 + 04/27/2009, Antoine Pitrou wrote:
>Stephen J. Turnbull writes:
>>
>> I hate to break it to you, but most stages of mail processing have
>> very little to do with SMTP. In particular, processing MIME
>> attachments often requires dealing with file names.
>
>AFAIK, the fi
At 23:39 -0700 04/26/2009, Glenn Linderman wrote:
>On approximately 4/25/2009 5:35 AM, came the following characters from
>the keyboard of Martin v. Löwis:
>>> Because the encoding is not reliably reversible.
>>
>> Why do you say that? The encoding is completely reversible
>> (unless we disagree on
Antoine Pitrou writes:
> > or (better for 2.x, where bytes are strings as far as most
> > programmers are concerned) as a new data type,
>
> I'm -1 on any new string-like type (for file paths or whatever
> else) with custom encoding/decoding semantics. It's the best way to
> ruin the clean
Paul Moore writes:
> 2009/4/27 Stephen J. Turnbull :
> > I believe there are solutions that don't have that problem.
> > Specifically, if the return values were bytes, or (better for 2.x,
> > where bytes are strings as far as most programmers are concerned) as a
> > new data type, to indicate
I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755
with the message "system cannot open the device or file".
I uninstalled 2.6.1, removing all residual files also, and got the error
message again.
When I ran msiexec as follows to get a log, it magically worked:
msiexec
Hi Antoine,
Antoine Pitrou writes:
> Damien Diederen writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers
Stephen J. Turnbull writes:
>
> I hate to break it to you, but most stages of mail processing have
> very little to do with SMTP. In particular, processing MIME
> attachments often requires dealing with file names.
AFAIK, the file name is only there as an indication for the user whe
On Mon, Apr 27, 2009, Antoine Pitrou wrote:
> Stephen J. Turnbull writes:
>>
>> If
>> you see a broken encoding once, you're likely to see it a million times
>> (spammers have the most broken software) or maybe have it raise an
>> unhandled Exception a dozen times (in rate of using bu
2009/4/27 Stephen J. Turnbull :
> I believe there are solutions that don't have that problem.
> Specifically, if the return values were bytes, or (better for 2.x,
> where bytes are strings as far as most programmers are concerned) as a
> new data type, to indicate that they're not text until the cl
Antoine Pitrou writes:
> I'm not sure how mail being stuck in a pipeline has anything to do
> with Martin's proposal (which deals with file paths, not with
> SMTP...).
I hate to break it to you, but most stages of mail processing have
very little to do with SMTP. In particular, processing MIM
Damien Diederen writes:
>
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
>
> There is a precedent with xmltok.
Hi Eric,
"Eric Smith" writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>
> Not sure if this is exactly what you mean,
On Mon, Apr 27, 2009 at 7:25 AM, Damien Diederen wrote:
>
> Antoine Pitrou writes:
>> Hello,
>>
>> We're in the process of forward-porting the recent (massive) json
>> updates to 3.1, and we are also thinking of dropping remnants of
>> support of the bytes type in the json library (in 3.1, again)
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
Not sure if this is exactly what you mean, but look at Objects/stringlib.
str.for
Mark Dickinson pointed out to me that the trunk buildbots are failing
under Windows.
After some analysis, I think this is because of a change I made to use
_toupper in integer formatting. The correct solution to this is to
implement issue 5793 to come up with a working, cross-platform,
locale
Hello,
Antoine Pitrou writes:
> Hello,
>
> We're in the process of forward-porting the recent (massive) json
> updates to 3.1, and we are also thinking of dropping remnants of
> support of the bytes type in the json library (in 3.1, again). This
> bytes support almost didn't work at all, but the
Stephen J. Turnbull writes:
>
> If
> you see a broken encoding once, you're likely to see it a million times
> (spammers have the most broken software) or maybe have it raise an
> unhandled Exception a dozen times (in rate of using busted software,
> the spammers are closely followed
On Mon, 27 Apr 2009 at 01:40, Glenn Linderman wrote:
Yes. My suggested use of ? is a visible character that is illegal in Windows
file names, thus causing no valid Windows file names to be visually mangled.
It is also a character that should be avoided in POSIX names because:
1) it is known t
On approximately 4/27/2009 12:55 AM, came the following characters from
the keyboard of Cameron Simpson:
On 26Apr2009 23:39, Glenn Linderman wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems might communicate file
On 26Apr2009 23:39, Glenn Linderman wrote:
[...snip...]
> There are still issues regarding how Windows and POSIX programs that are
> sharing cross-mounted file systems might communicate file names between
> each other, which is not at all clear from the PEP. If this is an
> insoluble or un-
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or wa