> Ned Deily (ND) wrote:
>ND> In article , Piet van Oostrum
>ND> wrote:
>>> > Ronald Oussoren (RO) wrote:
>>> >RO> For what it's worth, the OS X APIs seem to behave as follows:
>>> >RO> * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
>>> >RO> system automatically enc
James Y Knight writes:
> in python. It seems like the most common reason why people want to use
> SJIS is to make old pre-unicode apps work right in WINE -- in which
> case it doesn't actually affect unix python at all.
Mounting external drives, especially USB memory sticks which tend to
b
On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote:
Ronald Oussoren (RO) wrote:
RO> For what it's worth, the OS X APIs seem to behave as follows:
RO> * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
RO> system automatically encodes the name.
RO> That is, open(chr(255
On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote:
> You can get the same error on Linux:
>
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
>
> >>> f=open(chr(255),'w')
>
> Traceb
Thomas Breuel wrote:
> Not for me (I am using Python 2.6.2).
>
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>
>
>
> You can get the same error on Linux:
>
> $ p
James Y Knight wrote:
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who br
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by pr
>
> Not for me (I am using Python 2.6.2).
>
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>
You can get the same error on Linux:
$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:5
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who brokenly use
"ja_JP.SJI
Barry Scott wrote:
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
>>> How do I get a printable Unicode version of these path strings if they
>>> contain non-Unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be va
In article , Piet van Oostrum
wrote:
> > Ronald Oussoren (RO) wrote:
> >RO> For what it's worth, the OS X APIs seem to behave as follows:
> >RO> * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
> >RO> system automatically encodes the name.
>
> >RO> That is, open(chr(255)
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by prin
> Ronald Oussoren (RO) wrote:
>RO> For what it's worth, the OS X APIs seem to behave as follows:
>RO> * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
>RO> system automatically encodes the name.
>RO> That is, open(chr(255), 'w') will silently create a file named '%FF'
>RO
MRAB wrote:
> One further question: should the encoder accept a string like
> u'\uDCC2\uDC80'? That would encode to b'\xC2\x80'
Indeed so.
> which, when decoded, would give u'\x80'.
Assuming the encoding is UTF-8, yes.
> Does the PEP only guarantee that strings decoded
> from the filesystem are
One further question: should the encoder accept a string like
u'\uDCC2\uDC80'? That would encode to b'\xC2\x80', which, when decoded,
would give u'\x80'. Does the PEP only guarantee that strings decoded
from the filesystem are reversible, but not check what might be de novo
strings?
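The question above can be tried out with a short sketch. It assumes the PEP's error handler behaves like the `surrogateescape` handler that later shipped in Python 3; that mapping is an assumption made for illustration, not something this thread states. The variable names are hypothetical.

```python
# Assumption: "surrogateescape" stands in for the handler proposed in the PEP.
# A "de novo" string: built in code, never decoded from a filesystem.
de_novo = "\udcc2\udc80"

# Each lone surrogate U+DCxx is written out as the single byte 0xxx.
raw = de_novo.encode("utf-8", "surrogateescape")

# The two bytes happen to form valid UTF-8, so decoding never reaches
# the error handler and yields one ordinary character instead.
round_tripped = raw.decode("utf-8", "surrogateescape")

print(raw)
print(ascii(round_tripped))
print(round_tripped == de_novo)
```

Under that assumption, bytes -> str -> bytes is reversible, while str -> bytes -> str is not guaranteed for strings that were never decoded from bytes in the first place.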
Cameron Simpson writes:
> On 29Apr2009 22:14, Stephen J. Turnbull wrote:
> | Baptiste Carvello writes:
> | > By contrast, if the new utf-8b codec would *supersede* the old one,
> | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
> | > surrogates are unused). Thus ambi
[top-posting for once to preserve full quoting]
Glenn,
Could you please reduce your suggestions into sample text for the PEP?
We seem to be now at the stage where nobody is objecting to the PEP, so
the focus should be on making the PEP clearer.
If you still want to create an alternative PEP impl
> I think it has to be excluded from mapping in order to not introduce
> security issues.
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Regards,
Martin
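A small sketch of what excluding ASCII bytes means in practice, assuming the Python 3 `surrogateescape` handler as a stand-in for the PEP's handler (an assumption; `can_escape` is a hypothetical helper). Only U+DC80..U+DCFF map back to raw bytes, so a string cannot smuggle an ASCII byte such as "/" through an escaped code point.

```python
# Assumption: "surrogateescape" stands in for the PEP's handler.
def can_escape(code_point):
    """Return True if this lone surrogate encodes back to a single raw byte."""
    try:
        chr(code_point).encode("utf-8", "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False

high_byte_ok = can_escape(0xDCFF)    # corresponds to byte 0xFF
slash_allowed = can_escape(0xDC2F)   # would be "/" (0x2F) if ASCII were mapped

print(high_byte_ok, slash_allowed)
```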
> Assuming people agree that this is an accurate summary, it should be
> incorporated into the PEP.
Done!
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.pyth
On approximately 4/29/2009 7:50 PM, came the following characters from
the keyboard of Aahz:
On Thu, Apr 30, 2009, Cameron Simpson wrote:
The lengthy discussion mostly revolves around:
- Glenn points out that strings that came _not_ from listdir, and that are
_not_ well-formed unicode (==
On approximately 4/29/2009 8:46 PM, came the following characters from
the keyboard of Terry Reedy:
Glenn Linderman wrote:
On approximately 4/29/2009 1:28 PM, came the following characters from
So where is the ambiguity here?
None. But not everyone can read all the Python source code to tr
On Wed, Apr 29, 2009 at 23:03, Terry Reedy wrote:
> Thomas Breuel wrote:
>
>>
>>Sure. However, that requires you to provide meaningful, reproducible
>>counter-examples, rather than a stenographic formulation that might
>>hint some problem you apparently see (which I believe is just no
> Thanks for clarifying the Windows behavior, here. A little more
> clarification in the PEP could have avoided lots of discussion. It
> would seem that a PEP, proposed to modify a poorly documented (and
> therefore likely poorly understood) area, should be educational about
> the status quo, as
> How do I get a printable Unicode version of these path strings if they
> contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
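One way the regular-expression suggestion might look, as a rough sketch. It assumes undecodable bytes were escaped as lone surrogates in U+DC80..U+DCFF; `ESCAPED_BYTE` and `printable` are hypothetical names.

```python
import re

# Assumption: undecodable bytes appear as lone surrogates U+DC80..U+DCFF.
ESCAPED_BYTE = re.compile("[\udc80-\udcff]")

def printable(name):
    """Replace every escaped byte with a question mark, for display only."""
    return ESCAPED_BYTE.sub("?", name)

shown = printable("caf\udce9.txt")
print(shown)
```

The result is only suitable for showing to a user. It no longer identifies the file, so the original string still has to be kept for any later call that opens or renames it.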
> I'm guessing that an app has to understand that filenames come in tw
Glenn Linderman wrote:
On approximately 4/29/2009 1:28 PM, came the following characters from
So where is the ambiguity here?
None. But not everyone can read all the Python source code to try to
understand it; they expect the documentation to help them avoid that.
Because the documentatio
On Thu, Apr 30, 2009, Cameron Simpson wrote:
>
> The lengthy discussion mostly revolves around:
>
> - Glenn points out that strings that came _not_ from listdir, and that are
> _not_ well-formed unicode (== "have bare surrogates in them") but that
> were intended for use as filenames wil
On 29Apr2009 23:41, Barry Scott wrote:
> On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
>> If the locale's encoding is UTF-8, the file system encoding is set to
>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>> (which must be >= 0x80) into half surrogate codes U+DC80..U
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
Forgive me if this has been covered.
On 29Apr2009 22:14, Stephen J. Turnbull wrote:
| Baptiste Carvello writes:
| > By contrast, if the new utf-8b codec would *supersede* the old one,
| > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
| > surrogates are unused). Thus ambiguity could be avoided.
|
| Unfortunat
On 29Apr2009 17:03, Terry Reedy wrote:
> Thomas Breuel wrote:
>> Sure. However, that requires you to provide meaningful, reproducible
>> counter-examples, rather than a stenographic formulation that might
>> hint some problem you apparently see (which I believe is just not
>> there
On approximately 4/29/2009 1:28 PM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the
str interface, no decoding happens, matches in memory the file on disk
with the byte that translates to the same surrogate, acc
Thomas Breuel wrote:
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
there).
Well, here's another one: PEP 383 would disall
Glenn Linderman wrote:
On approximately 4/29/2009 4:36 AM, came the following characters from
the keyboard of Cameron Simpson:
On 29Apr2009 02:56, Glenn Linderman wrote:
os.listdir(b"")
I find that on my Windows system, with all ASCII path file names,
that I get quite different results wh
> So while out of scope of the PEP, I don't think it's at all
> artificial.
Sure - but I see this as the same case as "the file got renamed".
If you have a LRU list in your app, and a file gets renamed, then
the LRU list breaks (unless you also store the inode number in the
LRU list, and lookup th
>>> C. File on disk with the invalid surrogate code, accessed via the
>>> str interface, no decoding happens, matches in memory the file on disk
>>> with the byte that translates to the same surrogate, accessed via the
>>> bytes interface. Ambiguity.
>> What does that mean? What sp
> Sure. However, that requires you to provide meaningful, reproducible
> counter-examples, rather than a stenographic formulation that might
> hint some problem you apparently see (which I believe is just not
> there).
>
>
> Well, here's another one: PEP 383 would disallow UTF-8 e
"Martin v. Löwis" writes:
> I find the case pretty artificial, though: if the locale encoding
> changes, all file names will look incorrect to the user, so he'll
> quickly switch back, or rename all the files.
It's not necessarily the case that the locale encoding changes, but
rather the name
Baptiste Carvello writes:
> By contrast, if the new utf-8b codec would *supersede* the old one,
> \udcxx would always mean raw bytes (at least on UCS-4 builds, where
> surrogates are unused). Thus ambiguity could be avoided.
Unfortunately, that's false. It could have come from a literal strin
On approximately 4/29/2009 4:36 AM, came the following characters from
the keyboard of Cameron Simpson:
On 29Apr2009 02:56, Glenn Linderman wrote:
os.listdir(b"")
I find that on my Windows system, with all ASCII path file names, that I
get quite different results when I pass os.listdir an
On approximately 4/29/2009 4:07 AM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
On approximately 4/28/2009 7:40 PM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37, Gle
On 29Apr2009 02:56, Glenn Linderman wrote:
> os.listdir(b"")
>
> I find that on my Windows system, with all ASCII path file names, that I
> get quite different results when I pass os.listdir an empty str vs an
> empty bytes.
>
> Rather than keep you guessing, I get the root directory contents
On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
On approximately 4/28/2009 7:40 PM, came the following characters from the
keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
> C. File on disk with the invalid surrogate code, accessed via the str
> interfac
On approximately 4/29/2009 12:29 AM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk
with
the byte that translates to the same surrogate, ac
On approximately 4/29/2009 12:38 AM, came the following characters from
the keyboard of Baptiste Carvello:
Glenn Linderman a écrit :
3. When an undecodable byte 0xPQ is found, decode to the escape
codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
The problem with this
> Sure. However, that requires you to provide meaningful, reproducible
> counter-examples, rather than a stenographic formulation that might
> hint some problem you apparently see (which I believe is just not
> there).
Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
surro
Glenn Linderman a écrit :
If there is going to be a required transformation from de novo strings
to funny-encoded strings, then why not make one that people can actually
see and compare and decode from the displayable form, by using
displayable characters instead of lone surrogates?
The
Lino Mastrodomenico a écrit :
Only for the new utf-8b encoding (if Martin agrees), while the
existing utf-8 is fine as is (or at least waaay outside the scope of
this PEP).
This is questionable. This would have the consequence that \udcxx in a python
string would sometimes mean a surrogate,
Zooko O'Whielacronx wrote:
If you switch to iso8859-15 only in the presence of undecodable
UTF-8, then you have the same round-trip problem as the PEP: both
b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
way to unambiguously recover the original file name.
Why do you say
On 29Apr2009 08:27, Martin v. Löwis wrote:
| > I would like utility functions to perform:
| > os-bytes->funny-encoded
| > funny-encoded->os-bytes
| > or explicit example code snippets for same in the PEP text.
|
| Done!
Thanks!
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/
Glenn Linderman a écrit :
3. When an undecodable byte 0xPQ is found, decode to the escape
codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
The problem with this strategy is: paths are often sliced, so your 2 codepoints
could get separated. The good thing with the PEP'
> C. File on disk with the invalid surrogate code, accessed via the str
> interface, no decoding happens, matches in memory the file on disk with
> the byte that translates to the same surrogate, accessed via the bytes
> interface. Ambiguity.
Is that an alternative to A
On approximately 4/28/2009 10:52 PM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk with
the byte that translates to the same surrogate, ac
> I would like utility functions to perform:
> os-bytes->funny-encoded
> funny-encoded->os-bytes
> or explicit example code snippets for same in the PEP text.
Done!
Martin
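For readers without the PEP text at hand, the two requested conversions might be sketched as below. This is not the snippet that was added to the PEP. It assumes the `surrogateescape` error handler from Python 3 and a UTF-8 filesystem encoding, and the function names are hypothetical.

```python
# Assumptions: "surrogateescape" handler, UTF-8 as the filesystem encoding.
def os_bytes_to_funny(raw, encoding="utf-8"):
    """os-bytes -> funny-encoded str (undecodable bytes become U+DCxx)."""
    return raw.decode(encoding, "surrogateescape")

def funny_to_os_bytes(name, encoding="utf-8"):
    """funny-encoded str -> os-bytes (U+DCxx becomes the original byte)."""
    return name.encode(encoding, "surrogateescape")

original = b"report-\xff.txt"
name = os_bytes_to_funny(original)
restored = funny_to_os_bytes(name)

print(ascii(name))
print(restored == original)
```

Python 3.2 and later also provide `os.fsdecode` and `os.fsencode`, which apply the interpreter's own filesystem encoding and error handler instead of a hard-coded one.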
> I'm more concerned with your (yours? someone else's?) mention of shift
> characters. I'm unfamiliar with these encodings: to translate such a
> thing into a Latin example, is it the case that there are schemes with
> valid encodings that look like:
>
> [SHIFT] a b c
>
> which would produce "A
>> The Python UTF-8 codec will happily encode half-surrogates; people argue
>> that it is a bug that it does so, however, it would help in this
>> specific case.
>
> Can we use this encoding scheme for writing into files as well? We've
> turned the filename with undecodable bytes into a string wi
>>> C. File on disk with the invalid surrogate code, accessed via the str
>>> interface, no decoding happens, matches in memory the file on disk with
>>> the byte that translates to the same surrogate, accessed via the bytes
>>> interface. Ambiguity.
>>
>> Is that an alternative to A and B?
>
> I
On approximately 4/28/2009 4:06 PM, came the following characters from
the keyboard of Cameron Simpson:
I think I may be able to resolve Glenn's issues with the scheme lower
down (through careful use of definitions and hand waving).
Close. You at least resolved what you thought my issue was
On approximately 4/28/2009 7:40 PM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk
w
On 28Apr2009 13:37, Glenn Linderman wrote:
> On approximately 4/28/2009 1:25 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>> The UTF-8b representation suffers from the same potential ambiguities as
>>> the PUA characters...
>>
>> Not at all the same ambiguities. He
Martin v. Löwis wrote:
>> Since the serialization of the Unicode string is likely to use UTF-8,
>> and the string for such a file will include half surrogates, the
>> application may raise an exception when encoding the names for a
>> configuration file. These encoding exceptions will be as rare a
On 28Apr2009 14:37, Thomas Breuel wrote:
| But the biggest problem with the proposal is that it isn't needed: if you
| want to be able to turn arbitrary byte sequences into unicode strings and
| back, just set your encoding to iso8859-15. That already works and it
| doesn't require any changes.
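A quick sketch of both sides of this argument, using only codecs that ship with Python. The first half checks the claim that iso8859-15 round-trips arbitrary bytes. The second half uses a hypothetical `decode_with_fallback` helper to show the objection raised elsewhere in the thread, which applies when iso8859-15 is used only as a fallback for undecodable UTF-8.

```python
# Unconditional iso8859-15: every byte value maps to exactly one code point.
every_byte = bytes(range(256))
as_text = every_byte.decode("iso8859-15")
lossless = as_text.encode("iso8859-15") == every_byte

# Hypothetical fallback scheme: try UTF-8 first, iso8859-15 only on failure.
def decode_with_fallback(raw):
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859-15")

# Two different file names on disk end up as the same string in memory.
collision = decode_with_fallback(b"\xff") == decode_with_fallback(b"\xc3\xbf")

print(lossless, collision)
```

The unconditional form is reversible but displays UTF-8 file names as mojibake. The fallback form displays them correctly but cannot always recover the original bytes.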
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk with the
byte that translates to the same surrogate, accessed via the bytes interface.
Ambiguity.
Unles
On 28Apr2009 11:49, Antoine Pitrou wrote:
| Paul Moore writes:
| >
| > I've yet to hear anyone claim that they would have an actual problem
| > with a specific piece of code they have written.
|
| Yep, that's the problem. Lots of theoretical problems no one has ever
| encountered bro
Zooko O'Whielacronx wrote:
> On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
>> If you switch to iso8859-15 only in the presence of undecodable UTF-8,
>> then you have the same round-trip problem as the PEP: both b'\xff' and
>> b'\xc3\xbf' will be converted to u'\u00ff' without a way to
>> unambi
On approximately 4/28/2009 2:01 PM, came the following characters from
the keyboard of MRAB:
Glenn Linderman wrote:
On approximately 4/28/2009 11:55 AM, came the following characters
from the keyboard of MRAB:
I've been thinking of "python-escape" only in terms of UTF-8, the only
encoding menti
I think I may be able to resolve Glenn's issues with the scheme lower
down (through careful use of definitions and hand waving).
On 27Apr2009 23:52, Glenn Linderman wrote:
> On approximately 4/27/2009 7:11 PM, came the following characters from
> the keyboard of Cameron Simpson:
[...]
>> There
On approximately 4/28/2009 2:02 PM, came the following characters from
the keyboard of Martin v. Löwis:
Glenn Linderman wrote:
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential ambigu
Glenn Linderman wrote:
> On approximately 4/28/2009 1:25 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>> The UTF-8b representation suffers from the same potential ambiguities as
>>> the PUA characters...
>>
>> Not at all the same ambiguities. Here, again, the two choi
Glenn Linderman wrote:
On approximately 4/28/2009 11:55 AM, came the following characters from
the keyboard of MRAB:
I've been thinking of "python-escape" only in terms of UTF-8, the only
encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
decodable.
UTF-8 is only mentioned in the
> Others have made this suggestion, and it is helpful to the PEP, but not
> sufficient. As implemented as an error handler, I'm not sure that the
> b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8
> decoder is happy with it. Which, in my testing, it is.
Rest assured that th
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential ambiguities as
the PUA characters...
Not at all the same ambiguities. Here, again, the two choices:
A. use PUA characters to repres
On approximately 4/28/2009 6:01 AM, came the following characters from
the keyboard of Lino Mastrodomenico:
2009/4/28 Glenn Linderman :
The switch from PUA to half-surrogates does not resolve the issues with the
encoding not being a 1-to-1 mapping, though. The very fact that you think
you can
On approximately 4/28/2009 11:55 AM, came the following characters from
the keyboard of MRAB:
I've been thinking of "python-escape" only in terms of UTF-8, the only
encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
decodable.
UTF-8 is only mentioned in the sense of having special
> The UTF-8b representation suffers from the same potential ambiguities as
> the PUA characters...
Not at all the same ambiguities. Here, again, the two choices:
A. use PUA characters to represent undecodable bytes, in particular for
UTF-8 (the PEP actually never proposed this to happen).
On approximately 4/28/2009 10:53 AM, came the following characters from
the keyboard of James Y Knight:
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required
On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
Are you proposing to unconditionally encode file names as
iso8859-15, or to do so only when undecodable bytes are encountered?
For what it is worth, what we have previously planned to do for the
Tahoe project is the second of these -- decod
James Y Knight wrote:
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
s
On approximately 4/28/2009 10:00 AM, came the following characters from
the keyboard of Martin v. Löwis:
An alternative that doesn't suffer from the risk of not being able to
store decoded strings would have been the use of PUA characters, but
people rejected it because of the potential ambigui
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
such overlapping is forb
> If the PEP depends on this being changed, it should be mentioned in the
> PEP.
The PEP says that the utf-8b codec decodes invalid bytes into low
surrogates. I have now clarified that a strict definition of UTF-8
is assumed for utf-8b.
Regards,
Martin
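What a strict definition of UTF-8 implies for the three-byte sequence discussed in this thread can be checked with a short sketch. It assumes Python 3's UTF-8 decoder and its `surrogateescape` handler as stand-ins for the proposed utf-8b codec (an assumption; the variable names are hypothetical).

```python
# b"\xed\xb3\xbf" is what a lenient encoder would emit for U+DCFF.
lenient_bytes = b"\xed\xb3\xbf"

# A strict decoder rejects an encoded surrogate outright.
try:
    lenient_bytes.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# With the escape handler, each of the three bytes is escaped on its own,
# so the result is three code points rather than a single U+DCFF.
decoded = lenient_bytes.decode("utf-8", "surrogateescape")
restored = decoded.encode("utf-8", "surrogateescape")

print(strict_accepts, ascii(decoded), restored == lenient_bytes)
```

Because the sequence never decodes to U+DCFF, a file named with these three bytes cannot be confused with a file named with the single byte 0xFF.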
> Since the serialization of the Unicode string is likely to use UTF-8,
> and the string for such a file will include half surrogates, the
> application may raise an exception when encoding the names for a
> configuration file. These encoding exceptions will be as rare as the
> unusual names (whic
> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
> not a valid Unicode character (not a character at all, really) and the
> only way you can put this in a POSIX filename is if you use a very
> lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'.
>
> Since this byte sequence
Paul Moore writes:
> But it seems to me that there is an assumption that problems will
> arise when code gets a potentially funny-decoded string and doesn't
> know where it came from.
>
> Is that a real concern?
Yes, it's a real concern. I don't think it's possible to show a small
piece of
On Mon, Apr 27, 2009 at 23:43, Stephen J. Turnbull wrote:
> Nobody said we were at the stage of *saving* the [attachment]!
But speaking of saving files, I think that's the biggest hole in this
that has been nagging at the back of my mind. This PEP intends to
allow easy access to filenames and oth
2009/4/28 Hrvoje Niksic :
> Lino Mastrodomenico wrote:
>>
>> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
>> character when
>> decoded with UTF-8, it should simply be considered an invalid UTF-8
>> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
>> '\udcff
Lino Mastrodomenico wrote:
Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
character when decoded with UTF-8, it should simply be considered an
invalid UTF-8 sequence of three bytes and decoded to
'\udced\udcb3\udcbf' (*not* '\udcff').
"Should be considered" or "will be co
2009/4/28 Glenn Linderman :
> The switch from PUA to half-surrogates does not resolve the issues with the
> encoding not being a 1-to-1 mapping, though. The very fact that you think
> you can get away with use of lone surrogates means that other people might,
> accidentally or intentionally, also
Thomas Breuel wrote:
But the biggest problem with the proposal is that it isn't needed: if
you want to be able to turn arbitrary byte sequences into unicode
strings and back, just set your encoding to iso8859-15. That already
works and it doesn't require any changes.
Are you proposing to unc
>
> Yep, that's the problem. Lots of theoretical problems no one has ever
> encountered brought up against a PEP which resolves some actual
> problems people encounter on a regular basis.
How can you bring up practical problems against something that hasn't been
implemented?
The fact that no
For what it's worth, the OS X APIs seem to behave as follows:
* If you create a file with a non-UTF-8 name on an HFS+ filesystem the
system automatically encodes the name.
That is, open(chr(255), 'w') will silently create a file named '%FF'
instead of the name you'd expect on a Unix system.
Paul Moore wrote:
2009/4/28 Antoine Pitrou :
Paul Moore writes:
I've yet to hear anyone claim that they would have an actual problem
with a specific piece of code they have written.
Yep, that's the problem. Lots of theoretical problems no one has ever encountered
brou
2009/4/28 Antoine Pitrou :
> Paul Moore writes:
>>
>> I've yet to hear anyone claim that they would have an actual problem
>> with a specific piece of code they have written.
>
> Yep, that's the problem. Lots of theoretical problems no one has ever
> encountered brought up against a P
2009/4/28 Glenn Linderman :
> So assume a non-decodable sequence in a name. That puts us into Martin's
> funny-decode scheme. His funny-decode scheme produces a bare string,
> indistinguishable from a bare string that would be produced by a str API
> that happens to contain that same sequence. D
> Does the PEP take into consideration the normalising behaviour of Mac
> OSX ? We've had some ongoing challenges in bzr related to this with bzr.
No, that's completely out of scope, AFAICT. I don't even know what the
issues are, so I'm not able to propose a solution, at the moment.
Regards,
Mart
On approximately 4/27/2009 7:11 PM, came the following characters from
the keyboard of Cameron Simpson:
On 27Apr2009 18:15, Glenn Linderman wrote:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular
James Y Knight wrote:
> Hopefully it can be assumed that your locale encoding really is a
> non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
such overlapping is forbidden?
> I'm a bit scared at the prospect that U+DCAF
On approximately 4/27/2009 8:39 PM, came the following characters from
the keyboard of Martin v. Löwis:
I'm not suggesting the PEP should solve the problem of mounting foreign
file systems, although if it doesn't it should probably point that out.
I'm just suggesting that if the people that writ
On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote:
>
> Indeed, that was the missing piece. I'd forgotten about the encodings
> that use escape sequences, rather than UTF-8, and DBCS. I don't think
> those encodings are permitted by POSIX file systems, but I suppose they
> could s
On approximately 4/27/2009 8:35 PM, came the following characters from
the keyboard of Martin v. Löwis:
Glenn Linderman wrote:
On approximately 4/27/2009 12:42 PM, came the following characters from
the keyboard of Martin v. Löwis:
It's a private use area. It will never carry an official charac