[issue13643] 'ascii' is a bad filesystem default encoding

2017-12-18 Thread STINNER Victor
STINNER Victor added the comment: Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7!

[issue13643] 'ascii' is a bad filesystem default encoding

2016-12-20 Thread Nick Coghlan
Nick Coghlan added the comment: Also see http://bugs.python.org/issue28180 for a more recent proposal to tackle this by coercing the C locale to the C.UTF-8 locale.
nosy: +ncoghlan

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-23 Thread Martin Pool
Martin Pool added the comment: Terry, that's fine. Thanks to everyone who contributed to the discussion.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-23 Thread Terry J. Reedy
Terry J. Reedy added the comment: Martin, after reading most all of the unusually large sequence of messages, I am closing this because three of the core developers with the most experience in this area are dead-set against your proposal. That does not make it 'wrong', but does mean that it w

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-22 Thread akira
Changes by akira <4kir4...@gmail.com>: nosy: +akira

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread R. David Murray
R. David Murray added the comment: > _My_ locale is set properly. The problem is all the other > people in the world who do not have their locale set to match > their files on disk; telling them each to fix it is tedious. > But perhaps the OS is the best place to address that, when the > incorr

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool added the comment: On 22 December 2011 13:15, STINNER Victor wrote: > You cannot pass directly "h\xe9.txt", but if you know the "correct" file > system encoding, you can encode it explicitly using str.encode("utf-8"). My recollection was that there were some cases where you couldn
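
The explicit-encoding workaround quoted above can be sketched as follows. This is only an illustration, assuming a Linux-like POSIX system where filenames are raw bytes. The temporary directory and the file contents are made up for the example; only the name "h\xe9.txt" and the call str.encode("utf-8") come from the thread.

```python
import os
import tempfile

name = "h\xe9.txt"                     # the filename used in the thread
encoded_name = name.encode("utf-8")    # explicit encoding, independent of the locale

with tempfile.TemporaryDirectory() as tmp:
    # Build the whole path as bytes so Python never has to consult the
    # filesystem encoding for the non-ASCII part.
    path = os.path.join(os.fsencode(tmp), encoded_name)
    with open(path, "wb") as f:
        f.write(b"data")
    # Listing a bytes directory returns bytes names, undecoded.
    listed = os.listdir(os.fsencode(tmp))

print(listed)
```

Because every path component is bytes, this should behave the same whether the process runs under LANG=C or a UTF-8 locale, at the cost of carrying bytes paths through the program.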

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment: > The problem as I see it is this: > > On Linux, filenames are generally (but not always) in UTF-8; people > fairly commonly end up with no locale configured, which causes Python > to decode filenames as ascii. It is easy for this to end up with them > hitting

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool added the comment: On 22 December 2011 12:32, STINNER Victor wrote: > > STINNER Victor added the comment: > > On 22/12/2011 02:16, Martin Pool wrote: >> The proposal is that in some cases where Python currently assumes >> filenames are ascii on Linux, it ought to instead assume the

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment: On 22/12/2011 02:16, Martin Pool wrote: > The proposal is that in some cases where Python currently assumes > filenames are ascii on Linux, it ought to instead assume they are > utf-8. Oh, I expected a use case describing the problem, not the proposed solution

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool added the comment: On 22 December 2011 11:21, STINNER Victor wrote: > This discussion is becoming very long, I didn't remember the original > purpose. The proposal is that in some cases where Python currently assumes filenames are ascii on Linux, it ought to instead assume they are

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment: >> Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG >> variable: use the first non-empty variable. LC_MESSAGES doesn't affect >> the encoding. Example: > > That's good to know, thanks. Only leaves the case where setlocale > is called again wi

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment: This discussion is becoming very long, I didn't remember the original purpose. You want to use UTF-8 instead of ASCII, so what? What do you want to do with your nicely well decoded filenames? You cannot print it to your terminal nor pass it to a subprocess, b

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Antoine Pitrou
Antoine Pitrou added the comment: > It is a de facto, not de jure standard: UTF-8 is how things are > typically stored. Other software (eg gnome file handling utilities) > makes this assumption. See eg > . So should we specifically detect Lin

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool added the comment: On 21 December 2011 12:41, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> The standard encoding is UTF-8. > > How so? I don't know of any Linux or Unix spec which says so. If you get > the Linux heads to standardize this then I'll certainly be v

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread R. David Murray
R. David Murray added the comment:
> But currently everything handling filenames as unicode on
> nix needs to worry about surrogates (that can't be encoded
> as ascii) already, or it will still be passing values that
> can't be interpreted by other processes as you highlighted
> earlier.
Making u

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment:
> it will still be passing values that can't be
> interpreted by other processes as you highlighted earlier.
On UNIX, data going outside Python has to be encoded: you pass byte strings, not Unicode directly. Surrogates are encoded back to the original bytes. Examp
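
The round trip described here, where undecodable bytes become lone surrogates and are turned back into the original bytes on the way out, can be sketched with the standard "surrogateescape" error handler. This is a minimal illustration that uses the ASCII codec explicitly to mimic a C-locale filesystem encoding. It is not the example from the truncated message.

```python
raw = "h\u00e9".encode("utf-8")        # bytes as stored on disk: b'h\xc3\xa9'

# Decoding with ASCII + surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising an error.
decoded = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = decoded.encode("ascii", errors="surrogateescape")

# A strict encode refuses lone surrogates, which is why such a string
# cannot simply be printed or written as UTF-8 text.
try:
    decoded.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(decoded), restored == raw, strict_failed)
```

The decoded value has the same shape as the names in the os.listdir() output quoted elsewhere in this thread ('h\udcc3\udca9h\udcc3\udca9' for héhé).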

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin
Martin added the comment: > Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG > variable: use the first non-empty variable. LC_MESSAGES doesn't affect > the encoding. Example: That's good to know, thanks. Only leaves the case where setlocale is called again with a different

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor added the comment: > Having more than one encoding on unix is already a reality, there's nothing > to stop someone setting LANG=de_DE.UTF-8 and LC_MESSAGES=C say. Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG variable: use the first non-empty variable. LC_M
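
The precedence rule stated above (first non-empty of LC_ALL, LC_CTYPE, LANG) can be modelled in a few lines. The helper name locale_encoding_source is hypothetical and exists only for this sketch. It mirrors the rule as described in the comment and does not query the C library.

```python
import os


def locale_encoding_source(environ=None):
    """Return (name, value) for the first non-empty variable among
    LC_ALL, LC_CTYPE and LANG, or None if all are unset or empty.

    Hypothetical helper: it only models the lookup order described
    in the comment above.
    """
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return name, value
    return None


# The configuration from the quoted message: LC_MESSAGES is ignored,
# so LANG decides the encoding.
example = {"LANG": "de_DE.UTF-8", "LC_MESSAGES": "C"}
print(locale_encoding_source(example))
```

With no variable set, the helper returns None, which corresponds to the "no locale configured" case in which the thread reports Python falling back to ASCII.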

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread vila
Changes by vila: nosy: +vila

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> The standard encoding is UTF-8.
How so? I don't know of any Linux or Unix spec which says so. If you get the Linux heads to standardize this then I'll certainly be very happy (and countless others will, too). But AFAIK this is not the case and I don't see why

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool added the comment: On 21 December 2011 12:16, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > So, you're complaining about something which works, kind of: > > $ touch héhé > $ LANG=C python3 -c "import os; print(os.listdir())" > ['h\udcc3\udca9h\udcc3\udca9'] It's

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool added the comment: Thanks for the example. Like you say, realistically, all data exchanged with other programs and with the system needs to be in the same encoding. (User document content may be in something else.) On modern systems, this problem is solved by making the standard e

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: So, you're complaining about something which works, kind of:
    $ touch héhé
    $ LANG=C python3 -c "import os; print(os.listdir())"
    ['h\udcc3\udca9h\udcc3\udca9']
> This makes robustly working with non-ascii filenames on different
> platforms needlessly annoying, g

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin added the comment: > During 1 month, we had PYTHONFSENCODING environment variable. It was not a > good idea. I strongly agree. There is no sense in having a separate configurable value, anyone who would think about using a PYTHONFSENCODING should just change their locale instead. Howev

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment: I should not write comments so late :-p
> Not after Python start. Using two encodings at the same would just
... at the same time
> ... because I would like to inconsistency.
because it would lead to inconsistencies

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment: > The main problem I see being discussed is that > changing the encoding after Python starts would > be dangerous, which I agree with, but we're not > proposing to do that. Not after Python start. Using two encodings at the same would just adds new problems. O

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool added the comment: On 21 December 2011 11:26, STINNER Victor wrote: > I never checked which locale is used by default for programs called by cron. > So I checked: on Fedora 16, programs start with a very few environment > variables, and LANG and LC_ALL are not set. You can add "LA

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool added the comment: On 21 December 2011 11:01, STINNER Victor wrote:
> Again: please read the discussion (in closed issues) explaining why we removed
> it (and which problems it introduced).
There's a lot of history, so I'm not sure exactly which problems you're referring to. The

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment: > There are two problems with this: one is just the practical > one that it scales poorly to have to tell every user to do this > and to take them through working out how to set this in a way > that covers cron jobs, daemons, things run over ssh, etc. I never c

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment: > If there was a separate LC_FILENAMES then Python could respect > that and insist people set it, but there isn't. During 1 month, we had PYTHONFSENCODING environment variable. It was not a good idea. Again: please read the discussion (in closed issues) explai

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool added the comment: > I'm not sure why having a locale set to C or something invalid should be > considered a Python bug. Programs like bzr that hit these problems can tell their users, either in the docs or an error message, "change your locale to a UTF-8 one". There are two pro

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin added the comment: > It was already discussed: using a different encoding for filenames and for > other things is really not a good idea. The main problem is the interaction > with other programs. Yes, for many programs, a change like this will mean they create the file, but then throw

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin added the comment: > I'm not sure why having a locale set to C or something invalid should be > considered a Python bug. You have to handle un-decodable filenames no > matter what you do, since things aren't always encoded in utf-8 on non-OSX > unix even when that is the system locale.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment:
> under either the C locale, or with an invalid or missing locale
The right fix is to fix your locale, not Python.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor added the comment: > Currently when running Python on a non-OSX posix environment > under either the C locale, or with an invalid or missing locale, > it's not possible to operate using unicode filenames outside > the ascii range. It was already discussed: using a different encod

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread R. David Murray
R. David Murray added the comment: I'm not sure why having a locale set to C or something invalid should be considered a Python bug. You have to handle un-decodable filenames no matter what you do, since things aren't always encoded in utf-8 on non-OSX unix even when that is the system local

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
New submission from Martin : Currently when running Python on a non-OSX posix environment under either the C locale, or with an invalid or missing locale, it's not possible to operate using unicode filenames outside the ascii range. Using bytes works, as does reading expecting unicode, using t