Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread INADA Naoki
On Fri, Jan 13, 2017 at 12:12 AM, Victor Stinner wrote: > 2017-01-12 1:23 GMT+01:00 INADA Naoki : >> I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. > > The use case is to be able to write a Python 3 program which

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Chris Barker
On Thu, Jan 12, 2017 at 7:50 AM, Stephen J. Turnbull wrote: > > So I see no downside to using utf-8 when the C locale is defined. > > You don't have much incentive to look for one, and I doubt you have > the experience of the edge cases (if you do, please

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephen J. Turnbull
Chris Barker writes: > 2) There are non-ascii file names, etc. on this supposedly ASCII system. In > which case, do folks expect their Python programs to find these issues and > raise errors? They may well expect that their Python program will not let > them try to save a non ASCII filename,

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephen J. Turnbull
Stephan Houben writes: > I think this may be a minor concern ultimately, but it would be > nice if we had some API to at least reliably answer the question > "can I safely output non-ASCII myself?" You can't; stdout might be a TTY, pipe, or socket in which case you have no way to determine
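A minimal sketch of the closest check a program can make on its own: whether the encoding that stdout *claims* to use can represent a given string. The helper name `can_encode` is hypothetical. As the reply above points out, this says nothing about what the TTY, pipe, or socket on the other end will actually do with the bytes.

```python
import sys
from typing import Optional


def can_encode(text: str, encoding: Optional[str]) -> bool:
    """Return True if `text` can be encoded with the named codec (hypothetical helper)."""
    if not encoding:
        # sys.stdout.encoding can be None when stdout has been replaced.
        return False
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unencodable character, or a codec name Python does not know.
        return False
    return True


# What stdout claims it can handle; not proof that the reader agrees.
stdout_claims_ok = can_encode("\u00e9", getattr(sys.stdout, "encoding", None))
```

A True result only means that writing the string will not raise UnicodeEncodeError; the receiving side may still show mojibake.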

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Victor Stinner
2017-01-12 1:23 GMT+01:00 INADA Naoki : > I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. The use case is to be able to write a Python 3 program which works with UNIX pipes without failing with encoding errors:
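The message is cut off here, so the following is only an illustrative sketch of the pipe use case, not Victor's own example. It assumes a filter that decodes its input as UTF-8 with the surrogateescape error handler and encodes its output the same way; the function name `shout` and the sample bytes are made up.

```python
def shout(raw: bytes) -> bytes:
    """Upper-case a chunk of pipe data without raising on undecodable bytes."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into the original bytes.
    return text.upper().encode("utf-8", errors="surrogateescape")


latin1_line = b"caf\xe9\n"  # valid Latin-1, but not valid UTF-8

# The strict handler rejects the same input.
try:
    latin1_line.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

result = shout(latin1_line)  # the 0xE9 byte passes through unchanged
```

The trade-off discussed in the thread is visible here: the filter never fails, but it also silently forwards bytes that are not valid UTF-8.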

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephan Houben
Hi Petr, 2017-01-11 12:22 GMT+01:00 Petr Viktorin : > > For example, this may mean that a built-in Python string sort will give you >> a different ordering than invoking the external "sort" command. >> I have been bitten by this kind of issues, leading to spurious "diffs" if >>
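A small sketch of why the two orderings can differ: Python's built-in sort compares strings by code point, while an external "sort" follows the collation rules of the current locale. The word list is invented, and the case-folding key below is only a rough stand-in for a dictionary-style locale collation; for the real thing one would call locale.setlocale() and sort with key=locale.strxfrm.

```python
words = ["apple", "Zebra", "banana", "Apple"]

# Built-in ordering: plain code point comparison, so all upper-case
# ASCII letters sort before all lower-case ones.
by_code_point = sorted(words)

# Rough stand-in for a dictionary-style collation (an assumption, not
# what any particular locale guarantees): compare case-insensitively,
# then break ties with the original spelling.
dictionary_like = sorted(words, key=lambda w: (w.casefold(), w))
```

Diffing output sorted one way against output sorted the other way produces exactly the kind of spurious differences mentioned above.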

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
> My PEP 540 is different than Nick's PEP 538, even for the POSIX > locale. I propose to always use the surrogateescape error handler, > whereas Nick wants to keep the strict error handler for inputs and > outputs. > https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler > > The

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Victor Stinner
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. > Nick was trying to address the situation where the locale is > set to "C", or rather not set at all (in which case the lib C > defaults to the "C" locale). The latter is a

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
> > > (I wonder if we can use LC_CTYPE=UTF-8...) > > Syntactically incorrect: that means the language UTF-8. > "LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required, > the region and encoding are optional. Thus ja_JP, ja.UTF-8 are OK, > but .UTF-8 is not. I'm sorry. I know it, but

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread INADA Naoki
> > That kind of thing makes me very nervous, and I think justifiably so. > And it's only *sufficient* to justify a change to Python's defaults if > Python checks for and accurately identifies when it's in a container. > In my company, we use network boot servers. To reduce boot image, the image

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread Stephen J. Turnbull
INADA Naoki writes: > When talking about general Docker image, using C locale is OK for > most cases. In other words, images using C locale is properly > configured. s/properly/compatibly/. "Proper" has strong connotations of "according to protocol". Configuring LC_CTYPE for ASCII

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread INADA Naoki
> > Of course. The question is not "should cb@noaa properly configure > docker?", it's "Can docker properly configure docker (soon enough)? > And if not, should we configure Python?" The third question depends > on whether fixing it for you breaks things for others. When talking about general

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread Chris Barker - NOAA Federal
>> How common is this problem? > > Last 2 or 3 years, I don't recall having been bitten by such an issue. We just got bitten by this on our CI server. Granted, we could fix it by properly configuring docker, but it would have been nice if it "just worked" -CHB

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread INADA Naoki
> > The problem is if people have locales set for non-UTF-8, which Chinese > people often do ("GB18030 isn't just a good idea, it's the law"). > Especially forcing stdout to something other than the locale is likely > to mess things up. Oh, I didn't know non-UTF-8 is used for LC_CTYPE in these

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Stephan Houben
Hi Stephen, 2017-01-09 19:42 GMT+01:00 Stephen J. Turnbull < turnbull.stephen...@u.tsukuba.ac.jp>: > > Private sector may be up to date, but academic sector > (and from the state of e-stat.go.jp, government in general, I suspect) > is stuck in the Jomon era. > I went to that page, checked the

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Stephen J. Turnbull
INADA Naoki writes: > But when I see non UTF-8 text, I don't change locale to read such > text. Nobody does. The problem is if people have locales set for non-UTF-8, which Chinese people often do ("GB18030 isn't just a good idea, it's the law"). Especially forcing stdout to something other

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Barry Warsaw
On Jan 06, 2017, at 11:08 PM, Steve Dower wrote: >Passing universal_newlines will use whatever locale.getdefaultencoding() There is no locale.getdefaultencoding(); I think you mean locale.getpreferredencoding(False). (See the "Changed in version 3.3" note in §17.5.1.1 of the stdlib docs.)
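A quick way to check the correction above. The variable names are illustrative, and the value of `encoding` depends on the environment the code runs in, so no particular result is shown.

```python
import codecs
import locale

# The locale module has no getdefaultencoding() function
# (sys.getdefaultencoding() exists, but it is a different thing).
has_getdefaultencoding = hasattr(locale, "getdefaultencoding")

# The locale's preferred encoding; passing False avoids a temporary
# setlocale() call.
encoding = locale.getpreferredencoding(False)

# Normalised codec name as Python knows it.
codec_name = codecs.lookup(encoding).name
```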

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-07 Thread Nick Coghlan
On 8 January 2017 at 02:47, Stephen J. Turnbull wrote: > I agree that people around me mostly know only two encodings: "works > for me" and "mojibake", but they also use locales configured for them > by technical staff. On top of that, international students

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-07 1:06 GMT+01:00 Barry Warsaw : > For some reason it's not configured: (...) Ok, thanks for the information. > I'm not sure why that's the default inside a chroot. I found at least one good reason to use the POSIX locale to build a package: it helps to get

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 22:20 GMT+01:00 Barry Warsaw : >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. > > It can still happen in some corner cases, even on Debian and

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Chris Angelico
On Sat, Jan 7, 2017 at 8:20 AM, Barry Warsaw wrote: > On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: > >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. >

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Barry Warsaw
On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: >Because I have the impression that nowadays all Linux distributions are UTF-8 >by default and you have to show some bloody-mindedness to end up with a POSIX >locale. It can still happen in some corner cases, even on Debian and Ubuntu where

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Barry Warsaw
On Jan 05, 2017, at 05:50 PM, Victor Stinner wrote: >I guess that all users and most developers are more in the "UNIX mode" >camp. *If* we want to change the default, I suggest to use the "UNIX >mode" by default. FWIW, it seems to be a general and widespread recommendation to always pass

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Oleg Broytman
On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki wrote: > >> Always use UTF-8 > >> > >> > >> Python already always uses the UTF-8 encoding on Mac OS X, Android and > >> Windows. > >> Since UTF-8 became the de facto encoding, it makes sense to always

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. Hum, sorry, the PEP is still a draft, the rationale is far from perfect yet. Let me try to simplify the issue: users are unable to configure a locale for various reasons and

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 7:22 GMT+01:00 Stephan Houben : > How common is this problem? Last 2 or 3 years, I don't recall having been bitten by such an issue. On the bug tracker, new issues are opened infrequently. * http://bugs.python.org/issue19977 opened at 2013-12-13, closed at 2014-04-27

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread M.-A. Lemburg
On 06.01.2017 04:32, Nick Coghlan wrote: > On 6 January 2017 at 12:37, Victor Stinner wrote: >> 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull >> : >>> I've quoted Victor out of context, and his other posts make me very >>> doubtful

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Steven D'Aprano
On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote: > Let's say that you have the filename b'nonascii\xff': it's decoded as > 'nonascii\udcff' by the UTF-8 mode. How do GUIs handle such a filename? > (I don't know the answer, it's a real question ;-)) I ran this in Python 2.7 to create
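The reply is truncated, so this is not Steven's experiment; it is only a sketch of what happens to the filename from the question when it is decoded as UTF-8 with surrogateescape. The 0xFF byte becomes the lone surrogate U+DCFF, which round-trips back to bytes but cannot be encoded strictly, so anything that wants to display the name has to pick a fallback. The "replace" fallback shown is one possible choice, not what any particular GUI does.

```python
raw_name = b"nonascii\xff"

# Undecodable byte 0xFF is smuggled through as the lone surrogate U+DCFF.
decoded = raw_name.decode("utf-8", errors="surrogateescape")

# The same handler restores the original bytes, so the file can still be opened.
round_trip = decoded.encode("utf-8", errors="surrogateescape")

# A strict encoder refuses the lone surrogate.
try:
    decoded.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# One possible display fallback: substitute '?' for what cannot be encoded.
display = decoded.encode("utf-8", errors="replace").decode("utf-8")
```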

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull : > The point of this, I suppose, is that piping to xargs works by > default. Please read the second (latest) version of my PEP 540 which contains a new "Use Cases" section which helps to define issues and

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Stephen J. Turnbull
Victor Stinner writes: > Python 3.6 is not exactly in the first or the later category: "it > depends". > > To read data from the operating system, Python 3.6 behaves in "UNIX > mode": os.listdir() *does* return invalid filenames, it uses a funny > encoding using surrogates. > > To write

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 2:15 GMT+01:00 INADA Naoki : >>> Always use UTF-8 (...) >>Please don't! (...) > > For stdio (including console), PYTHONIOENCODING can be used for > supporting legacy system. > e.g. `export PYTHONIOENCODING=$(locale charmap)` The problem with ignoring the
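A sketch of the PYTHONIOENCODING override mentioned in the quoted text. It starts a child interpreter with the variable set and looks at the raw bytes the child writes to stdout. The helper name `child_output` is hypothetical, and the encodings are picked for illustration rather than taken from `locale charmap`.

```python
import os
import subprocess
import sys


def child_output(io_encoding: str) -> bytes:
    """Run a child Python that prints U+00E9 and return the raw stdout bytes."""
    # Copy the current environment and override only the stdio encoding.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The child source is pure ASCII; the escape is interpreted by the child.
    code = r"print('\u00e9', end='')"
    done = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return done.stdout


latin1_bytes = child_output("latin-1")  # one byte per character
utf8_bytes = child_output("utf-8")      # two bytes for U+00E9
```

The variable only affects stdin, stdout and stderr; filenames and other operating system data still follow the locale.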

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode. 2017-01-05 18:10 GMT+01:00 Victor Stinner : > A common request is that "Python just works" without having to pass a > command line option or set an environment variable. Maybe the default > behaviour

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread INADA Naoki
>> Always use UTF-8 >> >> >> Python already always uses the UTF-8 encoding on Mac OS X, Android and >> Windows. >> Since UTF-8 became the de facto encoding, it makes sense to always use it on >> all >> platforms with any locale. > >Please don't! I use different locales and

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Steven D'Aprano
On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote: [...] > Python 3 promotes Unicode everywhere including filenames. A solution to > support filenames not decodable from the locale encoding was found: the > ``surrogateescape`` error handler (`PEP 383 >

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
> https://www.python.org/dev/peps/pep-0540/ I read the PEP 538, PEP 540, and issues related to switching to UTF-8. At least, I can say one thing: people have different points of view :-) To understand why people disagree, I tried to categorize the different points of view and Python

[Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
Hi, Nick Coghlan asked me to review his PEP 538 "Coercing the legacy C locale to C.UTF-8": https://www.python.org/dev/peps/pep-0538/ Nick wants to change the default behaviour. I'm not sure that I'm brave enough to follow this direction, so I proposed my old "-X utf8" command line idea as a new