On Fri, Jan 13, 2017 at 12:12 AM, Victor Stinner
wrote:
> 2017-01-12 1:23 GMT+01:00 INADA Naoki :
>> I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin.
>
> The use case is to be able to write a Python 3 program which
On Thu, Jan 12, 2017 at 7:50 AM, Stephen J. Turnbull wrote:
> > So I see no downside to using utf-8 when the C locale is defined.
>
> You don't have much incentive to look for one, and I doubt you have
> the experience of the edge cases (if you do, please
Chris Barker writes:
> 2) There are non-ascii file names, etc. on this supposedly ASCII system. In
> which case, do folks expect their Python programs to find these issues and
> raise errors? They may well expect that their Python program will not let
> them try to save a non ASCII filename,
Stephan Houben writes:
> I think this may be a minor concern ultimately, but it would be
> nice if we had some API to at least reliably answer the question
> "can I safely output non-ASCII myself?"
You can't; stdout might be a TTY, pipe, or socket in which case you
have no way to determine
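The reply's point stands: Python cannot know what the program or terminal at the other end of a pipe or socket will accept. The most a script can check is whether a string fits the encoding Python has configured for the stream. A minimal sketch, where `can_encode` is a hypothetical helper and the fallback to ASCII for streams that report no encoding is an assumption:

```python
def can_encode(text, stream):
    """Hypothetical helper: report whether `text` is representable in the
    encoding Python configured for `stream` (e.g. sys.stdout).

    This cannot tell what the reader on the other end of a TTY, pipe or
    socket actually expects; it only checks Python's own configuration.
    """
    # Assumption: treat a stream without an encoding attribute as ASCII.
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        text.encode(encoding)  # strict errors: raises on unrepresentable chars
    except UnicodeEncodeError:
        return False
    return True
```

Typical use would be `can_encode("caf\u00e9", sys.stdout)` before deciding whether to print non-ASCII output.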
2017-01-12 1:23 GMT+01:00 INADA Naoki :
> I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin.
The use case is to be able to write a Python 3 program which works
with UNIX pipes without failing with encoding errors:
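A minimal sketch of that use case, assuming UTF-8 as the pipe encoding: decoding with `surrogateescape` lets bytes that are invalid UTF-8 pass through a filter unchanged, where the strict handler would raise `UnicodeDecodeError`. `filter_lines` and `main` are hypothetical names, and the whitespace stripping stands in for real processing.

```python
import sys

def filter_lines(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical filter: decode piped bytes, process text, re-encode.

    surrogateescape maps each undecodable byte to a lone surrogate on
    decode and back to the same byte on encode, so invalid input survives
    the round trip instead of raising UnicodeDecodeError.
    """
    text = data.decode(encoding, "surrogateescape")
    # Placeholder processing step: strip trailing whitespace on each line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    return text.encode(encoding, "surrogateescape")

def main() -> None:
    # Work on the binary buffers so the locale's encoding is not involved.
    sys.stdout.buffer.write(filter_lines(sys.stdin.buffer.read()))
```

Calling `main()` from a script makes it usable in a pipeline such as `cmd1 | python filter.py | cmd2`, even when `cmd1` emits bytes that are not valid UTF-8.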
Hi Petr,
2017-01-11 12:22 GMT+01:00 Petr Viktorin :
>
> For example, this may mean that a built-in Python string sort will give you
>> a different ordering than invoking the external "sort" command.
>> I have been bitten by this kind of issues, leading to spurious "diffs" if
>>
> My PEP 540 is different than Nick's PEP 538, even for the POSIX
> locale. I propose to always use the surrogateescape error handler,
> whereas Nick wants to keep the strict error handler for inputs and
> outputs.
> https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
>
> The
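To illustrate the sorting point quoted above: Python compares strings by code point, while `sort` collates according to the current locale. A sketch under stated assumptions; `locale_order` is a hypothetical helper, and the exact locale ordering depends on which locales are installed.

```python
import locale

words = ["banana", "Cherry", "apple", "Apple"]

# str comparison is by Unicode code point, so every uppercase ASCII
# letter sorts before every lowercase one, whatever the locale.
codepoint_order = sorted(words)

def locale_order(items):
    """Hypothetical helper: sort like a locale-aware tool would."""
    # Python does not adopt LC_COLLATE from the environment at startup,
    # so opt in explicitly before using strxfrm as the sort key.
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # requested locale not installed: keep the current collation
    return sorted(items, key=locale.strxfrm)
```

Under the C locale both orders typically match; under a locale such as en_US.UTF-8, GNU `sort` usually groups words case-insensitively, which is how the spurious diffs arise.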
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg :
> Victor: I think you are taking the UTF-8 idea a bit too far.
> Nick was trying to address the situation where the locale is
> set to "C", or rather not set at all (in which case the lib C
> defaults to the "C" locale). The latter is a
>
> > (I wonder if we can use LC_CTYPE=UTF-8...)
>
> Syntactically incorrect: that means the language UTF-8.
> "LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required,
> the region and encoding are optional. Thus ja_JP, ja.UTF-8 are OK,
> but .UTF-8 is not.
I'm sorry. I know it, but
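A sketch of the locale-name format discussed above, `language[_territory][.codeset][@modifier]`, where only the language part is mandatory. `parse_locale_name` is a hypothetical helper and its character classes are simplified; real libc implementations accept more spellings.

```python
import re

# language[_territory][.codeset][@modifier]; only the language is required.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9_-]+))?$"
)

def parse_locale_name(name):
    """Hypothetical helper: split a locale name into its four parts, or
    return None when the mandatory language part is missing."""
    match = _LOCALE_RE.match(name)
    if match is None:
        return None
    # Unmatched optional groups come back as None.
    return match.group("language", "territory", "codeset", "modifier")
```

With this sketch, `ja_JP` and `ja.UTF-8` parse, while `.UTF-8` is rejected because it has no language part.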
>
> That kind of thing makes me very nervous, and I think justifiably so.
> And it's only *sufficient* to justify a change to Python's defaults if
> Python checks for and accurately identifies when it's in a container.
>
In my company, we use network boot servers.
To reduce boot image, the image
INADA Naoki writes:
> When talking about general Docker images, using the C locale is OK in
> most cases. In other words, images using the C locale are properly
> configured.
s/properly/compatibly/. "Proper" has strong connotations of
"according to protocol". Configuring LC_CTYPE for ASCII
>
> Of course. The question is not "should cb@noaa properly configure
> docker?", it's "Can docker properly configure docker (soon enough)?
> And if not, should we configure Python?" The third question depends
> on whether fixing it for you breaks things for others.
When talking about general
>> How common is this problem?
>
> Last 2 or 3 years, I don't recall having been bitten by such issue.
We just got bitten by this on our CI server. Granted, we could fix it
by properly configuring docker, but it would have been nice if it
"just worked".
-CHB
>
> The problem is if people have locales set for non-UTF-8, which Chinese
> people often do ("GB18030 isn't just a good idea, it's the law").
> Especially forcing stdout to something other than the locale is likely
> to mess things up.
Oh, I didn't know non-UTF-8 is used for LC_CTYPE in these
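A small illustration of the mojibake risk when output is forced to UTF-8 but the reader expects GB18030 (the variable names are illustrative):

```python
text = "中文"

# The same two characters as bytes in each encoding.
utf8_bytes = text.encode("utf-8")
gb_bytes = text.encode("gb18030")

# A terminal or tool in a GB18030 locale that receives UTF-8 output
# decodes it with the wrong table and shows unrelated characters.
misread = utf8_bytes.decode("gb18030", errors="replace")
```

Only output encoded in the reader's own encoding (`gb_bytes` here) round-trips correctly.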
Hi Stephen,
2017-01-09 19:42 GMT+01:00 Stephen J. Turnbull <turnbull.stephen...@u.tsukuba.ac.jp>:
>
> Private sector may be up to date, but academic sector
> (and from the state of e-stat.go.jp, government in general, I suspect)
> is stuck in the Jomon era.
>
I went to that page, checked the
INADA Naoki writes:
> But when I see non-UTF-8 text, I don't change locale to read such
> text.
Nobody does.
The problem is if people have locales set for non-UTF-8, which Chinese
people often do ("GB18030 isn't just a good idea, it's the law").
Especially forcing stdout to something other
On Jan 06, 2017, at 11:08 PM, Steve Dower wrote:
>Passing universal_newlines will use whatever locale.getdefaultencoding()
There is no locale.getdefaultencoding(); I think you mean
locale.getpreferredencoding(False). (See the "Changed in version 3.3" note in
§17.5.1.1 of the stdlib docs.)
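A sketch of the corrected call, plus the alternative of passing an explicit encoding so the result does not depend on the locale at all. The comments describe the Python 3.6 behaviour discussed in this thread; the `encoding` parameter of `subprocess.run` exists since 3.6, and the child command is just a stand-in.

```python
import locale
import subprocess
import sys

# The locale-dependent default that universal_newlines=True falls back to
# when no encoding is given (there is no locale.getdefaultencoding()).
preferred = locale.getpreferredencoding(False)

# Passing encoding= opens the pipes in text mode with a fixed codec instead.
result = subprocess.run(
    [sys.executable, "-c", "print('hi')"],
    stdout=subprocess.PIPE,
    encoding="utf-8",
    check=True,
)
print(preferred, repr(result.stdout))
```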
On 8 January 2017 at 02:47, Stephen J. Turnbull
wrote:
> I agree that people around me mostly know only two encodings: "works
> for me" and "mojibake", but they also use locales configured for them
> by technical staff. On top of that, international students
2017-01-07 1:06 GMT+01:00 Barry Warsaw :
> For some reason it's not configured: (...)
Ok, thanks for the information.
> I'm not sure why that's the default inside a chroot.
I found at least one good reason to use the POSIX locale to build a
package: it helps to get
2017-01-06 22:20 GMT+01:00 Barry Warsaw :
>>Because I have the impression that nowadays all Linux distributions are UTF-8
>>by default and you have to show some bloody-mindedness to end up with a POSIX
>>locale.
>
> It can still happen in some corner cases, even on Debian and
On Sat, Jan 7, 2017 at 8:20 AM, Barry Warsaw wrote:
> On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote:
>
>>Because I have the impression that nowadays all Linux distributions are UTF-8
>>by default and you have to show some bloody-mindedness to end up with a POSIX
>>locale.
>
On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote:
>Because I have the impression that nowadays all Linux distributions are UTF-8
>by default and you have to show some bloody-mindedness to end up with a POSIX
>locale.
It can still happen in some corner cases, even on Debian and Ubuntu where
On Jan 05, 2017, at 05:50 PM, Victor Stinner wrote:
>I guess that all users and most developers are more in the "UNIX mode"
>camp. *If* we want to change the default, I suggest to use the "UNIX
>mode" by default.
FWIW, it seems to be a general and widespread recommendation to always pass
On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki
wrote:
> >> Always use UTF-8
> >>
> >>
> >> Python already always uses the UTF-8 encoding on Mac OS X, Android and
> >> Windows.
> >> Since UTF-8 became the de facto encoding, it makes sense to always
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg :
> Victor: I think you are taking the UTF-8 idea a bit too far.
Hum, sorry, the PEP is still a draft, the rationale is far from
perfect yet. Let me try to simplify the issue: users are unable to
configure a locale for various reasons and
2017-01-06 7:22 GMT+01:00 Stephan Houben :
> How common is this problem?
Last 2 or 3 years, I don't recall having been bitten by such issue.
On the bug tracker, new issues are opened infrequently.
* http://bugs.python.org/issue19977 opened at 2013-12-13, closed at 2014-04-27
On 06.01.2017 04:32, Nick Coghlan wrote:
> On 6 January 2017 at 12:37, Victor Stinner wrote:
>> 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull
>> :
>>> I've quoted Victor out of context, and his other posts make me very
>>> doubtful
On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote:
> Let's say that you have the filename b'nonascii\xff': it's decoded as
> 'nonascii\udcff' by the UTF-8 mode. How do GUIs handle such filename?
> (I don't know the answer, it's a real question ;-))
I ran this in Python 2.7 to create
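For the filename question, a sketch of what the surrogate means in practice. On a POSIX system whose filesystem encoding is UTF-8, `os.fsdecode()` decodes with `surrogateescape`, mapping the byte 0xff to the lone surrogate U+DCFF. Such a string cannot be encoded as strict UTF-8, so a GUI has to substitute something for display; using U+FFFD below is one option, not a description of what any particular toolkit does.

```python
raw = b"nonascii\xff"

# What the UTF-8 mode (and os.fsdecode on a UTF-8 POSIX system) produces.
name = raw.decode("utf-8", "surrogateescape")

# A toolkit that requires valid UTF-8 cannot encode the lone surrogate...
try:
    name.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# ...so a display name has to replace it, while the original bytes stay
# recoverable for actual file operations.
display = raw.decode("utf-8", "replace")
original = name.encode("utf-8", "surrogateescape")
```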
2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull
:
> The point of this, I suppose, is that piping to xargs works by
> default.
Please read the second (latest) version of my PEP 540, which
contains a new "Use Cases" section which helps to define issues and
Victor Stinner writes:
> Python 3.6 is not exactly in the first or the latter category: "it
> depends".
>
> To read data from the operating system, Python 3.6 behaves in "UNIX
> mode": os.listdir() *does* return invalid filenames, it uses a funny
> encoding using surrogates.
>
> To write
2017-01-06 2:15 GMT+01:00 INADA Naoki :
>>> Always use UTF-8 (...)
>>Please don't! (...)
>
> For stdio (including console), PYTHONIOENCODING can be used for
> supporting legacy systems.
> e.g. `export PYTHONIOENCODING=$(locale charmap)`
The problem with ignoring the
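A sketch of the PYTHONIOENCODING approach, run against a child interpreter so the setting takes effect at startup. The `encoding:errors` value is fixed here instead of coming from `locale charmap`, and the child command is a stand-in.

```python
import codecs
import os
import subprocess
import sys

# Equivalent in spirit to `export PYTHONIOENCODING=...` before starting Python.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")
child = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.stdout.encoding, sys.stdout.errors)"],
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)
encoding, errors = child.stdout.decode("ascii").split()
# Normalize the codec name for comparison (e.g. "UTF-8" and "utf8").
print(codecs.lookup(encoding).name, errors)
```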
Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode.
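A sketch of the rule stated in that message, not the PEP's actual implementation: the `-X utf8` option enables UTF-8 mode explicitly, and the C/POSIX locale enables it implicitly. `utf8_mode_enabled` is hypothetical, and `sys._xoptions` is CPython-specific.

```python
import locale
import sys

def utf8_mode_enabled(ctype_locale, xoptions):
    """Hypothetical: decide whether UTF-8 mode would be on."""
    if "utf8" in xoptions:  # `python -X utf8` puts {"utf8": True} here
        return True
    # The legacy locale enables the mode implicitly.
    return ctype_locale in ("C", "POSIX")

# Passing only the category queries the current LC_CTYPE without changing it.
current = locale.setlocale(locale.LC_CTYPE)
print(utf8_mode_enabled(current, sys._xoptions))
```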
2017-01-05 18:10 GMT+01:00 Victor Stinner :
> A common request is that "Python just works" without having to pass a
> command line option or set an environment variable. Maybe the default
> behaviour
>> Always use UTF-8
>>
>>
>> Python already always uses the UTF-8 encoding on Mac OS X, Android and
>> Windows.
>> Since UTF-8 became the de facto encoding, it makes sense to always use it on
>> all platforms with any locale.
>
>Please don't! I use different locales and
On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote:
[...]
> Python 3 promotes Unicode everywhere including filenames. A solution to
> support filenames not decodable from the locale encoding was found: the
> ``surrogateescape`` error handler (`PEP 383
>
> https://www.python.org/dev/peps/pep-0540/
I read the PEP 538, PEP 540, and issues related to switching to UTF-8. At
least, I can say one thing: people have different points of view :-)
To understand why people disagree, I tried to categorize the different points of
view and Python
Hi,
Nick Coghlan asked me to review his PEP 538 "Coercing the legacy C
locale to C.UTF-8":
https://www.python.org/dev/peps/pep-0538/
Nick wants to change the default behaviour. I'm not sure that I'm
brave enough to follow this direction, so I proposed my old "-X utf8"
command line idea as a new
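A sketch of the coercion idea, assuming the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG, with "C" as the default). `effective_ctype` and `coerce_c_locale` are hypothetical helpers that only rewrite an environment mapping, for example before launching a subprocess; they do not check that a C.UTF-8 locale is actually installed.

```python
def effective_ctype(env):
    """Return the locale name governing LC_CTYPE: LC_ALL, then LC_CTYPE,
    then LANG; unset or empty values are skipped, "C" is the default."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"

def coerce_c_locale(env):
    """Hypothetical sketch: when the effective LC_CTYPE is the legacy
    C/POSIX locale, return a copy of env with LC_CTYPE=C.UTF-8."""
    if env.get("LC_ALL"):
        # LC_ALL overrides LC_CTYPE, so setting LC_CTYPE would have no effect.
        return env
    if effective_ctype(env) in ("C", "POSIX"):
        coerced = dict(env)
        coerced["LC_CTYPE"] = "C.UTF-8"
        return coerced
    return env
```

A caller could pass `coerce_c_locale(os.environ)` as the `env` argument of `subprocess.run` so that child processes inherit the coerced setting.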