[issue34914] Clarify text encoding used to enable UTF-8 mode

2018-10-18 Thread Nick Coghlan
Nick Coghlan added the comment: Your explanation is why this is a docs enhancement proposal rather than a bug report: as far as we're aware, all encodings that get used as locale encodings have the property that encoding "-X utf8" with the locale encoding gives the same answer as encoding it wi
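The property described above can be spot-checked with a small sketch. The snippet is cut off, so the comparison against ASCII is an assumption (the original submission talks about values being encoded as ASCII), and the list of candidate encodings is purely illustrative, not a complete list of locale encodings. The helper name `encodings_differing_from_ascii` is hypothetical.

```python
# Sketch: check whether "-X utf8" encodes to the same bytes under several
# codecs as it does under ASCII. The codec list is an illustrative
# assumption, not an authoritative list of locale encodings.
OPTION = "-X utf8"

CANDIDATE_ENCODINGS = [
    "ascii", "utf-8", "latin-1", "cp1252", "koi8_r",
    "shift_jis", "euc_jp", "gb18030", "big5",
]


def encodings_differing_from_ascii(text, encodings):
    """Return the codec names whose encoding of text differs from ASCII."""
    reference = text.encode("ascii")
    return [name for name in encodings if text.encode(name) != reference]


if __name__ == "__main__":
    print(encodings_differing_from_ascii(OPTION, CANDIDATE_ENCODINGS))
```

A codec that is not ASCII-compatible, such as "utf-16", would show up in the returned list; such codecs are not normally used as locale encodings.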

[issue34914] Clarify text encoding used to enable UTF-8 mode

2018-10-10 Thread STINNER Victor
STINNER Victor added the comment: Well, I'm not saying that using gb18030 with UTF-8 will be just fine for everything. Mojibake is likely around the corner :-) C locale coercion and the UTF-8 mode are workarounds for the crappy and wild Unix world :-)

[issue34914] Clarify text encoding used to enable UTF-8 mode

2018-10-10 Thread STINNER Victor
STINNER Victor added the comment: I'm not sure that I understand your issue. There are 3 ways to enable the UTF-8 Mode:
* if the LC_CTYPE locale is "C" or "POSIX"
* if PYTHONUTF8 env var is equal to "1"
* using -X utf8 or -X utf8=1 command line option
For the first 2 cases are fine if the lo
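The explicit ways of enabling the UTF-8 Mode listed above can be observed from a child interpreter. This is a minimal sketch, assuming Python 3.7 or later (where `sys.flags.utf8_mode` exists); the helper name `utf8_mode_of_child` is hypothetical. The "C"/"POSIX" locale case is not exercised here because its behaviour depends on the platform.

```python
import os
import subprocess
import sys


def utf8_mode_of_child(extra_args=(), extra_env=None):
    """Start a child interpreter and return its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    # Drop any inherited setting so only the requested one applies.
    env.pop("PYTHONUTF8", None)
    if extra_env:
        env.update(extra_env)
    cmd = [sys.executable, *extra_args,
           "-c", "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("-X utf8      ->", utf8_mode_of_child(["-X", "utf8"]))
    print("-X utf8=1    ->", utf8_mode_of_child(["-X", "utf8=1"]))
    print("PYTHONUTF8=1 ->", utf8_mode_of_child(extra_env={"PYTHONUTF8": "1"}))
```

The command line and the environment are both passed here as `str` values, so the byte-level encoding question raised in this issue is handled by the `subprocess` module rather than shown directly.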

[issue34914] Clarify text encoding used to enable UTF-8 mode

2018-10-06 Thread Nick Coghlan
New submission from Nick Coghlan: While working on the docs updates for bpo-34589 (clarifying that "PYTHONCOERCECLOCALE=0" and "PYTHONCOERCECLOCALE=warn" need both the environment variable name and the value to be encoded as ASCII in order to have any effect), I realised that it was less expl