Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:47 PM, Glenn Linderman wrote:
> On 11/14/2015 5:37 PM, Chris Angelico wrote:
> > On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman wrote:
> > Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread eryksun
On Sat, Nov 14, 2015 at 7:06 PM, Steve Dower wrote:
> The native encoding on Windows has been UTF-16 since Windows NT. Obviously
> we've survived without Python tokenization support for a long time, but
> every API uses it.
Windows 2000 was the first version to have broad

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread eryksun
On Sat, Nov 14, 2015 at 7:15 PM, Chris Angelico wrote:
> Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix
> program loaders won't.) That alone might be a reason for strongly
> encouraging ASCII-compat encodings.
The launcher supports shebangs encoded as

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Stephen J. Turnbull
Steve Dower writes:
> Saying [UTF-16] is rarely used is rather exposing your own
> unawareness though - it could arguably be the most commonly used
> encoding (depending on how you define "used").
Because we're discussing the storage of .py files, the relevant definition is the one used by

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Glenn Linderman writes:
> On 11/14/2015 5:37 PM, Chris Angelico wrote:
> > Thanks. Is "ANSI" always an eight-bit ASCII-compatible encoding?
>
> I wouldn't trust an answer to this question that didn't come from
> someone that used Windows with Chinese, Japanese, or Korean,
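As an illustration of the question quoted above, here is a small sketch assuming that "ANSI" means the system's Windows code page (for example cp1252 on Western European systems and cp932 on Japanese ones). The helper name `describe` is hypothetical. It shows that cp932 encodes ASCII characters as single ASCII bytes but is not an eight-bit encoding: some characters take two bytes, and the second byte can fall in the ASCII range.

```python
def describe(text: str, codec: str) -> bytes:
    """Encode text with a Python codec that mirrors a Windows code page."""
    return text.encode(codec)

# ASCII text is byte-for-byte identical in both code pages.
ascii_cp1252 = describe("abc", "cp1252")
ascii_cp932 = describe("abc", "cp932")

# A Western European character is a single byte in cp1252.
e_acute = describe("é", "cp1252")

# A Japanese character is two bytes in cp932, and here the trailing
# byte is 0x5C, the same value as an ASCII backslash.
hyou = describe("表", "cp932")
```

A byte-oriented scanner that looks for ASCII bytes such as the backslash can therefore be misled by cp932 text even though plain ASCII round-trips unchanged.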

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Steven D'Aprano
On Sat, Nov 14, 2015 at 09:19:37PM +0200, Serhiy Storchaka wrote:
> If the support of UTF-16 and UTF-32 is planned, I'll take this to
> attention during refactoring. But in many places besides the tokenizer
> the ASCII compatible encoding of source files is expected.
Perhaps another way of

[Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka
For now UTF-16 and UTF-32 source encodings are not supported. There is a comment in Parser/tokenizer.c:
    /* Disable support for UTF-16 BOMs until a decision is made whether
       this needs to be supported. */
Can we make a decision whether this support will be added in foreseeable
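For context, a minimal sketch of what BOM-based detection could look like, using only the BOM constants from the standard-library `codecs` module. The helper name `sniff_bom` is hypothetical, and this is not what Parser/tokenizer.c does. It mainly shows one pitfall: the UTF-32 little-endian BOM begins with the UTF-16 little-endian BOM, so UTF-32 has to be checked first.

```python
import codecs
from typing import Optional

# Order matters: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name that consumes the leading BOM, or None if no BOM."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None

source = "x = 1\n"
utf16_bytes = source.encode("utf-16")   # the "utf-16" codec writes a BOM
utf32_bytes = source.encode("utf-32")   # the "utf-32" codec writes a BOM
```

The returned names ("utf-16", "utf-32", "utf-8-sig") are the codecs that strip the BOM while decoding, so `utf16_bytes.decode(sniff_bom(utf16_bytes))` gives back the original text.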

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Victor Stinner
These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space. Ok, even if it exists, Python already accepts a very wide range of
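The disk-space remark can be checked with a short sketch. The helper name `encoded_sizes` and the sample text are made up for illustration; the codecs with an explicit byte order are used so that no BOM is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text for three Unicode encodings."""
    return {
        codec: len(text.encode(codec))
        for codec in ("utf-8", "utf-16-le", "utf-32-le")
    }

# ASCII-only sample, as most Python source code is.
sizes = encoded_sizes("print('hello')\n")
```

For ASCII-only text, UTF-16 takes twice and UTF-32 four times as many bytes as UTF-8.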

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Benjamin Peterson
I agree that supporting UTF-16 doesn't seem terribly useful. Also, thank you for giving the tokenizer some love!
On Sat, Nov 14, 2015, at 11:19, Serhiy Storchaka wrote:
> For now UTF-16 and UTF-32 source encodings are not supported. There is a
> comment in Parser/tokenizer.c:
>
> /*

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka
On 15.11.15 00:56, Victor Stinner wrote:
> These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space.
AFAIK the standard Windows

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 3:21 PM, Serhiy Storchaka wrote:
> On 15.11.15 00:56, Victor Stinner wrote:
> > These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Victor Stinner writes:
> These encodings are rarely used. I don't think that any text editor
> use them.
MS Windows' Notepad can be made to use UTF-16.

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Steve Dower
The native encoding on Windows has been UTF-16 since Windows NT. Obviously we've survived without Python tokenization support for a long time, but every API uses it. I've hit a few cases where it would have been handy for Python to be able to detect it, though nothing I couldn't work around.

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:06 PM, Steve Dower wrote:
> The native encoding on Windows has been UTF-16 since Windows NT. Obviously
> we've survived without Python tokenization support for a long time, but
> every API uses it.
>
> I've hit a few cases where it would have

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:15 PM, Chris Angelico wrote:
> Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix program loaders won't.) That alone might be a reason for strongly encouraging ASCII-compat encodings.
That raises an interesting question about if py.exe can handle a leading

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Chris Angelico writes:
> Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix
> program loaders won't.)
A lot of them can't handle UTF-8 with a BOM, either.
> That alone might be a reason for strongly encouraging ASCII-compat
> encodings.
A "python" or
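The shebang concern can be illustrated with a short sketch. It assumes a loader that recognises a script only when the file's first two bytes are `#!`, as the Linux kernel does, for example. The helper name `starts_with_shebang` and the sample line are hypothetical, and the sketch says nothing about how py.exe itself behaves.

```python
SHEBANG = "#!/usr/bin/env python\n"

def starts_with_shebang(data: bytes) -> bool:
    """Mimic a loader that compares the first two bytes against b'#!'."""
    return data[:2] == b"#!"

plain_utf8 = SHEBANG.encode("utf-8")         # b'#!/usr/bin/env python\n'
utf8_with_bom = SHEBANG.encode("utf-8-sig")  # EF BB BF comes before '#!'
utf16_with_bom = SHEBANG.encode("utf-16")    # a two-byte BOM comes first
utf16_no_bom = SHEBANG.encode("utf-16-le")   # starts with b'#\x00', not b'#!'
```

Only the plain UTF-8 (or ASCII) form passes the two-byte check. A UTF-8 BOM or either UTF-16 form puts other bytes where the loader expects `#!`.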

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:15 PM, Chris Angelico wrote:
> I think even Notepad defaults to UTF-8 for files, now.
Just installed Windows 10 on a new machine, and upgraded to the latest Windows 10 release, 1511. Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an option, and it does

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman wrote:
> Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an
> option, and it does seem to try to notice the original encoding of the file,
> when editing old files, but when creating a new one

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:37 PM, Chris Angelico wrote:
> On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman wrote:
> > Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an option, and it does seem to try to notice the original encoding of the file, when editing old