[issue25937] Difference between utf8 and utf-8 when I define Python source code encoding.

2016-02-11 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Serhiy: Removing the shortcut would slow down the tokenizer a lot, since UTF-8 encoded source code is the norm, not the exception. The "problem" here is that the tokenizer trusts the source code to be in the correct encoding when you use one of utf-8 or
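One way to read this trade-off: the costly part of the general path is transcoding, not checking. The Python sketch below is a hypothetical middle ground, not CPython's tokenizer code and not necessarily what was proposed here. For a utf-8 declaration it skips the decode/encode round trip but still runs a strict UTF-8 decode as a validity check. `checked_utf8_source` is an illustrative name.

```python
def checked_utf8_source(raw: bytes) -> bytes:
    """Return source bytes for the parser when the declared coding is utf-8.

    Hypothetical sketch: instead of trusting the bytes, do a strict UTF-8
    decode purely as a validity check. Strictly valid UTF-8 re-encodes to
    identical bytes, so the original buffer can be handed on unchanged.
    """
    raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return raw


good = "中文".encode("utf-8")
bad = "中文".encode("gbk")  # GBK bytes mislabelled as utf-8

assert checked_utf8_source(good) is good
try:
    checked_utf8_source(bad)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

This still reads every byte, so it is not free; it only avoids building a second UTF-8 buffer.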

2016-02-10 Jim Jewett
Jim Jewett added the comment: Does (did?) the utf8 special case allow for a much faster startup time, by not requiring all of the codecs machinery? -- nosy: +Jim.Jewett

2016-02-09 Serhiy Storchaka
Serhiy Storchaka added the comment: I think the correct fix is not to add "utf8" to the special case, but to remove "utf-8" from it. Here is a patch. -- components: +Interpreter Core stage: -> patch review type: -> behavior Added file: http://bugs.python.org/file41879/bad_utf8.patch
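A Python model of what removing the special case means, assuming utf-8 then takes the same generic path as every other declared encoding (decode from the declared codec, re-encode to UTF-8 for the parser). This is not the code in bad_utf8.patch; `parser_input_without_shortcut` is a hypothetical name.

```python
def parser_input_without_shortcut(raw: bytes, declared: str) -> bytes:
    # No special case: utf-8, utf8 and everything else are decoded with the
    # declared codec, so undecodable bytes always raise UnicodeDecodeError.
    return raw.decode(declared).encode("utf-8")


gbk_source = "s = '中文'\n".encode("gbk")
for name in ("utf-8", "utf8", "gbk"):
    try:
        parser_input_without_shortcut(gbk_source, name)
        print(name, "accepted")
    except UnicodeDecodeError:
        print(name, "rejected")
```

With the shortcut gone, "utf-8" and "utf8" behave identically; the price is an extra decode and encode for every utf-8 file, which is the slowdown Marc-Andre objects to.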

2015-12-27 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 27.12.2015 02:05, Serhiy Storchaka wrote:
>
>> I wonder why this does not trigger the exception.
>
> Because in case of utf-8 and iso-8859-1 decoding and encoding steps are
> omitted.
>
> In general case the input is decoded from specified encoding and

2015-12-26 Serhiy Storchaka
Serhiy Storchaka added the comment:

> I wonder why this does not trigger the exception.

Because in the case of utf-8 and iso-8859-1 the decoding and encoding steps are omitted. In the general case the input is decoded from the specified encoding and then encoded to UTF-8 for the parser. But for utf-8 and iso-8859
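The two paths described here can be modeled in a few lines of Python. This is a simplified, hypothetical model, not the C tokenizer: `normalize` only lowercases and maps "_" to "-", and the shortcut set mirrors the two names given in the comment. Because "utf8" is not folded into "utf-8", the same GBK bytes pass unchecked under one spelling and fail under the other.

```python
# Names whose bytes are passed through without decoding (as described in the comment).
SHORTCUT_NAMES = {"utf-8", "iso-8859-1"}


def normalize(name: str) -> str:
    # Deliberately incomplete: "utf8" stays "utf8", reproducing the mismatch.
    return name.lower().replace("_", "-")


def parser_input(raw: bytes, declared: str) -> bytes:
    if normalize(declared) in SHORTCUT_NAMES:
        # Shortcut: the bytes are trusted and never validated.
        return raw
    # General case: decode from the declared encoding, then encode to UTF-8.
    return raw.decode(declared).encode("utf-8")


gbk_source = "s = '中文'\n".encode("gbk")
print(parser_input(gbk_source, "utf-8") == gbk_source)  # passed through unchecked
try:
    parser_input(gbk_source, "utf8")
except UnicodeDecodeError:
    print("utf8: decode error")
```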

2015-12-26 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 26.12.2015 22:46, STINNER Victor wrote:
>
> In Python, there are multiple implementations of the utf-8 codec with many
> shortcuts. I'm not surprised to see bugs depending on the exact syntax of
> the utf-8 codec name. Maybe we need to share even more cod

2015-12-26 STINNER Victor
STINNER Victor added the comment: In Python, there are multiple implementations of the utf-8 codec with many shortcuts. I'm not surprised to see bugs depending on the exact syntax of the utf-8 codec name. Maybe we need to share even more code to normalize and compare codec names. (I think that py
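At the Python level there is already one shared normalization point: `codecs.lookup()` maps every alias of a codec to the same `CodecInfo`. A minimal sketch of comparing names through it follows (the helper names are illustrative). The tokenizer shortcut discussed in this issue compares raw strings in C instead, which is where the utf8/utf-8 mismatch comes from.

```python
import codecs


def canonical_codec_name(name: str) -> str:
    # codecs.lookup() resolves aliases ("utf8", "UTF_8", ...) to a single
    # CodecInfo; its .name attribute is the codec's canonical spelling.
    return codecs.lookup(name).name


def same_codec(a: str, b: str) -> bool:
    return canonical_codec_name(a) == canonical_codec_name(b)


print(canonical_codec_name("utf8"), canonical_codec_name("UTF_8"))
print(same_codec("utf8", "utf-8"), same_codec("utf-8", "gbk"))
```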

2015-12-26 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Please fold these cases into one:

    if (strcmp(buf, "utf-8") == 0 ||
        strncmp(buf, "utf-8-", 6) == 0)
        return "utf-8";
    else if (strcmp(buf, "utf8") == 0 ||
             strncmp(buf, "utf8-", 6) == 0)
        return "utf-8";

->

    if
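For illustration, a hypothetical Python port of that normalization step with the utf-8 spellings folded into a single test. It assumes `buf` holds the lowercased name with "_" treated as "-"; the latin-1 branch is an assumption based on the iso-8859-1 shortcut mentioned in this issue, not a copy of CPython. One detail of the quoted C: `strncmp(buf, "utf8-", 6)` compares six bytes against a five-character literal, so the sixth comparison is against the terminating NUL and only the exact name "utf8-" matches. The port below tests the prefix instead.

```python
def get_normal_name(s: str) -> str:
    """Hypothetical port: map utf-8 / latin-1 spellings to one canonical name.

    Assumes buf is the lowercased name with "_" replaced by "-".
    """
    buf = s.lower().replace("_", "-")
    # The utf-8 cases folded together into one condition.
    if buf in ("utf-8", "utf8") or buf.startswith(("utf-8-", "utf8-")):
        return "utf-8"
    # Assumed latin-1 aliases, mirroring the iso-8859-1 shortcut.
    if buf in ("latin-1", "iso-8859-1", "iso-latin-1") or buf.startswith(
        ("latin-1-", "iso-8859-1-", "iso-latin-1-")
    ):
        return "iso-8859-1"
    return s  # anything else is returned unchanged


for name in ("utf8", "UTF-8", "utf_8", "utf8-unix", "Latin_1", "gbk"):
    print(name, "->", get_normal_name(name))
```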

2015-12-26 王杰
王杰 added the comment: I'm learning about Python's encoding rules and I wrote it as a test case.

2015-12-26 Serhiy Storchaka
Serhiy Storchaka added the comment: The problem is not that an error is raised with coding:utf8, but that it isn't raised with coding:utf-8. Here is an example with bad iso8859-3 data; an error is raised as expected. -- nosy: +serhiy.storchaka Added file: http://bugs.python.org/file41425/ba
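The attached example is truncated in this archive, so the snippet below is not it. It only shows, with an illustrative helper, why ISO-8859-3 input can fail at all: the charset leaves some byte values unassigned (0xA5 is one), so a strict decode rejects them. Latin-1 (iso-8859-1), by contrast, assigns all 256 byte values, so skipping its decode step cannot hide an error; the utf-8 shortcut is the one that can.

```python
def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    # A strict decode either succeeds or raises UnicodeDecodeError.
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_cleanly(b"s = '\xa5'\n", "iso8859-3"))   # 0xA5 is unassigned
print(decodes_cleanly(b"s = '\xa5'\n", "iso-8859-1"))  # every byte is assigned
print(decodes_cleanly("中文".encode("gbk"), "utf-8"))  # the case in this issue
```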

2015-12-26 STINNER Victor
STINNER Victor added the comment:

> I have a file "gbk-utf-8.py" and its encoding is GBK.

I don't understand why you use "# coding: utf-8" if the file is encoded in GBK. Why not use "# coding: gbk"?
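A minimal Python 3 sketch of this suggestion, assuming a GBK-encoded file: when the declaration matches the bytes, `compile()` decodes the literal correctly. Unlike the reporter's Python 2.7 code, where '中文' stays a byte string, in Python 3 `s` is text. `run_source` is an illustrative helper.

```python
def run_source(raw: bytes) -> dict:
    # compile() honours the coding declaration when it is given bytes.
    namespace: dict = {}
    exec(compile(raw, "gbk-demo.py", "exec"), namespace)
    return namespace


source = "# -*- coding: gbk -*-\ns = '中文'\n".encode("gbk")
print(run_source(source)["s"])
```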

2015-12-26 STINNER Victor
STINNER Victor added the comment:

> Here is a fix with a patch.

Oops, I mean 'with a unit test', sorry ;-)

2015-12-26 STINNER Victor
STINNER Victor added the comment: Here is a fix with a patch. -- keywords: +patch versions: +Python 2.7 Added file: http://bugs.python.org/file41424/utf8.patch

2015-12-26 王杰
王杰 added the comment: Python 2.7

2015-12-25 Terry J. Reedy
Terry J. Reedy added the comment: What Python version? -- nosy: +terry.reedy

2015-12-25 Terry J. Reedy
Changes by Terry J. Reedy: -- nosy: +haypo

2015-12-25 Terry J. Reedy
Changes by Terry J. Reedy: -- nosy: +doerwalter, lemburg

2015-12-23 王杰
New submission from 王杰:

I use CentOS 7.0 and set LANG=gbk. I have a file "gbk-utf-8.py" whose encoding is GBK:

    # -*- coding:utf-8 -*-
    import chardet
    if __name__ == '__main__':
        s = '中文'
        print s, chardet.detect(s)

I execute it and everything is OK. However, it raises "SyntaxError" (a
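To reproduce the report without chardet or a Python 2 interpreter, the file's bytes can be rebuilt in memory. This is a hypothetical Python 3 reconstruction (the chardet call and the print statement are left out). On a current Python 3, a GBK literal under either spelling of the utf-8 declaration is expected to be rejected, which is the consistent behaviour the thread is after. As a precaution, the helper catches ValueError as well as SyntaxError, in case a raw UnicodeDecodeError (a ValueError subclass) surfaces instead.

```python
def reporter_file(declaration: str) -> bytes:
    # Hypothetical reconstruction of gbk-utf-8.py, saved as GBK.
    text = f"# -*- coding:{declaration} -*-\ns = '中文'\n"
    return text.encode("gbk")


def accepted(raw: bytes) -> bool:
    # True if the interpreter compiles the bytes without complaint.
    try:
        compile(raw, "gbk-utf-8.py", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


for declaration in ("utf-8", "utf8"):
    print(declaration, accepted(reporter_file(declaration)))
```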