[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2015-10-02 Thread STINNER Victor
Changes by STINNER Victor: resolution: -> out of date; status: open -> closed

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-12 Thread STINNER Victor
STINNER Victor added the comment:
> If the code is to be simplified, unifying the cases of string-based parsing and file-based parsing might be a worthwhile goal.
Ah yes, the enc and encoding attributes are almost the same, so it would be nice to merge them! But I'm not sure that I understand, d

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread Martin v. Löwis
Martin v. Löwis added the comment: tok->enc and tok->encoding should always have the same value, except that tok->enc is set earlier. tok->enc is used when parsing from strings, to remember which codec to use. For file-based parsing, the codec object that is created knows which encoding to use; for s

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread STINNER Victor
STINNER Victor added the comment:
> The parser should check that the input is actually valid UTF-8 data.
Ah yes, correct. It looks like input data is still checked for valid UTF-8 data. I suppose that the byte strings should be decoded from UTF-8 because Python 3 manipulates Unicode strings, not

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The parser should check that the input is actually valid UTF-8 data.
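
As a rough illustration of the check requested here, the sketch below validates a byte string as UTF-8 by attempting a strict decode. It is written in Python for brevity and is not the tokenizer's C code; the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # A strict decode rejects invalid, truncated, or overlong sequences.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A parser that skips transcoding could still run a check of this kind, so that malformed input is rejected rather than passed through unchanged.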

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread STINNER Victor
STINNER Victor added the comment: The patch has an issue: importing test.bad_coding2 (UTF-8 with a BOM) no longer raises a SyntaxError.
Added file: http://bugs.python.org/file32528/input_is_utf8.patch
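
For context, one situation in which a UTF-8 BOM leads to a SyntaxError is a BOM followed by a coding cookie that names a different encoding. The sketch below shows this with the standard library's `tokenize.detect_encoding()`. It does not reproduce the contents of test.bad_coding2, which are not shown in this thread, and the helper name `bom_conflicts_with_cookie` is hypothetical.

```python
import io
import tokenize


def bom_conflicts_with_cookie(source: bytes) -> bool:
    """Return True if encoding detection rejects `source` with a SyntaxError.

    Hypothetical helper: it only wraps tokenize.detect_encoding(), which
    raises SyntaxError when a UTF-8 BOM is followed by a non-UTF-8 cookie.
    """
    try:
        tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError:
        return True
    return False


BOM = b"\xef\xbb\xbf"
conflicting = BOM + b"# coding: latin-1\nx = 1\n"
consistent = BOM + b"# coding: utf-8\nx = 1\n"
```

Under these assumptions, `bom_conflicts_with_cookie(conflicting)` reports the conflict, while the consistent input is accepted.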

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread STINNER Victor
Changes by STINNER Victor: Removed file: http://bugs.python.org/file32526/input_is_utf8.patch

[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

2013-11-07 Thread STINNER Victor
New submission from STINNER Victor: The Python parser (Parser/tokenizer.c) has a translate_into_utf8() function that decodes a string from the input encoding and re-encodes it to UTF-8. This round trip is unnecessary if the input string is already encoded to UTF-8, which is common nowadays. Linux
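
The idea in this submission can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the C code in Parser/tokenizer.c or the attached patch; the function name `to_utf8_source` is hypothetical.

```python
import codecs


def to_utf8_source(data: bytes, encoding: str) -> bytes:
    """Return `data` as UTF-8 bytes, skipping the round trip when possible.

    Hypothetical helper illustrating the proposal; not the real tokenizer code.
    """
    # codecs.lookup() normalizes aliases such as "utf8" or "UTF-8" to "utf-8".
    if codecs.lookup(encoding).name == "utf-8":
        # Fast path: the input is already declared as UTF-8, so no transcoding.
        return data
    # Slow path: decode from the declared encoding, then re-encode as UTF-8.
    return data.decode(encoding).encode("utf-8")
```

Note that the fast path in this sketch returns the bytes without validating them, so a separate UTF-8 validity check would still be needed.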