[issue12016] Wrong behavior for '\xff\n'.decode('gb2312', 'ignore')

2011-05-06 Thread zy
New submission from zy: Let s = '\xff\n'. The expected result of s.decode('gb2312', 'ignore') is u"\n", while in 2.6.6 it is u"". s can be replaced with chr(m) + chr(n), where m is in the range 128~255 and n in 0~127. In these cases, trying to decode chr(n) will never interfere with la…
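
The generalization above can be checked exhaustively. The following is a minimal sketch. It assumes Python 3.3 or later, which is where the change discussed at the end of this thread landed. It verifies that every lead byte m in 128..255 followed by an ASCII byte n decodes with 'ignore' to just chr(n). The helper name `ignore_keeps_ascii` is hypothetical.

```python
# Sketch assuming Python 3.3+, where CJK decoders skip only the first
# byte of an invalid sequence. The helper name is hypothetical.

def ignore_keeps_ascii(encoding: str) -> bool:
    """Check that bytes([m, n]).decode(encoding, "ignore") == chr(n)
    for every lead byte m in 128..255 and ASCII byte n in 0..127."""
    for m in range(128, 256):
        for n in range(128):
            if bytes([m, n]).decode(encoding, "ignore") != chr(n):
                return False
    return True


print(repr(b"\xff\n".decode("gb2312", "ignore")))  # '\n'
print(ignore_keeps_ascii("gb2312"))                # True
```

On the 2.6.6 and 2.7.1 interpreters mentioned in this thread, the first call returned u"" instead.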

2011-05-06 Thread Ezio Melotti
Changes by Ezio Melotti:
nosy: +ezio.melotti
stage: -> test needed
versions: +Python 2.7, -Python 2.6

2011-05-06 Thread Éric Araujo
Changes by Éric Araujo:
nosy: +haypo, lemburg

2011-05-07 Thread Terry J. Reedy
Terry J. Reedy added the comment: u'' in 2.7.1 also, on Windows XP.
nosy: +terry.reedy

2011-05-07 Thread STINNER Victor
STINNER Victor added the comment: So the correct result for b'\xff\n'.decode('gb2312', 'replace') is u'?\n'?
versions: +Python 3.1, Python 3.2, Python 3.3
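
The question can be checked directly. This is a minimal sketch that assumes Python 3.3 or later. With 'replace', only the 0xff byte becomes U+FFFD. With 'strict', the UnicodeDecodeError should span only that first byte.

```python
# Sketch assuming Python 3.3+: the bytes from the report under two handlers.
data = b"\xff\n"

replaced = data.decode("gb2312", "replace")
print(replaced == "\ufffd\n")  # True: U+FFFD for 0xff, then the newline

span = None
try:
    data.decode("gb2312", "strict")
except UnicodeDecodeError as exc:
    # The reported error covers only byte 0, so the newline is not consumed.
    span = (exc.start, exc.end)
print(span)  # (0, 1)
```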

2011-05-07 Thread zy
zy added the comment:
> So the correct result for b'\xff\n'.decode('gb2312', 'replace') is u'?\n'?

I think it should be. This behavior does not discard possible information, has no side effect on later decodings, and, should the '\n' indeed be redundant, an output of u'?\n' would unlikel…
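
The "no side effect on later decodings" point shows up most clearly when a stray byte precedes valid multibyte text. The following is a minimal sketch that assumes Python 3.3 or later. Had the decoder consumed two bytes on error, as in the 2.x behaviour reported above, it would presumably also have swallowed the 0xd6 lead byte of the first character and misaligned the rest.

```python
# Sketch assuming Python 3.3+: one stray byte in front of valid GB2312 text.
text = "中文\n"
corrupted = b"\xff" + text.encode("gb2312")

replaced = corrupted.decode("gb2312", "replace")
ignored = corrupted.decode("gb2312", "ignore")

# Only the stray byte is affected; everything after it decodes unchanged.
print(replaced == "\ufffd" + text)  # True
print(ignored == text)              # True
```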

2011-05-07 Thread STINNER Victor
STINNER Victor added the comment: _codecs_cn implements different multibyte encodings: gb2312, gbkext, gbcommon, gb18030ext, gbk, gb18030. And there are other Asian multibyte encodings: the Big5 family, the ISO 2022 family, the JIS family, Korean encodings (KSX1001, EUC_KR, CP949, ...), Big5, CP950, ...
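
The same input can be compared across several of these codec families. This is a minimal sketch that assumes Python 3.3 or later. The codec list is an illustrative sample, not an exhaustive one. It also deliberately leaves out codecs such as cp932, which map some high single bytes to characters.

```python
# Sketch assuming Python 3.3+; the codec list is a sample, not exhaustive.
CJK_CODECS = ["gb2312", "gbk", "gb18030", "big5", "euc_kr"]

results = {name: b"\xff\n".decode(name, "replace") for name in CJK_CODECS}
for name, decoded in results.items():
    # Each codec should replace only 0xff and keep the ASCII newline.
    print(name, decoded == "\ufffd\n")
```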

2011-05-07 Thread zy
zy added the comment: I do not have documentation on this subject. However, I found that GNU iconv(1) behaves the same way as my proposed behavior. My reading of its source code suggests that iconv(1) treats all encodings equally, which I think should also be true for Python. As for security concerns,…

2011-05-11 Thread STINNER Victor
STINNER Victor added the comment: I asked on the iconv mailing list whether the change is correct. Here is a copy of an answer:

From: Bruno Haible
To: [iconv mailing list]
Cc: Victor Stinner
Subject: Re: [bug-gnu-libiconv] Invalid byte sequences and multibyte encodings
Date: Tue, 10 May 2011 1…

2011-05-11 Thread STINNER Victor
STINNER Victor added the comment: Oh, the HZ codec has no test! And what is this horrible BLOB, Lib/test/cjkencodings_test.py?
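
A very small check for Python's built-in "hz" codec could be a round trip plus the tilde escape. This is a hypothetical sketch, not the test that was actually added under #12057. It relies on HZ writing a literal '~' as '~~' and switching into GB mode with '~{' and back out with '~}'. The names `SAMPLES` and `hz_roundtrips` are illustrative.

```python
# Hypothetical round-trip check for Python's built-in "hz" codec.
SAMPLES = [
    "plain ASCII",
    "tilde ~ must be escaped",
    "中文 mixed with ASCII",
    "",
]


def hz_roundtrips(samples):
    """Return True if every sample survives encode("hz") then decode("hz")."""
    for text in samples:
        if text.encode("hz").decode("hz") != text:
            return False
    return True


print(hz_roundtrips(SAMPLES))  # True
print("a~b".encode("hz"))      # b'a~~b': a literal tilde is doubled
```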

2011-05-11 Thread STINNER Victor
Changes by STINNER Victor:
dependencies: +HZ codec has no test

2011-05-11 Thread STINNER Victor
Changes by STINNER Victor:
nosy: +hyeshik.chang

2011-05-30 Thread Roundup Robot
Roundup Robot added the comment:
New changeset 3b1b06570cf9 by Victor Stinner in branch '2.7':
Issue #12016: my_fgets() now always clears errors before calling fgets(). Fix
http://hg.python.org/cpython/rev/3b1b06570cf9

New changeset de07f90ef45c by Victor Stinner in branch '3.2':
Issue #12016:…

2011-05-30 Thread STINNER Victor
Changes by STINNER Victor:
Removed message: http://bugs.python.org/msg137334

2011-05-30 Thread STINNER Victor
STINNER Victor added the comment:
- I added tests for the HZ codec and some ISO 2022 codecs: #12057
- I fixed IncrementalEncoder.encode() (of multibytecodec): #12100
- I fixed IncrementalEncoder.reset() (of multibytecodec): #12171

I can now work confidently on this issue. I will try to patch…

2011-06-03 Thread Roundup Robot
Roundup Robot added the comment:
New changeset 3610841f7357 by Victor Stinner in branch '3.2':
Issue #12016: Reindent decoders of HK and JP codecs
http://hg.python.org/cpython/rev/3610841f7357

New changeset aa07c1237f4e by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12016: Reindent d…

2011-06-03 Thread Roundup Robot
Roundup Robot added the comment:
New changeset 8572bf1b56ec by Victor Stinner in branch '3.2':
Issue #12016: Add test_errorhandle() to TestBase_Mapping of
http://hg.python.org/cpython/rev/8572bf1b56ec

New changeset c3dc94d53ef8 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12016: Ad…

2011-06-03 Thread STINNER Victor
STINNER Victor added the comment: cjk_decode.patch:
- patch *all* CJK decoders to replace only the first byte of an invalid byte sequence (by U+FFFD). Example from the issue title: b'\xff\n'.decode('gb2312', 'replace') now gives '�\n' instead of just '�'
- add at least one unit test for *eac…
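
A unit test in the spirit of the patch description could cover more than the 0xff case. It could also include a valid GB2312 lead byte followed by an invalid trail byte, where again only the lead byte should be replaced. The following is a hypothetical sketch assuming Python 3.3 or later. The class name `CJKResyncTest` and its cases are illustrative, not the tests from cjk_decode.patch.

```python
import unittest


class CJKResyncTest(unittest.TestCase):  # hypothetical test class
    """Only the first byte of an invalid sequence should be replaced."""

    CASES = [
        # (input bytes, expected result with errors="replace")
        (b"\xff\n", "\ufffd\n"),               # example from the issue title
        (b"\xd6A", "\ufffdA"),                 # valid lead byte, invalid trail
        (b"\xd6\xd0\xd6A", "\u4e2d\ufffdA"),   # valid pair, then invalid pair
    ]

    def test_gb2312_replace(self):
        for raw, expected in self.CASES:
            with self.subTest(raw=raw):
                self.assertEqual(raw.decode("gb2312", "replace"), expected)

    def test_gb2312_ignore(self):
        # With "ignore", the result is the same minus the U+FFFD markers.
        for raw, expected in self.CASES:
            with self.subTest(raw=raw):
                self.assertEqual(raw.decode("gb2312", "ignore"),
                                 expected.replace("\ufffd", ""))
```

Saved to a file, this can be run with `python -m unittest`.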

2011-07-07 Thread Roundup Robot
Roundup Robot added the comment:
New changeset 16cbd84de848 by Victor Stinner in branch 'default':
Issue #12016: Multibyte CJK decoders now resynchronize faster
http://hg.python.org/cpython/rev/16cbd84de848
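
Resynchronization also matters when bytes arrive in chunks. This is a minimal sketch assuming Python 3.3 or later, using the standard `codecs.getincrementaldecoder`. A lead byte at the end of a chunk is held back until the next chunk arrives. A valid character split across two chunks is therefore decoded intact, while the invalid 0xff from the report is replaced on its own once the newline arrives.

```python
import codecs

# Sketch assuming Python 3.3+: feed bytes to an incremental decoder in chunks.
Decoder = codecs.getincrementaldecoder("gb2312")

# A valid two-byte character split across two chunks is buffered, not broken.
dec = Decoder("replace")
valid = [dec.decode(b"\xd6"), dec.decode(b"\xd0", final=True)]

# The invalid lead byte from the report, split from the newline that follows.
dec = Decoder("replace")
invalid = [dec.decode(b"\xff"), dec.decode(b"\n", final=True)]

print(valid == ["", "\u4e2d"])      # True: U+4E2D is the character 中
print(invalid == ["", "\ufffd\n"])  # True: only 0xff is replaced
```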

2011-07-07 Thread STINNER Victor
STINNER Victor added the comment:
> Because I consider this issue as a bug, I would like
> to apply this patch to 2.7, 3.2 and 3.3.

It may be a bug, but it is also an important change in Python's behaviour, so in the end I prefer to change (fix) only Python 3.3. Thanks for reporting the bug, zy (c…