[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I added a cp65001 codec to Python 3.3: see issue #13216. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12281

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-18 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset af0800b986b7 by Victor Stinner in branch 'default': Issue #12281: Rewrite the MBCS codec to handle correctly replace and ignore http://hg.python.org/cpython/rev/af0800b986b7 -- nosy: +python-dev

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-18 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 5841920d1ef6 by Victor Stinner in branch 'default': Issue #12281: Skip code page tests on non-Windows platforms http://hg.python.org/cpython/rev/5841920d1ef6

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-18 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 413b89242766 by Victor Stinner in branch 'default': Issue #12281: Fix test_codecs.test_cp932() on Windows XP http://hg.python.org/cpython/rev/413b89242766

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-18 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: test_codecs passes on the Windows XP and Windows Seven buildbots. -- resolution: -> fixed status: open -> closed

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: mbcs6.patch: update patch to tip. -- Added file: http://bugs.python.org/file23430/mbcs6.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22374/mbcs4.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22389/mbcs5.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Version 7 of my patch. This patch is ready for review: I implemented all TODOs. Summary of the patch (of this issue): - fix the mbcs encoding to correctly handle the ignore and replace error handlers on all Windows versions - the mbcs

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file23430/mbcs6.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-10-17 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: What about something like .decode('mbcs', errors='windows')? Yes, we can use an error handler specific to the mbcs codec, but I would prefer to not introduce special error handlers. For os.fsencode(), we can keep it unchanged, or

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Patch version 5 fixes the encode/decode flags on Windows XP. The codecs give different results on XP and Seven in some cases: Seven: - b'\x81\x00abc'.decode('cp932', 'replace') returns '\u30fb\x00abc' - '\udc80'.encode(CP_UTF8,

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-16 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: What is the use of these code_page_encode() functions?

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: TODO: add more tests
CP_UTF8:
    if self.vista_or_later:
        tests.append(('\udc80', 'strict', None))
        tests.append(('\udc80', 'ignore', b''))
        tests.append(('\udc80', 'replace', b'\xef\xbf\xbd'))

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: What is the use of these code_page_encode() functions? I wrote them to be able to write tests. We can maybe use them to implement the Python code page codecs using a custom codec register function: see msg138246. Windows codecs

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-16 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: I don't know yet how Windows decodes bytes filenames (especially how it handles undecodable bytes); I suppose that it uses MultiByteToWideChar using cp=CP_ACP and flags=0. It's likely, yes. But you don't need a new codec function

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-15 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Patch version 4 (mbcs4.patch): - fix encode and decode flags depending on the code page and Windows version, e.g. use WC_ERR_INVALID_CHARS instead of WC_NO_BEST_FIT_CHARS for CP_UTF8 on Windows Vista and later - fix usage of the

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-15 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22282/mbcs.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-15 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22315/mbcs2.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-15 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22340/mbcs3.patch

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-13 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Patch version 3: - add unit tests for code pages 932, 1252, CP_UTF7 and CP_UTF8 - fix encode/decode flags for CP_UTF7/CP_UTF8 - fix encode name on UnicodeDecodeError, support also CP_UTF7 and CP_UTF8 code page names TODO: -

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-13 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Using my patch, it is possible to create a codec for any code page on demand: register a function checking if the encoding name starts with cp and ends with a valid code page number. Even if it is a bad idea to set the OEM code page to
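
The on-demand registration described above could look roughly like the following. This is only a sketch: the helper names `parse_code_page` and `search_code_page` are hypothetical, and it assumes the `codecs.code_page_encode()` / `codecs.code_page_decode()` functions that are available on Windows in Python 3.3 and later. On other platforms the search function simply declines every name.

```python
import codecs
import re

# Hypothetical helper: matches names such as "cp932" or "cp1252".
_CP_NAME = re.compile(r"cp([0-9]+)")


def parse_code_page(name):
    """Return the code page number encoded in name, or None."""
    match = _CP_NAME.fullmatch(name.lower())
    return int(match.group(1)) if match else None


def search_code_page(name):
    """Codec search function building a codec for "cpNNN" on demand."""
    code_page = parse_code_page(name)
    # The code_page_* functions only exist on Windows.
    if code_page is None or not hasattr(codecs, "code_page_encode"):
        return None

    def encode(text, errors="strict"):
        return codecs.code_page_encode(code_page, text, errors)

    def decode(data, errors="strict"):
        # final=True: treat the input as complete.
        return codecs.code_page_decode(code_page, data, errors, True)

    return codecs.CodecInfo(encode=encode, decode=decode, name=name)


# Built-in codecs are still found first; this function is only consulted
# for names that no earlier search function recognises.
codecs.register(search_code_page)
```

The sketch does not check that the number is a code page Windows actually knows; an unknown number would only fail when the codec is first used.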

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-10 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Version 2 of my patch (mbcs2.patch): - also patch the encoder: fix ignore/replace depending on the Windows version, support any error handler: encode character by character if encoding in strict mode fails - Add
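
The "strict first, then character by character" strategy mentioned above can be illustrated in pure Python. This is a sketch of the idea, not the C code from the patch: `encode_with_fallback` is a hypothetical name, it works with any stateless codec, and it ignores the resume position returned by the error handler (it always continues with the next character).

```python
import codecs


def encode_with_fallback(text, encoding, errors="strict"):
    """Encode text in one shot; on failure, retry one character at a time."""
    try:
        # Fast path: the whole string is encodable.
        return text.encode(encoding, "strict")
    except UnicodeEncodeError:
        pass

    handler = codecs.lookup_error(errors)
    chunks = []
    for index, char in enumerate(text):
        try:
            chunks.append(char.encode(encoding, "strict"))
        except UnicodeEncodeError as exc:
            # Report the error at its position in the full string.
            error = UnicodeEncodeError(encoding, text, index, index + 1,
                                       exc.reason)
            replacement, _ = handler(error)  # "strict" raises here
            if isinstance(replacement, str):
                replacement = replacement.encode(encoding, "strict")
            chunks.append(replacement)
    return b"".join(chunks)


print(encode_with_fallback("a\u20acb", "ascii", "replace"))
print(encode_with_fallback("a\u20acb", "ascii", "backslashreplace"))
```

Because the error handler is looked up by name, any registered handler can be used, which is the point of the fallback path.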

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-10 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Example on Windows Vista with ANSI=cp932:
    >>> import codecs
    >>> codecs.code_page_encode(1252, '\xe9')
    (b'\xe9', 1)
    >>> codecs.mbcs_encode('\xe9')
    ...
    UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-10 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Decode examples, ANSI=cp932:
    >>> codecs.code_page_decode(1252, b'\x80')
    ('\u20ac', 1)
    >>> codecs.code_page_decode(932, b'\x82')
    ...
    UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-08 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: mbcs.patch fixes PyUnicode_DecodeMBCS(): - only use flags=0 if errors=replace on Windows >= Vista or if errors=ignore on Windows < Vista - support any error handler - support any code page (but the code page is hardcoded to CP_ACP)

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-08 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Example with ANSI=cp932 (on Windows Seven): - b'abc\xffdef'.decode('mbcs', 'replace') gives 'abc\uf8f3def' - b'abc\xffdef'.decode('mbcs', 'ignore') gives 'abcdef' -- nosy: +ocean-city

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-08 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Example with ANSI=cp932 (on Windows Seven): - b'abc\xffdef'.decode('mbcs', 'replace') gives 'abc\uf8f3def' - b'abc\xffdef'.decode('mbcs', 'ignore') gives 'abcdef' Oh, and b'\xff'.decode('mbcs', 'surrogateescape') gives '\udcff'
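
The mbcs codec exists only on Windows, so the sketch below uses Python's built-in utf-8 codec as a stand-in (an assumption made for portability, not the codec discussed in this issue) to show what the three error handlers are expected to do with an undecodable byte.

```python
# Stand-in demonstration: utf-8 replaces mbcs so this runs on any platform.
data = b'abc\xffdef'  # the byte 0xff is never valid in UTF-8

replaced = data.decode('utf-8', 'replace')         # U+FFFD marks the bad byte
ignored = data.decode('utf-8', 'ignore')           # the bad byte is dropped
escaped = data.decode('utf-8', 'surrogateescape')  # kept as lone surrogate

# surrogateescape is lossless: encoding back restores the original bytes.
roundtrip = escaped.encode('utf-8', 'surrogateescape')

print(ascii(replaced), ascii(ignored), ascii(escaped), roundtrip == data)
```

The replacement character differs between codecs: utf-8 yields U+FFFD here, whereas the cp932 example above reports '\uf8f3' on Windows Seven.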

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-07 Thread STINNER Victor
New submission from STINNER Victor victor.stin...@haypocalc.com: Starting at Python 3.2, the MBCS codec uses MultiByteToWideChar() to decode bytes using flags=MB_ERR_INVALID_CHARS by default (strict error handler), flags=0 for the ignore error handler, and raises a ValueError for other error
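
The Python 3.2 behaviour described in this submission amounts to a small mapping from error handler to MultiByteToWideChar() flags. The sketch below restates it in Python for illustration only: `decode_flags_py32` is a hypothetical name, and the real logic lives in C inside the codec.

```python
# Value of the Win32 constant from the Windows SDK headers.
MB_ERR_INVALID_CHARS = 0x00000008


def decode_flags_py32(errors="strict"):
    """Return the MultiByteToWideChar() flags used for an error handler,
    following the Python 3.2 behaviour described in the report."""
    if errors == "strict":
        # Fail on invalid input instead of silently converting it.
        return MB_ERR_INVALID_CHARS
    if errors == "ignore":
        # flags=0 leaves the handling of invalid bytes to Windows, which
        # is why the result depends on the Windows version.
        return 0
    raise ValueError("mbcs encoding does not support errors=%r" % errors)


print(decode_flags_py32("strict"), decode_flags_py32("ignore"))
```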

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-07 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: MBCS codec was changed by #850997. Martin von Loewis proposed solutions to implement other error handlers in msg19180.

[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

2011-06-07 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: -- nosy: +loewis