[issue8092] utf8, backslashreplace and surrogates

2010-03-08 Thread STINNER Victor
New submission from STINNER Victor: the utf8 encoder doesn't work with the backslashreplace error handler:
    >>> "\uDC80".encode("utf8", "backslashreplace")
    TypeError: error handler should have returned bytes
components: Unicode messages: 100678 nosy: haypo severity: normal status: open title:
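For reference, a minimal sketch of how the reported call behaves on an interpreter that includes the fix. It assumes a current CPython 3 release; the variable names are illustrative only.

```python
# The lone low surrogate from the report; it has no valid UTF-8 encoding.
text = "\udc80"

# With the fix, backslashreplace substitutes an ASCII escape sequence
# instead of raising TypeError.
escaped = text.encode("utf8", "backslashreplace")
print(escaped)  # b'\\udc80'

# The default "strict" handler still rejects the surrogate.
try:
    text.encode("utf8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True
```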

[issue8092] utf8, backslashreplace and surrogates

2010-03-08 Thread STINNER Victor
STINNER Victor added the comment: See also issue #6697.

[issue8092] utf8, backslashreplace and surrogates

2010-03-08 Thread STINNER Victor
STINNER Victor added the comment: This issue is a regression introduced by r72208, which fixed issue #3672. The attached patch fixes PyUnicode_EncodeUTF8() when unicode_encode_call_errorhandler() returns a unicode string (e.g. with the backslashreplace error handler). I don't know unicodeobject.c code (very w
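The distinction between handlers that return bytes and handlers that return a unicode string can be sketched from the Python side. This is only an illustration, not the patch itself: the handler names demo_str_replace and demo_bytes_replace are hypothetical, and the string replacement is kept ASCII-only on the assumption that the utf8 encoder needs to be able to encode it directly.

```python
import codecs

def str_handler(exc):
    # Hypothetical handler returning a str replacement (like backslashreplace).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    return (replacement, exc.end)

def bytes_handler(exc):
    # Hypothetical handler returning a bytes replacement (like surrogateescape).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return (b"?" * (exc.end - exc.start), exc.end)

codecs.register_error("demo_str_replace", str_handler)
codecs.register_error("demo_bytes_replace", bytes_handler)

text = "a\udc80b"
as_str = text.encode("utf-8", "demo_str_replace")
as_bytes = text.encode("utf-8", "demo_bytes_replace")
print(as_str)
print(as_bytes)
```

The str replacement here is eight bytes long for a single surrogate, so it also exercises the case where a replacement exceeds four bytes.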

[issue8092] utf8, backslashreplace and surrogates

2010-03-08 Thread Antoine Pitrou
Changes by Antoine Pitrou: nosy: +lemburg, loewis priority: -> normal stage: -> patch review type: -> behavior versions: +Python 3.2

[issue8092] utf8, backslashreplace and surrogates

2010-03-09 Thread Walter Dörwald
Walter Dörwald added the comment: After the patch the comment: /* Implementation limitations: only support error handler that return bytes, and only support up to four replacement bytes. */ no longer applies. Also I would like to see a version of this patch where the length limitation fo

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread STINNER Victor
Changes by STINNER Victor:

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread STINNER Victor
STINNER Victor added the comment: New version without the hardcoded limit: it no longer uses "goto encodeUCS4;" and chains the if statements to limit indentation depth, which only costs one copy of the UCS4 code (5 lines are duplicated). The buffer is now reallocated each time a surrogate escape is longer than 4 bytes. I don't

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread STINNER Victor
Changes by STINNER Victor: Removed file: http://bugs.python.org/file16503/utf8_surrogate_error.patch

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> I don't know if "nallocated += repsize - 4;" can overflow or not.
> If yes, how can I detect the overflow?
Sure, if they are both Py_ssize_t, just use:
    if (nallocated > PY_SSIZE_T_MAX - repsize + 4) {
        /* handle overflow ... */
    }
nosy: +pitrou
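The rearranged comparison can be modelled in Python, where integers do not overflow, to show that it gives the same answer as the naive "nallocated + repsize - 4 > PY_SSIZE_T_MAX" test without ever forming an out-of-range sum. This is a rough model rather than the C code from the patch: grow_allocation is a hypothetical helper, sys.maxsize stands in for PY_SSIZE_T_MAX, and repsize is assumed to be at least 4 (the case where the buffer actually has to grow).

```python
import sys

# sys.maxsize is the largest value a Py_ssize_t can hold on this build.
PY_SSIZE_T_MAX = sys.maxsize

def grow_allocation(nallocated, repsize):
    """Model 'nallocated += repsize - 4' guarded by the suggested check.

    Assumes 0 <= nallocated <= PY_SSIZE_T_MAX and repsize >= 4, so the
    right-hand side below stays within the Py_ssize_t range in C.
    """
    if repsize < 4:
        raise ValueError("this sketch only covers repsize >= 4")
    if nallocated > PY_SSIZE_T_MAX - repsize + 4:
        raise OverflowError("allocation size would exceed PY_SSIZE_T_MAX")
    return nallocated + repsize - 4

print(grow_allocation(10, 6))

try:
    grow_allocation(PY_SSIZE_T_MAX, 5)
    overflow_detected = False
except OverflowError:
    overflow_detected = True
print(overflow_detected)
```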

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread STINNER Victor
STINNER Victor added the comment: Oh no :-( I realized that I removed the first message of this issue! msg100687. Copy/paste of the message: --- This issue is a regression introduced by r72208 to fix the issue #3672. Attached patch fixes PyUnicode_EncodeUTF8() if unicode_encode_call_errorhand

[issue8092] utf8, backslashreplace and surrogates

2010-04-20 Thread STINNER Victor
STINNER Victor added the comment: Oops, I forgot to remove the reallocation in the unicode case in patch version 2. Patch version 3:
- micro-optimization: group both surrogate cases in the same if to avoid checking 0xD800 <= ch twice
- check for integer overflow
- (remove the duplica

[issue8092] utf8, backslashreplace and surrogates

2010-04-22 Thread STINNER Victor
Changes by STINNER Victor: Removed file: http://bugs.python.org/file17010/utf8_surrogate_error-2.patch

[issue8092] utf8, backslashreplace and surrogates

2010-04-22 Thread STINNER Victor
STINNER Victor added the comment: Fixed: r80382 (py3k), r80383 (3.1). resolution: -> fixed status: open -> closed