New submission from Yuriy Pilgun <p...@ukrpost.net>:

Reading a UTF-16 text file with the 'codecs' module fails if a surrogate pair is located at the 72-character boundary.
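
For illustration, here is a minimal sketch of that boundary condition. It is not the attached testutf16.py, and it does not call the codecs readline() code itself. It assumes the 72 counts bytes read from the file, uses a made-up sample line, and is written in Python 3 syntax. It only shows that a 72-byte chunk ending in the middle of a surrogate pair cannot be decoded on its own, while an incremental decoder can carry the dangling half over to the next chunk.

```python
import codecs

# Hypothetical sample line: 34 BMP characters, then one non-BMP character
# (encoded as a surrogate pair in UTF-16), then a newline.
text = "a" * 34 + "\U00010000" + "\n"

# 2-byte BOM + 68 bytes + 4-byte surrogate pair + 2-byte newline.
data = text.encode("utf-16")

# The first 72 bytes end right after the high surrogate (bytes 70-71).
chunk = data[:72]

# One-shot decoding treats the chunk as complete input and rejects it.
err = None
try:
    chunk.decode("utf-16")
except UnicodeDecodeError as exc:
    err = exc

# An incremental decoder buffers the dangling high surrogate
# until the remaining bytes arrive.
decoder = codecs.getincrementaldecoder("utf-16")()
decoded = decoder.decode(chunk) + decoder.decode(data[72:], final=True)
```

After running this, err holds the UnicodeDecodeError raised for the truncated chunk, and decoded is equal to text.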

The attached Python script fails with the message:

    UnicodeDecodeError: 'utf16' codec can't decode bytes in position 70-71: unexpected end of data

The reason is that the input data for readline() is split into chunks, namely:

    readsize = size or 72

----------
components: Library (Lib), Unicode
files: testutf16.py
messages: 130498
nosy: ply
priority: normal
severity: normal
status: open
title: Reading UTF-16 with codecs.readline() breaks on surrogate pairs
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file21070/testutf16.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11461>
_______________________________________