[issue24117] Wrong range checking in GB18030 decoder.

2017-04-07 Thread Ma Lin
Ma Lin added the comment: I closed this issue because it involved too many things. 1. For the GB18030 decoder bug, see issue29990. 2. For the hz encoder bug, see issue30003. 3. For the problem in the Traditional Chinese codecs, please create a new issue.

[issue24117] Wrong range checking in GB18030 decoder.

2017-04-04 Thread Ma Lin
Changes by Ma Lin: stage: patch review -> resolved; status: open -> closed

[issue24117] Wrong range checking in GB18030 decoder.

2016-11-14 Thread Mingye Wang
Mingye Wang added the comment: Just FYI, cp950 0xC6A1 (\uf6b1) is found in the current WindowsBestFit: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt -- nosy: +Artoria2e5

[issue24117] Wrong range checking in GB18030 decoder.

2016-01-02 Thread Ma Lin
Ma Lin added the comment: I posted in a Taiwanese forum, but there is no reply yet: https://groups.google.com/forum/#!forum/pythontw

[issue24117] Wrong range checking in GB18030 decoder.

2016-01-02 Thread Ezio Melotti
Ezio Melotti added the comment: Did you hear anything back from them? -- versions: +Python 3.6 -Python 3.4

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-18 Thread Ma Lin
Ma Lin added the comment:
>> If you could provide links to the relevant pages/section we can verify that
>> the codecs are indeed incorrect.
Here is the CP950 table; 0xC6A1 is not in it: https://msdn.microsoft.com/zh-cn/goglobal/cc305155
I can provide one link, but there are many variants of BIG5 conve

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-18 Thread Ma Lin
Ma Lin added the comment: This is not a de facto standard, so it should be fixed. I have already posted this information to a Taiwanese Python community; let's wait for their inspection.

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-18 Thread Ezio Melotti
Ezio Melotti added the comment:
> The data come from ICU, Unicode.org, IBM,
If you could provide links to the relevant pages/sections, we can verify that the codecs are indeed incorrect. Also keep in mind that there might be people relying on this incorrect behavior, so we have to be careful while ch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-18 Thread Ma Lin
Ma Lin added the comment:
>> I examined all Chinese codecs
I said that above, but I forgot that Taiwan and Hong Kong use Chinese as well. BIG5 and CP950 use a wrong conversion table. Test this:
>>> u = b'\xC6\xA1'.decode('big5')
>>> hex(ord(u))
'0x30fe'
This should not happen, 0xC6A1 is neithe
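The check above can be repeated across several Traditional Chinese codecs with a small helper. This is only a sketch: the helper name `decode_code_points` is hypothetical, the choice of codecs to compare is an assumption, and the mapping for 0xC6A1 may differ between Python versions, so no expected output is shown for it.

```python
def decode_code_points(raw: bytes, codec: str):
    """Decode raw with codec and return the code points as hex strings.

    Returns None when the codec rejects the byte sequence.
    """
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError:
        return None
    return [hex(ord(ch)) for ch in text]


# Compare how each codec treats the disputed byte pair 0xC6A1.
for codec in ("big5", "cp950", "big5hkscs"):
    print(codec, decode_code_points(b"\xC6\xA1", codec))
```

A result of `None` means the codec has no mapping for the bytes; a hex value shows which code point the conversion table assigns.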

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Ma Lin added the comment: Good question. GB2312: I tested those programming languages one by one. GBK/CP936/GB18030-2000: I gathered as much data from the Internet as I could, then compared it to Python 3's codecs. I checked key points against authoritative sources and verified every conflict that appeared. Th

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ezio Melotti
Ezio Melotti added the comment: Do you have authoritative links that describe these standards?

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Changes by Ma Lin: Added file: http://bugs.python.org/file39320/forpy35.patch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Changes by Ma Lin: Added file: http://bugs.python.org/file39319/forpy34.patch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Changes by Ma Lin: Removed file: http://bugs.python.org/file39277/forpy27.patch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Ma Lin added the comment: I examined all the Chinese codecs. Here are the patches; please review them, and feel free to ask me any questions. Thanks to Hye-Shik, your framework is very easy to understand :) -- Added file: http://bugs.python.org/file39318/forpy27.patch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-07 Thread Ma Lin
Changes by Ma Lin: Removed file: http://bugs.python.org/file39278/forpy3.patch

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-04 Thread Ma Lin
Ma Lin added the comment: I found another bug in the hz codec. The hz encoding uses 7-bit ASCII to represent Chinese characters; it was popular on USENET in the late 1980s and early 1990s. I will do more checks and fix the bugs together, then invite you to review the patch. u = 'hi~python
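To illustrate the 7-bit property described above, here is a minimal sketch using Python's built-in `hz` codec. The helper name `hz_encode` and the sample strings are assumptions for illustration only. The original test case in the comment is truncated, and the encoded form of a string containing `~` may differ between Python versions, so that case is printed without an expected result.

```python
def hz_encode(text: str) -> bytes:
    """Encode text with Python's built-in hz codec."""
    return text.encode("hz")


# Two common Chinese characters, written as escapes to keep the source ASCII-only.
sample = "\u4f60\u597d"
encoded = hz_encode(sample)

print(encoded)                           # the escaped, ASCII-only representation
print(all(b < 0x80 for b in encoded))    # every byte fits in 7 bits
print(encoded.decode("hz") == sample)    # round trip back to the original text

# A string containing '~', the character hz uses for its escape sequences.
print(hz_encode("hi~python"))
```

Because hz reserves `~` for switching between ASCII and GB mode, a literal tilde in the input is the interesting case to inspect on the interpreter version at hand.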

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Adding Hye-Shik, who wrote the codec. -- nosy: +hyeshik.chang

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-03 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: nosy: +lemburg, loewis, serhiy.storchaka; stage: -> patch review

[issue24117] Wrong range checking in GB18030 decoder.

2015-05-03 Thread Ma Lin
Changes by Ma Lin: title: A small bug in GB18030 decoder. -> Wrong range checking in GB18030 decoder.