[issue39910] os.ftruncate on Windows should be sparse

2020-03-09 Thread Mingye Wang
New submission from Mingye Wang: Consider this interaction:

    cmd> echo > 1.txt
    cmd> python -c "__import__('os').truncate('1.txt', 1024 ** 3)"
    cmd> fsutil sparse queryFlag 1.txt

Not only does the truncate call take a long time, as is typical for a zero-write, but fsutil also reports non-sparse as
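For comparison, here is a minimal sketch of the same check on a POSIX system, where extending a file with os.truncate usually leaves a hole instead of writing zeros. It assumes a Unix-like OS whose os.stat result has st_blocks (counted in 512-byte units) and a filesystem that supports sparse files. It does not show how to mark a file sparse on Windows, and the 64 MiB size is an arbitrary choice.

```python
import os
import tempfile

SIZE = 64 * 1024 * 1024  # arbitrary 64 MiB; the report uses 1 GiB


def allocated_after_truncate(size):
    """Extend an empty file with os.truncate; return (logical size, allocated bytes).

    Assumes a Unix-like OS: st_blocks does not exist on Windows.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.truncate(path, size)  # grow the empty file to `size` bytes
        st = os.stat(path)
        # st_blocks is in 512-byte units on Linux and most Unixes
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


logical, allocated = allocated_after_truncate(SIZE)
print(f"logical={logical} allocated={allocated} sparse={allocated < logical}")
```

On a filesystem with sparse-file support, the allocated size stays far below the logical size, which is the behavior the report asks for on Windows.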

[issue39732] plistlib should export UIDs in XML like Apple does

2020-02-23 Thread Mingye Wang
Change by Mingye Wang : -- keywords: +patch pull_requests: +17987 stage: -> patch review pull_request: https://github.com/python/cpython/pull/18622 ___ Python tracker <https://bugs.python.org/issu

[issue39732] plistlib should export UIDs in XML like Apple does

2020-02-23 Thread Mingye Wang
New submission from Mingye Wang : Although there is no native UID type in Apple's XML format, Apple's NSKeyedArchiver still works with it because it converts the UID to a dict of {"CF$UID": int(some_uint64_val)}. Plistlib should do the same. For a sample, see https://github.com/a
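Until plistlib does this itself, the conversion can be done before writing. The sketch below assumes Python 3.8+ (for plistlib.UID) and recursively rewrites each UID into the {"CF$UID": int} dict described above before calling plistlib.dumps with FMT_XML. The helper name and the small archive-like dict are hypothetical illustrations, not NSKeyedArchiver's full layout.

```python
import plistlib


def uids_to_cf_dicts(value):
    """Hypothetical helper: replace every plistlib.UID with {"CF$UID": int}."""
    if isinstance(value, plistlib.UID):
        return {"CF$UID": value.data}
    if isinstance(value, dict):
        return {key: uids_to_cf_dicts(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [uids_to_cf_dicts(item) for item in value]
    return value


# Hypothetical, heavily simplified keyed-archive-like structure.
archive = {"$top": {"root": plistlib.UID(1)}, "$objects": ["$null", "hello"]}
xml = plistlib.dumps(uids_to_cf_dicts(archive), fmt=plistlib.FMT_XML)
print(xml.decode("utf-8"))
```

Reading the XML back yields a plain dict such as {"CF$UID": 1}; turning that back into a UID is left to the caller.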

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2020-01-05 Thread Mingye Wang
Mingye Wang added the comment: b'\x80'.decode('cp936') is still broken on Python 3.7. Working on a PR. -- versions: +Python 3.8, Python 3.9
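As a stopgap on the decoding side, a custom error handler can map a lone 0x80 byte to U+20AC, the code point Microsoft's cp936 assigns to it. This is a sketch using codecs.register_error. The handler name "cp936-euro" is made up, and every other decoding error is still raised strictly.

```python
import codecs


def cp936_euro(exc):
    """Decode error handler: treat a lone 0x80 byte as U+20AC (EURO SIGN)."""
    if isinstance(exc, UnicodeDecodeError) and exc.object[exc.start:exc.start + 1] == b"\x80":
        # (replacement text, absolute position where decoding resumes)
        return ("\u20ac", exc.start + 1)
    raise exc  # any other undecodable input stays an error


codecs.register_error("cp936-euro", cp936_euro)  # hypothetical handler name

print(b"\x80".decode("cp936", "cp936-euro"))
print(b"a\x80b".decode("cp936", "cp936-euro"))
```

Because the handler only runs when the codec itself fails, it becomes a no-op if the cp936 table is ever fixed.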

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-11-24 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: -- versions: -Python 3.3, Python 3.4

[issue28693] No EUDC (HKSCS) support in Windows cp950

2016-11-24 Thread Mingye Wang
Mingye Wang added the comment: Windows cp950's EUDC<->PUA mapping is not specific to HKSCS. -- title: No HKSCS support in Windows cp950 -> No EUDC (HKSCS) support in Windows cp950

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment:
> Codecs are strict by default in Python. Call MultiByteToWideChar() with the
> MB_ERR_INVALID_CHARS flag as Python does.

Great catch. Without MB_ERR_INVALID_CHARS or WC_NO_BEST_FIT_CHARS, Windows would perform the "best fit" behavior describe
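The Python half of that comparison can be shown without any Win32 calls. Python's cp1252 table leaves 0x81 undefined, so the default "strict" error handler raises, roughly analogous to MultiByteToWideChar failing when MB_ERR_INVALID_CHARS is set. The helper name try_decode is hypothetical.

```python
def try_decode(data, encoding, errors="strict"):
    """Hypothetical helper: return the decoded text, or the UnicodeDecodeError raised."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError as exc:
        return exc


strict = try_decode(b"\x81", "cp1252")               # strict is Python's default
replaced = try_decode(b"\x81", "cp1252", "replace")  # U+FFFD instead of raising
print(type(strict).__name__, repr(replaced))
```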

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: ... On the other hand, I am happy to use these Win32 functions if they are faster, but still the table should be made correct in the first place. (See also issue28343 (936) and issue28693 (950) for problems with DBCS Chinese code pages

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Update: the test script at issue28712 can be modified to show this issue too.

[issue28693] No HKSCS support in Windows cp950

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Update: the test script at issue28712 can be modified to show this issue too. -- components: +Windows nosy: +paul.moore, steve.dower, tim.golden, zach.ware

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: -- components: +Windows nosy: +paul.moore, steve.dower, tim.golden, zach.ware

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Yes, it's a table issue. My suggested fix is to replace them all with the WindowsBestFit tables, to which Microsoft currently redirects visitors of https://msdn.microsoft.com/en-us/globalization/mt767590. These old "WINDOWS" tables appear abandoned sinc
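Building a decoding table from a WindowsBestFit file could start with a parser like the one below. It is a sketch under an assumed layout, based on bestfit1252.txt: a line "MBTABLE <count>" followed by <count> lines of "0xBB 0xUUUU ;comment", with other sections ignored. DBCS tables (DBCSRANGE/DBCSTABLE) are not handled. The function names and the three-line sample are hypothetical and hand-written, not copied from the real file.

```python
def parse_mbtable(text):
    """Hypothetical parser: return {byte: str} from the MBTABLE section only."""
    table = {}
    remaining = 0
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        fields = line.split()
        if fields[0].upper() == "MBTABLE":
            remaining = int(fields[1])  # number of entries that follow
            continue
        if remaining:
            table[int(fields[0], 16)] = chr(int(fields[1], 16))
            remaining -= 1
    return table


def decode_with_table(data, table):
    """Decode single-byte data; raises KeyError for bytes missing from the table."""
    return "".join(table[b] for b in data)


# Hand-written sample in the assumed layout (not the real bestfit1252.txt).
SAMPLE = """CODEPAGE 1252
MBTABLE 3
0x80 0x20ac ;Euro Sign
0x81 0x0081
0x8d 0x008d
WCTABLE 1
0x20ac 0x80
"""

table = parse_mbtable(SAMPLE)
print(repr(decode_with_table(b"\x80\x81\x8d", table)))
```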

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: Removed file: http://bugs.python.org/file45502/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: The output is already attached as win10_14959_py36.txt. PS: after playing with ctypes, I got a version of pycp that works with Py < 3.3 too (attached with comment). -- Added file: http://bugs.python.org/file45503/pycp_ctypes

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: Removed file: http://bugs.python.org/file45497/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: Added file: http://bugs.python.org/file45502/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Ugh... This is weird. Attached is a correct version using Python 3.6's 'code page' methods. I have modified the script a little to make sure it runs on Py3. -- Added file: http://bugs.python.org/file45501/win10_14959_py36.txt

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
Mingye Wang added the comment:
> Python 3.4.3 on Cygwin also fails ``b'\x81\x8d'.encode('cp1252')``.

... but since Cygwin packagers did not enable Win32 APIs for their build, I cannot test the script directly.

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
Changes by Mingye Wang <arthur200...@gmail.com>: Added file: http://bugs.python.org/file45498/windows10_14959.txt

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
New submission from Mingye Wang: Mappings for 0x81 and 0x8D in multiple Windows code pages diverge from what Windows does. Attached is a script that tests for this behavior. (These two bytes are not necessarily the only problems, but they are certainly the most widespread and well-known ones. Again
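For cp1252 specifically, the bytes Python leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) sit at C1 control positions. Windows decodes each of them to the code point with the same value. A custom error handler can reproduce that without changing the codec. This is a sketch only. The handler name "c1-passthrough" is made up, and applying it to other code pages is an assumption that would need checking against their WindowsBestFit tables.

```python
import codecs


def c1_passthrough(exc):
    """Decode error handler: map each undefined byte to the same-valued code point.

    Intended for cp1252 only; not a general fix for other code pages.
    """
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return (bad.decode("latin-1"), exc.end)  # latin-1 maps byte N to U+00NN
    raise exc


codecs.register_error("c1-passthrough", c1_passthrough)  # hypothetical name
print(repr(b"\x80\x81\x8d".decode("cp1252", "c1-passthrough")))
```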

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-11-15 Thread Mingye Wang
Mingye Wang added the comment: Also, see ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit936.txt for Microsoft's reference mapping.

[issue24117] Wrong range checking in GB18030 decoder.

2016-11-14 Thread Mingye Wang
Mingye Wang added the comment: Just FYI, cp950 0xC6A1 (\uf6b1) is found in current WindowsBestFit: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt -- nosy: +Artoria2e5

[issue28693] No HKSCS support in Windows cp950

2016-11-14 Thread Mingye Wang
New submission from Mingye Wang: Python's cp950 implementation lacks support for HKSCS ('big5hkscs'). This support, which maps HKSCS Big5-EUDC code points to Unicode PUA code points algorithmically, is found in Windows Vista+ as well as an update for XP. An experiment session is shown below
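The "algorithmic" EUDC-to-PUA mapping can be sketched as a row/column offset calculation. Each Big5 row holds 157 cells (trail bytes 0x40-0x7E, then 0xA1-0xFE). The block table below is an assumption: it is the commonly documented Big5 EUDC layout rather than something taken from Python or Windows sources. It is at least consistent with the bestfit950 entry cited under issue24117, where 0xC6A1 maps to U+F6B1. The function names are hypothetical.

```python
# Assumed EUDC blocks: (first lead, last lead, first trail, first PUA code point)
EUDC_BLOCKS = [
    (0xFA, 0xFE, 0x40, 0xE000),
    (0x8E, 0xA0, 0x40, 0xE311),
    (0x81, 0x8D, 0x40, 0xEEB8),
    (0xC6, 0xC8, 0xA1, 0xF6B1),  # starts mid-row at 0xC6A1
]


def trail_index(trail):
    """Position of a trail byte within a 157-cell Big5 row, or None if invalid."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40        # 63 cells
    if 0xA1 <= trail <= 0xFE:
        return trail - 0xA1 + 63   # 94 cells
    return None


def eudc_to_pua(lead, trail):
    """Hypothetical: map a Big5 EUDC byte pair to a PUA code point, else None."""
    t = trail_index(trail)
    if t is None:
        return None
    for first, last, first_trail, base in EUDC_BLOCKS:
        if first <= lead <= last:
            offset = (lead - first) * 157 + t - trail_index(first_trail)
            if offset < 0:
                return None  # e.g. 0xC640-0xC67E is regular Big5, not EUDC
            return base + offset
    return None


print(hex(eudc_to_pua(0xC6, 0xA1)))
```

Because the mapping is a pure offset calculation, it needs no per-character table, which matches the report's description of the Windows behavior as algorithmic.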

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-10-09 Thread Mingye Wang
Mingye Wang added the comment: The "join the web people" solution should look like this:

    $ diff -Naurp a/_codecs_cn.c b/_codecs_cn.c
    --- a/_codecs_cn.c	2016-10-09 14:24:04.675111500 -0700
    +++ b/_codecs_cn.c	2016-10-09 14:27:06.600961500 -0700
    @@ -128,6 +128,12 @@ E

[issue24036] GB2312 codec is using a wrong covert table

2016-10-02 Thread Mingye Wang
Mingye Wang added the comment:
> Advice for final user:

This seems worth adding to the codecs doc as a footnote. Perhaps something like "(deprecated) ... gb2312 is an obsolete encoding from the 1980s. Use gbk or gb18030 instead." would do.

> libiconv-1.14 is also
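The practical reason behind that advice can be checked directly. GB2312 only covers a simplified-Chinese subset, GBK adds characters such as traditional forms, and GB18030 can encode all of Unicode. The sketch below tries a strict encode with each codec. The helper name and the three sample characters are illustrative choices.

```python
def encodable(text, encoding):
    """Hypothetical helper: True if `text` encodes strictly in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+4EEC (simplified "men"), U+5011 (traditional form), U+1F600 (emoji)
samples = ["\u4eec", "\u5011", "\U0001F600"]
for enc in ("gb2312", "gbk", "gb18030"):
    print(enc, [encodable(s, enc) for s in samples])
```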

[issue28343] Bad encoding alias cp936 -> gbk: euro sign

2016-10-02 Thread Mingye Wang (Arthur2e5)
New submission from Mingye Wang (Arthur2e5): Microsoft's cp936 defines a euro sign at 0x80, but Python fails with a UnicodeEncodeError when asked to do something like `u'\u20ac'.encode('cp936')`. This may break things for zh-hans-cn Windows users who want to put a euro sign in their file name
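On the encoding side, one workaround that avoids touching the codec is to split the text on U+20AC, encode the euro-free pieces with Python's cp936 (gbk) codec, and join them with the single byte 0x80. This is safe because GBK is stateless, so separately encoded pieces can be concatenated. The function name is hypothetical, and this is a sketch rather than the fix proposed in this issue.

```python
def encode_cp936_with_euro(text, errors="strict"):
    """Hypothetical helper: encode as cp936, writing U+20AC as the byte 0x80.

    Each euro-free piece goes through Python's own cp936 codec; the pieces
    are then joined with 0x80, which is safe because GBK is stateless.
    """
    return b"\x80".join(part.encode("cp936", errors) for part in text.split("\u20ac"))


print(encode_cp936_with_euro("price: 5\u20ac"))
```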