Ma Lin added the comment:
Although the improvement is not great, it's a very hot code path.
Could you review the PR?
--
components: +Windows
nosy: +paul.moore, tim.golden
___
Python tracker
<https://bugs.python.org/is
Change by Ma Lin :
--
pull_requests: +21213
pull_request: https://github.com/python/cpython/pull/22132
___
Python tracker
<https://bugs.python.org/issue41
Change by Ma Lin :
--
pull_requests: +21211
pull_request: https://github.com/python/cpython/pull/22130
___
Python tracker
<https://bugs.python.org/issue41
Change by Ma Lin :
--
keywords: +patch
pull_requests: +21208
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/22126
___
Python tracker
<https://bugs.python.org/issu
New submission from Ma Lin :
The code in zlib module:
self->zst.next_in = data->buf; // set next_in
...
ENTER_ZLIB(self); // acquire thread lock
`self->zst` is a `z_stream` struct defined in zlib, used to record the state of a
compress/decompress stream:
typed
Ma Lin added the comment:
I have spent two weeks on this and have almost completed the code. A preview:
https://github.com/animalize/cpython/pull/8/files
I wrote it directly for the stdlib, since there are already zstd modules on PyPI.
In addition, the API of zstd is simple, not as complicated as lzma's.
Can also use
Ma Lin added the comment:
> when I delete the file %APPDATA%\Microsoft\HTML Help\hh.dat,
> the problem seems to go away.
It doesn't work for me.
Moreover, `Binary Index=Yes` no longer works on my PC.
A few days ago, I installed a clean Windows 10 2004, then CHM's index
Ma Lin added the comment:
> More realistically, including the docs as unbundled HTML files
> and relying on the default browser is probably an all-around better idea.
CHM's index function is very convenient, I almost always use this feature when
I use CHM.
How about use tkinter
Ma Lin added the comment:
There are two zstd modules on pypi:
https://pypi.org/project/zstd/
https://pypi.org/project/zstandard/
The first one is too simple.
The second one is powerful, but has too many APIs:
ZstdCompressorIterator
ZstdDecompressorIterator
Ma Lin added the comment:
There can be at most one empty match at a position. IIRC, Perl's regex engine
has very similar behavior.
If you don't want empty matches, using + is fine.
--
___
Python tracker
<https://bugs.python.o
Ma Lin added the comment:
The re.sub() doc says:
Changed in version 3.7: Empty matches for the pattern are replaced when
adjacent to a previous non-empty match.
IMO 3.7+ behavior is more reasonable, and it fixed a bug, see issue25054.
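A small illustration of the 3.7+ behavior, using the example pattern from the re documentation. This is only a sketch and assumes Python 3.7 or later; the `x+` variant is added here to show how empty matches can be avoided.

```python
import re

# 'x*' can match the empty string, so every position yields a match.
# In 3.7+ the empty match adjacent to the non-empty match 'x' is replaced too.
with_star = re.sub('x*', '-', 'abxd')

# 'x+' never matches the empty string, so only the real 'x' is replaced.
with_plus = re.sub('x+', '-', 'abxd')

print(with_star)  # -a-b--d-
print(with_plus)  # ab-d
```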
--
nosy: +malin
Ma Lin added the comment:
A more thorough solution was used, see issue41486.
So I am closing this issue.
--
stage: -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/i
Change by Ma Lin :
--
keywords: +patch
pull_requests: +20886
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/21740
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
Added file: https://bugs.python.org/file49368/benchmark_real.py
___
Python tracker
<https://bugs.python.org/issue41486>
___
___
Python-bugs-list mailin
Change by Ma Lin :
Added file: https://bugs.python.org/file49367/benchmark.py
___
Python tracker
<https://bugs.python.org/issue41486>
___
Change by Ma Lin :
Added file: https://bugs.python.org/file49365/0to200MB_step2MB.png
___
Python tracker
<https://bugs.python.org/issue41486>
___
Change by Ma Lin :
Added file: https://bugs.python.org/file49366/0to20MB_step64KB.png
___
Python tracker
<https://bugs.python.org/issue41486>
___
Change by Ma Lin :
Added file: https://bugs.python.org/file49364/0to2GB_step30MB.png
___
Python tracker
<https://bugs.python.org/issue41486>
___
New submission from Ma Lin :
🔵 bz2/lzma module's current growth algorithm
bz2/lzma module's initial output buffer size is 8KB [1][2], and they are using
this output buffer growth algorithm [3][4]:
newsize = size + (size >> 3) + 6
[1] https://github.com/python/cpyth
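To make the growth rule concrete, here is a rough Python sketch that applies the quoted formula starting from the 8KB initial size. The helper names `next_size` and `growth_steps` are hypothetical and only serve this illustration; the real logic lives in the C modules.

```python
INITIAL_SIZE = 8 * 1024  # 8KB initial output buffer, as stated above


def next_size(size):
    """Apply the quoted rule: newsize = size + (size >> 3) + 6."""
    return size + (size >> 3) + 6


def growth_steps(initial, count):
    """Return the first `count` buffer sizes, starting at `initial`."""
    sizes = [initial]
    while len(sizes) < count:
        sizes.append(next_size(sizes[-1]))
    return sizes


print(growth_steps(INITIAL_SIZE, 4))  # [8192, 9222, 10380, 11683]
```

Each step adds only about one eighth of the current size, so the buffer grows slowly at first.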
Ma Lin added the comment:
I'm working on issue41265.
If nothing happens, I also would like to write a zstd module for stdlib before
the end of the year, but I dare not promise this.
If anyone wants to work on this issue, very gra
Ma Lin added the comment:
Some underlying streams have a fast path for .readall(),
so I am closing this issue.
--
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/i
Ma Lin added the comment:
At least this bug should be fixed:
the error-handler object is not cached, so it has to be
looked up from a dict every time, which is very inefficient.
The code:
https://github.com/python/cpython/blob/v3.9.0b4/Modules/cjkcodecs/multibytecodec.c#L81-L98
I will submit a
Change by Ma Lin :
--
keywords: +patch
pull_requests: +20842
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/21698
___
Python tracker
<https://bugs.python.org/issu
New submission from Ma Lin :
BufferedReader's constructor has a `buffer_size` parameter, which is the size of
this buffer:
When reading data from BufferedReader object, a larger
amount of data may be requested from the underlying raw
stream, and kept in an inter
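The effect of `buffer_size` can be observed with a small sketch. `CountingRaw` is a hypothetical helper written for this illustration (it is not part of the io module); it records how many bytes BufferedReader asks the raw stream for.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that records the size of each read request."""

    def __init__(self, data):
        super().__init__()
        self._inner = io.BytesIO(data)
        self.requests = []

    def readable(self):
        return True

    def readinto(self, buffer):
        # Remember how many bytes the buffered layer asked for.
        self.requests.append(len(buffer))
        return self._inner.readinto(buffer)


raw = CountingRaw(b"x" * 100_000)
reader = io.BufferedReader(raw, buffer_size=32 * 1024)

first = reader.read(1)  # the caller asks for a single byte
print(first, raw.requests)  # the raw request is expected to be much larger than 1
```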
Ma Lin added the comment:
I'm working on a patch.
lzma decompressing speed increases:
baseline: 0.275722 sec
patched: 0.140405 sec
(Uncompressed data size 52.57 MB)
The new algorithm looks like this:
#define INITIAL_BUFFER_SIZE (16*1024)
static inline Py_ssize_t
get_ne
Ma Lin added the comment:
> But how many new Python web application use CJK codec instead of UTF-8?
A CJK character usually takes 2 bytes in CJK encodings, but takes 3 bytes in
UTF-8.
I tested a Chinese book:
in GBK: 853,025 bytes
in UTF-8: 1,267,523 bytes
For CJK content, UTF-8
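The size difference can be checked on a tiny sample. This is only an illustration with a four-character string chosen here, not the book measured above.

```python
sample = "中文编码"  # four common Chinese characters (sample chosen for illustration)

gbk_size = len(sample.encode("gbk"))     # 2 bytes per character
utf8_size = len(sample.encode("utf-8"))  # 3 bytes per character

print(gbk_size, utf8_size)  # 8 12
```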
Ma Lin added the comment:
IMO "xmlcharrefreplace" is useful for web applications.
For example, if the page's charset is "gbk", this statement can generate the
bytes content easily and safely:
s.encode('gbk', 'xmlcharrefreplace')
Maybe so
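A minimal sketch of what that statement produces, assuming a string that mixes a character GBK can encode with one it cannot (the sample string is made up for this example):

```python
s = "中\U0001F600"  # '中' exists in GBK, the emoji U+1F600 does not

page_bytes = s.encode("gbk", "xmlcharrefreplace")

# The unencodable character becomes a numeric XML character reference,
# which a browser renders correctly on a gbk page.
print(page_bytes)  # b'\xd6\xd0&#128512;'
```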
New submission from Ma Lin :
CJK encode/decode functions only have three error-handler fast-paths:
replace
ignore
strict
See the code: [1][2]
If other built-in error handlers are used, the error-handler object has to be
fetched and called with a Unicode exception argument. See the code
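What the slow path does can be sketched at the Python level: look up the handler object by name, then call it with an exception instance. This is only an illustration of the idea using the public `codecs.lookup_error()` API, not the C code of the CJK codecs; the sample text and reason string are made up.

```python
import codecs

# Fetch the handler object by name (a registry lookup).
handler = codecs.lookup_error("xmlcharrefreplace")

# Build the exception describing the unencodable span text[1:2].
text = "a\U0001F600b"
exc = UnicodeEncodeError("gbk", text, 1, 2, "illegal multibyte sequence")

# The handler returns the replacement and the position to resume encoding at.
replacement, resume_at = handler(exc)
print(replacement, resume_at)  # &#128512; 2
```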
Ma Lin added the comment:
> Add zstd support in tarfile
This requires the stdlib to contain a Zstandard module.
You can ask in the Idea forum:
https://discuss.python.org/c/ideas
--
nosy: +malin
___
Python tracker
<https://bugs.pyth
Ma Lin added the comment:
It is better to raise a warning when a problematic combination is used.
But IMO either "raising a warning" or "adding more description to doc" is too
dependent on the implementation detail of liblzma.
--
__
Ma Lin added the comment:
Maybe the zlib module can also use the same algorithm.
The zlib module's initial buffer size is 16KB [1], and the size doubles each time it grows [2].
[1] zlib module's initial buffer size:
https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L32
[2] z
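For a rough feel of how the doubling strategy compares with the `size + (size >> 3) + 6` rule used by bz2/lzma, the sketch below counts how many resizes each needs to reach a target size. The 100 MB target and the helper names are assumptions made for this illustration only.

```python
def count_resizes(initial, target, grow):
    """Count how many times `grow` must be applied to reach `target` bytes."""
    size, resizes = initial, 0
    while size < target:
        size = grow(size)
        resizes += 1
    return resizes


def grow_eighth(size):
    # bz2/lzma rule quoted in the related issue
    return size + (size >> 3) + 6


def grow_double(size):
    # zlib rule described above
    return size * 2


TARGET = 100 * 1024 * 1024  # hypothetical 100 MB of output

eighth = count_resizes(8 * 1024, TARGET, grow_eighth)
double = count_resizes(16 * 1024, TARGET, grow_double)
print(eighth, double)
```

Doubling needs far fewer resizes, at the cost of potentially over-allocating more memory on the last step.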
New submission from Ma Lin :
lzma/bz2 modules are using the same buffer growth algorithm: [1][2]
newsize = size + (size >> 3) + 6;
lzma/bz2 modules' default output buffer is 8192 bytes [3][4], so the growth
step is below.
For many cases, maybe the buffer is resized too
Ma Lin added the comment:
There was a similar issue (issue21872).
When decompressing lzma.FORMAT_ALONE data that doesn't have the end
marker (but has the correct "Uncompressed Size" in the .lzma header), sometimes
the last one to dozens of bytes can't be outpu
Ma Lin added the comment:
The docs[1] say:
Compression filters:
FILTER_LZMA1 (for use with FORMAT_ALONE)
FILTER_LZMA2 (for use with FORMAT_XZ and FORMAT_RAW)
But your code uses a combination of `FILTER_LZMA1` and `FORMAT_RAW`, is this ok?
[1] https
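For comparison, a minimal round trip with the combination the docs list for raw streams (`FILTER_LZMA2` with `FORMAT_RAW`) might look like this. The payload and preset are arbitrary choices for the sketch.

```python
import lzma

data = b"example payload " * 64  # arbitrary sample data

# FORMAT_RAW has no header, so the same filter chain must be given
# to both the compressor and the decompressor.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

compressed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
restored = lzma.decompress(compressed, format=lzma.FORMAT_RAW, filters=filters)

print(len(data), len(compressed), restored == data)
```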
Change by Ma Lin :
--
components: +Library (Lib) -Extension Modules
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue41210>
___
Ma Lin added the comment:
Do I need to write a detailed review guide? I suppose that after reading it
from beginning to end, it will be easy to understand PR 12427, no need to read
anything else.
Or plan to replace the sre module with the regex module in a future version
Ma Lin added the comment:
Why do you always want to use a "utf-8" encoded identifier as the group name in a
`bytes` pattern?
The direction is: a group name is written in a `bytes` pattern, and is converted to
`str`.
Not this direction: `str` group name -(utf8)-> `bytes` pattern -> `
Ma Lin added the comment:
Please look at these:
>>> orig_name = "Ř"
>>> orig_ch = orig_name.encode("cp1250") # Because why not?
>>> orig_ch
b'\xd8'
>>> name = list(re.match(b"(?P<" + orig_ch +
Ma Lin added the comment:
> this limitation to the latin-1 subset is not compatible with the
> documentation, which says that valid Python identifiers are valid group names.
Not all latin-1 characters are valid identifiers, for example:
>>> '\x94'.en
Ma Lin added the comment:
It seems you are not yet familiar with some aspects of encodings.
Naturally, `bytes` cannot contain a character whose Unicode code point is greater
than \u00ff. So you can only use the "latin1" encoding, which maps from character to
byte (or the reverse) directly.
&
Ma Lin added the comment:
In this case, you can only use 'latin1', which directly maps one character
(\u0000-\u00FF) to/from one byte.
If you use 'utf-8', it may map one character to multiple bytes, such as 'Δ' ->
b'\xce\x94'
'\x94
Ma Lin added the comment:
`latin1` is the character set covering Unicode code points from \u0000 to \u00ff,
and its characters are mapped directly from/to bytes.
So b'\xe9' is mapped to \u00e9, which is `é`.
Of course, characters with Unicode code point greater than 0xff are impossible
to
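The one-to-one mapping of `latin1` can be verified directly. A short sketch:

```python
# Every byte value 0x00-0xff decodes to the code point with the same number.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin1")
same_numbers = [ord(ch) for ch in text] == list(range(256))

# The mapping is reversible, so encoding gives the original bytes back.
round_trip = text.encode("latin1")

e_acute = b"\xe9".decode("latin1")   # 'é', i.e. \u00e9
delta_utf8 = "Δ".encode("utf-8")     # one character, two bytes

# A character above \u00ff has no latin1 byte at all.
try:
    "Δ".encode("latin1")
    delta_fits_latin1 = True
except UnicodeEncodeError:
    delta_fits_latin1 = False

print(same_numbers, e_acute, delta_utf8, delta_fits_latin1)
```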
Ma Lin added the comment:
> a non-ascii group name will raise an error in bytes, even if encoded
Looks like this is a language limitation:
>>> b'é'
File "", line 1
SyntaxError: bytes can only contain ASCII literal characters.
No prob
Ma Lin added the comment:
It is very reasonable for a group name to be `str`. Essentially it is just a name; it has
nothing to do with `bytes`.
Other names in Python are also `str` type, such as codec names, hashlib names.
--
nosy: +Ma Lin
___
Python tracker
Ma Lin added the comment:
I suggest not closing this issue; this is an opportunity to investigate
whether Python3 has this problem as well.
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue29
Ma Lin added the comment:
Good catch.
You can submit a PR to fix this. If you start from zero and do it slowly, it
will take about a week or two.
--
components: +Windows -Build
nosy: +Ma Lin, paul.moore, steve.dower, tim.golden, zach.ware
Change by Ma Lin :
--
keywords: +patch
pull_requests: +19847
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/20622
___
Python tracker
<https://bugs.python.org/issu
New submission from Ma Lin :
The Windows build is using xz-5.2.2, it was released on 2015-09-29.
xz-5.2.5 was released recently, maybe we can update this library.
When preparing cpython-source-deps, don't forget to copy
`xz-5.2.5\windows\vs2019\config.h` to `xz-5.2.5\windows\` f
Ma Lin added the comment:
Is there any hope of merging this into the 3.9 branch?
--
___
Python tracker
<https://bugs.python.org/issue35859>
___
Ma Lin added the comment:
I did a git bisect, this commit fixed the bug:
https://github.com/python/cpython/commit/ac22f6aa989f18c33c12615af1c66c73cf75d5e7
--
___
Python tracker
<https://bugs.python.org/issue40
Ma Lin added the comment:
On Windows 10 with Python 3.7, I get the same message as in the reply above.
With Python 3.8, it works well.
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue40
Ma Lin added the comment:
It seems that people usually use the socket module like this, I think it's safe
to respect this habit:
if hasattr(socket, "FLAG_NAME"):
    do_something
If PR19402 is used, your program will have problems on older versions of the system, not
on
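The habit described above can be wrapped in a tiny helper. `get_socket_flag` is a hypothetical name used only for this sketch; it relies on nothing beyond `getattr()` on the socket module.

```python
import socket


def get_socket_flag(name, default=None):
    """Return the socket constant `name`, or `default` if this build lacks it."""
    return getattr(socket, name, default)


# A constant that every build provides.
af_inet = get_socket_flag("AF_INET")

# A made-up name, standing in for a flag that the running system does not expose.
missing = get_socket_flag("NO_SUCH_FLAG_FOR_DEMO")

print(af_inet, missing)
```

Callers can then branch on `None` instead of catching an AttributeError.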
Ma Lin added the comment:
The Windows build encountered a similar problem, see issue32394.
The solution is to check the runtime system version when importing the socket
module and, if it is an older system, delete the constants. [1]
issue32394 has a small script (winsdk_watchdog.py) to help find such
Ma Lin added the comment:
I also planned to review this commit at some point; I feel a bit uneasy
about it.
If an optimization needs to be fine-tuned and may introduce pitfalls for
future code maintenance, IMHO it is best to avoid this kind of
optimization.
--
nosy
Ma Lin added the comment:
Is it possible to scan stdlib to find similar bugs?
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue39033>
___
Ma Lin added the comment:
issue29097 fixed a bug in `datetime.fromtimestamp()`.
But this issue is about `datetime.timestamp()`, which is not fixed yet.
--
___
Python tracker
<https://bugs.python.org/issue37
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue23692>
___
Ma Lin added the comment:
> I'd still retain \0 as a special case, since it really is useful.
Yes, maybe \0 is used widely, I didn't think of it.
Changing is troublesome, let's keep it as is.
--
___
Python tracker
<ht
Ma Lin added the comment:
Octal escape:
\ooo    Character with octal value ooo
As in Standard C, up to three octal digits are accepted.
It only accepts UCS1 characters (ooo <= 0o377):
>>> ord('\377')
255
>>> len('\378')
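The same two cases as a small script, for string literals. Since 8 is not an octal digit, the second literal is read as the two-digit escape `\37` followed by the character '8'.

```python
max_octal = '\377'   # three octal digits: code point 0o377
split = '\378'       # '\37' (code point 0o37) followed by the character '8'

print(ord(max_octal))      # 255
print(len(split))          # 2
print(split == '\x1f' + '8')  # True
```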
Ma Lin added the comment:
@veaba
Posting only in English is fine.
> Is this actually needed?
Maybe very very few people dynamically generate some large patterns.
> However, \g<...> is not accepted in a pattern.
> in the "regex" module I added support for it in a patter
Ma Lin added the comment:
A backreference number in the replacement string can't be >= 100:
https://github.com/python/cpython/blob/v3.8.0/Lib/sre_parse.py#L1022-L1036
If no one takes this, I will try to fix this issue tomorrow.
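The limitation can be seen with a pattern that has 100 groups. In this sketch (the pattern is artificial, built only to reach group 100), `\100` in the replacement string is parsed as a three-digit octal escape rather than as a reference to group 100, while the `\g<100>` form refers to the group as intended.

```python
import re

# 99 empty groups followed by one group that matches 'x': group 100 is '(x)'.
pattern = '()' * 99 + '(x)'

# Three octal digits are read as an octal escape: 0o100 is '@'.
as_octal = re.sub(pattern, r'\100', 'x')

# The \g<number> form is unambiguous and reaches group 100.
as_group = re.sub(pattern, r'\g<100>!', 'x')

print(as_octal, as_group)  # @ x!
```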
--
nosy: +serhiy.storchaka
title: Regular match overfl
Ma Lin added the comment:
A simpler reproducer:
```
import re
NUM = 99
# items = [ '(001)', '(002)', '(003)', ..., '(NUM)']
items = [r'(%03d)' % i for i in range(1, 1+NUM)]
pattern = '|'.join(items)
# repl = '\1
Change by Ma Lin :
--
nosy: +Ma Lin
type: security ->
___
Python tracker
<https://bugs.python.org/issue38582>
___
Ma Lin added the comment:
PR 15732 became an overhaul:
- replace/backslashreplace/surrogateescape were wrongly described as encoding
only, in fact they can also be used in decoding.
- clarify the description of surrogatepass.
- add more descriptions to each handler.
- add two REPL examples
Ma Lin added the comment:
> Thus this breaks editing the physical line past the astral character. We
> cannot do anything with this.
I tried it; sadly, the experience is not very good.
------
nosy: +Ma Lin
___
Python tracker
<https://b
Ma Lin added the comment:
> This file is copied directly from https://github.com/libexpat/libexpat/ >
> project. Would you mind to propose your patch there?
OK, I will report it there.
--
___
Python tracker
<https://bugs.python.or
Ma Lin added the comment:
Other warnings:
c:\vstinner\python\master\objects\longobject.c(420): warning C4244: 'function':
conversion from 'unsigned __int64' to 'sdigit', possible loss of data
c:\vstinner\python\master\objects\longobject.c(428): warning C4267
Ma Lin added the comment:
On my Windows, some non-ASCII characters cause this warning:
d:\dev\cpython\modules\expat\xmltok.c : warning C4819:
The file contains a character that cannot be represented in
the current code page (936). Save the file in Unicode format
to prevent
Ma Lin added the comment:
There are 4 functions that have similar code, see PR 16334.
I just replaced the `unsigned long` type with the `size_t` type and got these benchmarks.
Can this be backported to 3.8 branch?
1. bytes.isascii()
D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b
Change by Ma Lin :
--
title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build -> Use
8-byte step to detect ASCII sequence in 64bit Windows builds
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15911
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16334
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
Maybe @sir-sigurd can find more optimizations.
FYI, `_Py_bytes_isascii()` function [1] also has similar code.
[1] https://github.com/python/cpython/blob/v3.8.0b4/Objects/bytes_methods.c#L104
--
___
Python tracker
<ht
New submission from Ma Lin :
The C type `long` is a 4-byte integer in 64-bit Windows builds. [1]
But the `ucs1lib_find_max_char()` function [2] uses SIZEOF_LONG, so it loses a
little performance in 64-bit Windows builds.
Below is the benchmark of using SIZEOF_SIZE_T and this change:
- unsigned
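The idea of scanning in word-sized steps can be sketched in Python. This is only an illustration of the masking technique, not the C implementation: `is_ascii` and its `step` parameter are names made up here, with `step=4` standing in for a 4-byte `long` and `step=8` for an 8-byte `size_t`.

```python
def is_ascii(data, step=8):
    """Return True if every byte is < 0x80, testing `step` bytes at a time."""
    # One 0x80 bit per byte: any byte >= 0x80 sets a bit under this mask.
    mask = int.from_bytes(b"\x80" * step, "little")
    aligned_end = len(data) - len(data) % step

    for start in range(0, aligned_end, step):
        word = int.from_bytes(data[start:start + step], "little")
        if word & mask:
            return False

    # Check the leftover tail byte by byte.
    return all(byte < 0x80 for byte in data[aligned_end:])


print(is_ascii(b"plain ascii text, long enough"))   # True
print(is_ascii(b"caf\xe9 with a latin1 byte"))      # False
```

With an 8-byte step the loop runs half as many iterations as with a 4-byte step, which is the effect the change aims for.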
Ma Lin added the comment:
> I'd fix them, but I'm not sure if we are going to restore CHECK_SMALL_INT()
> ¯\_(ツ)_/¯
I suggest we slow down, carefully sort out the recent commits for longobject.c:
https://bugs.python.org/issue37812#msg352837
Make the code has consiste
Ma Lin added the comment:
Recent commits for longobject.c
Revision: 5e63ab05f114987478a21612d918a1c0276fe9d2
Author: Greg Price
Date: 19-8-25 1:19:37
Message:
bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return is
explicit. (GH-15216)
The concern for
Ma Lin added the comment:
PR 16270 uses Py_UNREACHABLE() on a single line.
It solves this particular issue.
--
___
Python tracker
<https://bugs.python.org/issue38
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15860
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16270
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
If a static inline function is used, with Py_UNREACHABLE() inside an if-else branch
that should return a value, the compiler may emit a warning:
https://godbolt.org/z/YtcNSf
MSVC v19.14:
warning C4715: 'test': not all control paths return a value
c
Ma Lin added the comment:
> I agree that both changes should be reverted.
There is another commit after the two commits:
https://github.com/python/cpython/commit/c6734ee7c55add5fdc2c821729ed5f67e237a096
It is troublesome to revert them.
PR 16146 is on-going, maybe we can request the aut
Ma Lin added the comment:
> It's not clear to me if anyone benchmarked to see if the
> conversion to a macro had any measurable performance benefit.
I tested it that day, also using this command:
python.exe -m pyperf timeit -s "from collections import deque; consume =
deque(
Ma Lin added the comment:
We can change Py_UNREACHABLE() to assert(0) in longobject.c
Or remove the article in Py_UNREACHABLE()
--
___
Python tracker
<https://bugs.python.org/issue38
Ma Lin added the comment:
This commit changed Py_UNREACHABLE() five days ago:
https://github.com/python/cpython/commit/3ab61473ba7f3dca32d779ec2766a4faa0657923
If this change is removed, it compiles successfully.
--
nosy: +Ma Lin
___
Python
Ma Lin added the comment:
Some memos:
1. In liblzma, these missing bytes are copied inside the `dict_repeat` function:
788 case SEQ_COPY:
789 // Repeat len bytes from distance of rep0.
790 if (unlikely(dict_repeat(&dict, rep0, &len))) {
See l
Ma Lin added the comment:
PR 15710 has been merged into the master, but the merge message is not shown
here.
Commit:
https://github.com/python/cpython/commit/6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2
Maybe this issue can be closed.
--
resolution: -> fixed
stage: patch rev
Change by Ma Lin :
--
pull_requests: +15407
pull_request: https://github.com/python/cpython/pull/15753
___
Python tracker
<https://bugs.python.org/issue38
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15386
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15732
___
Python tracker
<https://bugs.python.org/issu
New submission from Ma Lin :
Text descriptions about `Error Handlers` are not very friendly to novices.
https://docs.python.org/3/library/codecs.html#error-handlers
For example:
'xmlcharrefreplace'
Replace with the appropriate XML character reference (only for encoding).
I
Ma Lin added the comment:
> This change produces tiny, but measurable speed-up for handling small ints
I didn't see a measurable change. I ran this command a dozen times and took the
best result:
D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "from collections
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue26868>
___
Change by Ma Lin :
--
title: Assertion failed: object has negative ref count -> reference counter
issue in signal module
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
This range has not been changed since "preallocated small integer pool" was
introduced:
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
The commit (Jan 2007):
https://github.com/python/cpython/commit/ddefaf31b366ea84250fc5090837c2b764a04102
I
Ma Lin added the comment:
Revert commit 5e63ab0 or use PR 15710, both are fine.
--
___
Python tracker
<https://bugs.python.org/issue38015>
___
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15365
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15710
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
I did a Git bisect, this is the first bad commit:
https://github.com/python/cpython/commit/9541bd321a94f13dc41163a5d7a1a847816fac84
Adding the people involved to the nosy list.
--
nosy: +berker.peksag, nanjekyejoannah
___
Python tracker
<ht
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15355
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15701
___
Python tracker
<https://bugs.python.org/issu
New submission from Ma Lin :
Adding these two lines to /Objects/longobject.c will disable the "preallocated
small integer pool":
#define NSMALLPOSINTS 0
#define NSMALLNEGINTS 0
Then run this reproducer (attached):
from enum import IntEnum
import _signal
Ma Lin added the comment:
There will always be a new commit, replacing with a macro version also looks
good.
I have no opinion, both are fine.
--
___
Python tracker
<https://bugs.python.org/issue38
New submission from Ma Lin :
Commit 5e63ab0 replaces the macro with this inline function:
static inline int
is_small_int(long long ival)
{
return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}
(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)
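For readers less familiar with the C source, the range check translates to Python as follows. This is a sketch that mirrors the function quoted above with the default values; it is not how CPython exposes the pool.

```python
NSMALLNEGINTS = 5    # default quoted above
NSMALLPOSINTS = 257  # default quoted above


def is_small_int(ival):
    """Mirror of the C check: is ival inside the preallocated range?"""
    return -NSMALLNEGINTS <= ival < NSMALLPOSINTS


# With the defaults, the pool covers -5 through 256 inclusive.
print([n for n in (-6, -5, 0, 256, 257) if is_small_int(n)])  # [-5, 0, 256]
```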
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue37907>
___
Ma Lin added the comment:
I sent it twice, but it doesn't appear on the Python-Ideas list.
I will try to post to Python-Dev tomorrow.
--
___
Python tracker
<https://bugs.python.org/is