[issue19087] bytearray front-slicing not optimized

2015-04-18 Thread Martin Panter
Martin Panter added the comment: I think the changes for this issue are causing the crash and unexpected buffer expansion described in Issue 23985. Appending to a bytearray() can overstep the memory buffer because it doesn’t account for ob_start when checking for resizing. And “del” can

[issue19087] bytearray front-slicing not optimized

2013-10-05 Thread Roundup Robot
Roundup Robot added the comment: New changeset 499a96611baa by Antoine Pitrou in branch 'default': Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion). http://hg.python.org/cpython/rev/499a96611baa -- nosy: +python-dev

[issue19087] bytearray front-slicing not optimized

2013-10-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: The commit produced compiled errors on Windows, but I've since fixed them. -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org

[issue19087] bytearray front-slicing not optimized

2013-10-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Side effect of this change is that bytearray's data now can be non-aligned. We should examine all places which relies on this. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087

[issue19087] bytearray front-slicing not optimized

2013-10-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Side effect of this change is that bytearray's data now can be non-aligned. We should examine all places which relies on this. The C API makes no guarantees as to alignment of private data areas, so any external code relying on it would be incorrect. The

[issue19087] bytearray front-slicing not optimized

2013-10-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I don't see much sense in differences between bytea_slice2.patch and bytea_slice3.patch, because bytea_slice3.patch is not smaller and simpler than bytea_slice2.patch. I meant that you can continue use self-ob_bytes instead of PyByteArray_AS_STRING(self)

[issue19087] bytearray front-slicing not optimized

2013-10-04 Thread Antoine Pitrou
Antoine Pitrou added the comment: I meant that you can continue use self-ob_bytes instead of PyByteArray_AS_STRING(self) if self-ob_bytes points not to the start of physical buffer, but to the start of logical byte array. *This* will simplify the patch a lot. It will make the diff smaller

[issue19087] bytearray front-slicing not optimized

2013-10-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is a slightly modified patch implementing Serhiy's suggestion. -- Added file: http://bugs.python.org/file31954/bytea_slice3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087

[issue19087] bytearray front-slicing not optimized

2013-10-03 Thread STINNER Victor
STINNER Victor added the comment: bytea_slice3.patch looks simpler than bytea_slice2.patch, I prefer it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087 ___

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I'm not sure that it is what you expected: bytearray() is only initialized once (setup of timeit). You probably want to reinitialize at each loop. There is no setup of timeit here. And you forgot bytes(b) after accumulating loop. bench_bytearray.py shows

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: The problem is the suboptimal code is also the natural way to write such code. If you know a simple and idiomatic way to write an optimal bytes FIFO, then please share it with us. Please share this written in the natural way real code with us. I can't

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I don't understand why you avoid to show any examples which benefit. Shouldn't optimizing patches prove their efficient? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't understand why you avoid to show any examples which benefit. Shouldn't optimizing patches prove their efficient? Many micro-optimizations get committed without proving themselves on a high-level benchmark suite, as long as they produce a big enough

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: However, the patch had a bug in the resizing logic. Here is a new patch fixing that (+ an additional test). -- Added file: http://bugs.python.org/file31926/bytea_slice2.patch ___ Python tracker

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: Other benchmarks for the new patch (exercising FIFO-like behaviour: some data is appended at one end, and popped at the other): timeit -s b=bytearray(10);s=b'x'*100 b[:100] = b''; b.extend(s) - before: 4.07 usec per loop - after: 0.812 usec per loop For

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: A 4x improvement on a micro-benchmark is very likely to make a difference in at least some real-world code (while a 10% improvement wouldn't). If there is a code that uses the deleting from the beginning of a bytearray. I just pray to demonstrate this code.

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread STINNER Victor
STINNER Victor added the comment: I took me some time, but Antoine explained me the use case on IRC :-) The patch is useful is the bytearray is used as a FIFO: remove front, append tail. It can be seen as an implementation for BufferedReader. Consumer/producer is a common pattern, especially

[issue19087] bytearray front-slicing not optimized

2013-09-30 Thread STINNER Victor
STINNER Victor added the comment: I adapted my micro-benchmark to measure the speedup: bench_bytearray2.py. Result on bytea_slice2.patch: Common platform: CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes CPU model: Intel(R)

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: Results under Windows: - before: PCbuild\amd64\python.exe -m timeit b=bytearray(10) while b: b[-1:] = b'' 10 loops, best of 3: 74.8 msec per loop PCbuild\amd64\python.exe -m timeit b=bytearray(10) while b: b[:1] = b'' 10 loops, best of 3: 330 msec per

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: A generic example is to parse messages out of a TCP stream. Basically any protocol transported on TCP needs such a facility, or has to find workarounds (which are either suboptimal or complicated). Could you please show concrete code in which you are going

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: Could you please show concrete code in which you are going to use this optimization? There's no need to use this optimization. Any networking code that has to find message boundaries and split on them will benefit. If you don't understand that, I'm not

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Deleting a slice at the front of a bytearray have linear complexity from the size of a bytearray (in any case del b[:1] is a little faster than b[:1] = b''). I doubt than any performance critical code do it instead of increasing an index in constant time.

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: Deleting a slice at the front of a bytearray have linear complexity from the size of a bytearray (in any case del b[:1] is a little faster than b[:1] = b''). I doubt than any performance critical code do it instead of increasing an index in constant time.

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The same is true with your patch. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087 ___ ___

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: The same is true with your patch. I don't understand. What is true with my patch? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087 ___

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: You increase internal index in a bytearray. A bytearray with small visible length can consume much hidden memory. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: You increase internal index in a bytearray. A bytearray with small visible length can consume much hidden memory. No, because PyByteArray_Resize() is always called afterwards to ensure that the buffer is resized when it gets below 50% usage. --

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: No, because PyByteArray_Resize() is always called afterwards to ensure that the buffer is resized when it gets below 50% usage. I.e. the bytearray is compacted from time to time. -- ___ Python tracker

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread STINNER Victor
STINNER Victor added the comment: @Serhiy: I doubt than any performance critical code do it instead of increasing an index in constant time. Sorry, I don't get your point. It's not become Python is inefficient that developers must develop workarounds. Antoine's patch is simple, elegant, and

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: No, because PyByteArray_Resize() is always called afterwards to ensure that the buffer is resized when it gets below 50% usage. I.e. the bytearray is compacted from time to time. Is there a problem with that? --

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Is there a problem with that? No more than with msg198657. Sorry, I don't get your point. It's not become Python is inefficient that developers must develop workarounds. I'm not sure that workarounds are much worst than using this optimization. At

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: One of most used cases for bytearrays is accumulating. And the patch slow down this case. I see no difference here. You are seeing a 10% slowdown, which is possibly a measurement glitch. The bottom line is that the performance remains approximately the

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread STINNER Victor
STINNER Victor added the comment: One of most used cases for bytearrays is accumulating. And the patch slow down this case. Please don't use the raw timeit command for micro-benchmarks, it is not reliable. For example, I'm unable to reproduce your slow down (7% on a microbenchmark is not

[issue19087] bytearray front-slicing not optimized

2013-09-29 Thread STINNER Victor
STINNER Victor added the comment: Oh, by the way: $ ./python -m timeit b = bytearray(); a = b'x' for i in range(1): b += a bytes(b) I'm not sure that it is what you expected: bytearray() is only initialized once (setup of timeit). You probably want to reinitialize at each loop.

[issue19087] bytearray front-slicing not optimized

2013-09-26 Thread STINNER Victor
STINNER Victor added the comment: Mercurial has another implementation strategy for a similar thing: http://selenic.com/repo/hg/file/50d721553198/mercurial/util.py#l935 I found an interesting comment in the following issue: I think the trouble we get into is chunkbuffer() creates new large

[issue19087] bytearray front-slicing not optimized

2013-09-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: @Antoine: Do you know if your patch may reduce the memory fragmentation on bytearray front-slicing? It reduces the number of allocations so, yes, it can reduce memory fragmentation. We cannot really use a list of chunks for bytearray since it is supposed to

[issue19087] bytearray front-slicing not optimized

2013-09-26 Thread STINNER Victor
STINNER Victor added the comment: Could you please add unit tests for check that ob_start is used instead of memmove()? I didn't find a function for that in _testcapi. I tried to test it using sys.getsizeof(), but the size is not reliable (the bytearray buffer is not always shrinked, it

[issue19087] bytearray front-slicing not optimized

2013-09-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: Could you please add unit tests for check that ob_start is used instead of memmove()? How would I do that? Most of the time we don't unit-test performance improvements (i.e. there are no tests that list.append() is O(1), for example). --

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Antoine Pitrou
New submission from Antoine Pitrou: If you delete a slice at the end of a bytearray, it is naturally optimized (thanks to the resizing strategy). However, if you delete a slice at the front of a bytearray, it is not: a memmove() gets done every time. $ ./python -m timeit b=bytearray(1)

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: And the same is for a list. List and bytearray are wrong types for front deleting. I don't think we should increase the size of bytearray, and complicate and slowdown it for such special purpose. If you want to implement a fifo using bytearray more optimal,

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Antoine Pitrou
Antoine Pitrou added the comment: And the same is for a list. List and bytearray are wrong types for front deleting. There is no bytedeque(). I don't think we should increase the size of bytearray, and complicate and slowdown it for such special purpose. I don't think it would really slow

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is a patch. Benchmarks (under Linux where realloc is fast; the gap may be wider under Windows): $ ./python -m timeit b=bytearray(10) while b: b[:1] = b'' - before: 225 msec per loop - after: 60.4 msec per loop $ ./python -m timeit b=bytearray(10)

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- stage: needs patch - patch review ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087 ___ ___

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you please provide an example which uses this feature? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19087 ___

[issue19087] bytearray front-slicing not optimized

2013-09-25 Thread Antoine Pitrou
Antoine Pitrou added the comment: Could you please provide an example which uses this feature? A generic example is to parse messages out of a TCP stream. Basically any protocol transported on TCP needs such a facility, or has to find workarounds (which are either suboptimal or complicated).