[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-12 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 5b76bdba071e7bbd9fda0b9b100d1506d95c04bd by Serhiy Storchaka in branch 'master': bpo-31993: Do not use memoryview when pickle large strings. (#5154)

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-11 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- pull_requests: +5009 ___ Python tracker ___ ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 0a2da50e1867831251fad75377d0f10713eb6301 by Serhiy Storchaka in branch 'master': bpo-31993: Do not create frames for large bytes and str objects (#5114)

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I'll create a separate PR for the memoroview issue after merging PR 5114. -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > What do you mean? Large bytes and strings was written inside a frame when serialize by dumps(). dump() (as well as Python implementations of dumps() and dump()) write them outside of a frame. --

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: PR 5114 implements this when serialize in C into memory. -- resolution: fixed -> stage: resolved -> patch review status: closed -> open ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Humm, this feature doesn't work with C implementation. What do you mean? -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- pull_requests: +4980 ___ Python tracker ___ ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Humm, this feature doesn't work with C implementation. -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel added the comment: Thanks for the very helpful feedback and guidance during the review. -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: Definitely! Thank you for your contribution :-) -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel added the comment: Shall we close this issue now that the PR has been merged to master? -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 3cd7c6e6eb43dbd7d7180503265772a67953e682 by Serhiy Storchaka (Olivier Grisel) in branch 'master': bpo-31993: Do not allocate large temporary buffers in pickle dump. (#4353)

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: Agreed. We shouldn't issue very small writes, but 64 kB is generally considered a reasonable buffer size for many kinds of I/O. Besides, it wouldn't be difficult to make the target frame size configurable if a use case arose for it, but I

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: Flushing the buffer at each frame commit will cause a medium-sized write every 64kB on average (instead of one big write at the end). So that might actually cause a performance regression for some users if the individual file-object

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: Framing was originally intended to improve unpickling (since you don't have to issue lots of tiny file reads anymore). No objection to also improve pickling, though :-) -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: > While we are here, wouldn't be worth to flush the buffer in the C > implementation to the disk always after committing a frame? This will save a > memory when dump a lot of small objects. I think it's a good idea. The C pickler

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: Thanks Antoine, I updated my code to what you suggested. -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: While we are here, wouldn't be worth to flush the buffer in the C implementation to the disk always after committing a frame? This will save a memory when dump a lot of small objects. --

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I would like to update the `write_utf8` function but I would need to find a > way to wrap `const char* data` as a PyBytes instance without making a memory > copy to be able to pass it to my `_Pickle_write_large_bytes`. You should pass it

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-11 Thread Olivier Grisel
Olivier Grisel added the comment: Alright, I found the source of my refcounting bug. I updated the PR to include the C version of the dump for PyBytes. I ran Serhiy's microbenchmarks on the C version and I could not detect any overhead on small bytes objects while I

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: I have tried to implement the direct write bypass for the C version of the pickler but I get a segfault in a Py_INCREF on obj during the call to memo_put(self, obj) after the call to _Pickler_write_large_bytes. Here is the diff of

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: BTW, I am looking at the C implementation at the moment. I think I can do it. -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Nice! I have got virtually the same code as your intermediate variant, but your final variant event better! -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: Alright, the last version has now ~4% overhead for small bytes. -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: Actually, I think this can still be improved while keeping it readable. Let me try again :) -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: I have pushed a new version of the code that now has a 10% overhead for small bytes (instead of 40% previously). It could be possible to optimize further but I think that would render the code much less readable so I would be

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This speeds up pickling large bytes objects. $ ./python -m timeit -s 'import pickle; a = [bytes([i%256])*100 for i in range(256)]' 'with open("/dev/null", "wb") as f: pickle._dump(a, f)' Unpatched: 10 loops, best of 5: 20.7

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I'll try to write the C implementation. Maybe it will use other heuristic. -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: In my last comment, I also reported the user times (not spend in OS level disk access stuff): the code of the PR is on the order of 300-400ms while master is around 800ms or more. -- ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Actually the time varies too much between runs. 1.641s ... 8.475s ... 12.645s -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: So we're saving memory and CPU time. Cool! -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: More benchmarks with the unix time command: ``` (py37) ogrisel@ici:~/code/cpython$ git checkout master Switched to branch 'master' Your branch is up-to-date with 'origin/master'. (py37) ogrisel@ici:~/code/cpython$ time python

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: Note that the time difference is not significant. I rerun the last command I got: ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle Allocating source data... => peak memory usage: 2.014 GB

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: But the total runtime is higher? (6 s. vs. 5 s.) Can you post the CPU time? (as measured by `time`, for example) -- ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: I wrote a script to monitor the memory when dumping 2GB of data with python master (C pickler and Python pickler): ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py Allocating source data... => peak memory

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Would be nice to see benchmarks. And what about C version? -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Change by Antoine Pitrou : -- stage: needs patch -> patch review ___ Python tracker ___ ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: As for the C pickler, currently it dumps the whole pickle into an internal buffer before calling write() at the end. You may want to make writing more incremental. See Modules/_pickler.c (especially _Pickler_Write()). -- stage:

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- nosy: +serhiy.storchaka ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Roundup Robot
Change by Roundup Robot : -- keywords: +patch pull_requests: +4309 stage: needs patch -> patch review ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: Of course, +1 for fixing this. -- ___ Python tracker ___

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: You don't need to add a test for a performance enhancement. -- stage: -> needs patch type: resource usage -> performance versions: -Python 3.4, Python 3.5, Python 3.6, Python 3.8 ___ Python tracker

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
New submission from Olivier Grisel : I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python version) make unnecessary memory copies when dumping large str, bytes and bytearray objects. This is caused by unnecessary concatenation of the opcode and