[issue36103] Increase shutil.COPY_BUFSIZE

2019-03-01 Thread Inada Naoki
Inada Naoki added the comment: I chose 64 KiB because the performance difference I can see between 64 and 128 KiB is at most 5%.
-- resolution: -> fixed; stage: patch review -> resolved; status: open -> closed
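
A comparison of this kind can be sketched roughly as follows. This is a minimal, hypothetical benchmark and not the one used in the issue: the helper names `time_copy` and `compare`, the 32 MiB test file, and the choice of sizes are all assumptions made for illustration. Because the source file has just been written, it measures the hot-cache case, so the numbers mostly reflect syscall and interpreter overhead rather than disk speed.

```python
import os
import shutil
import tempfile
import time


def time_copy(src_path, bufsize, repeats=3):
    """Return the best wall-clock time (seconds) for copying src_path to os.devnull."""
    best = float("inf")
    for _ in range(repeats):
        with open(src_path, "rb") as src, open(os.devnull, "wb") as dst:
            start = time.perf_counter()
            # The third argument is the chunk size used for each read/write.
            shutil.copyfileobj(src, dst, bufsize)
            best = min(best, time.perf_counter() - start)
    return best


def compare(sizes_kib=(16, 64, 128), file_size=32 * 1024 * 1024):
    """Time copies of a temporary file for several buffer sizes (in KiB)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * file_size)
        path = tmp.name
    try:
        return {kib: time_copy(path, kib * 1024) for kib in sizes_kib}
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for kib, seconds in compare().items():
        print(f"{kib:4d} KiB: {seconds * 1000:.2f} ms")
```

Results depend heavily on the machine, the filesystem, and the cache state, so any difference seen with this sketch should be read as indicative only.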

[issue36103] Increase shutil.COPY_BUFSIZE

2019-03-01 Thread Inada Naoki
Inada Naoki added the comment: New changeset 4f190306186973c1bbfadd3d3f146920856e by Inada Naoki in branch 'master': bpo-36103: change default buffer size of shutil.copyfileobj() (GH-12115) https://github.com/python/cpython/commit/4f190306186973c1bbfadd3d3f146920856e
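
To illustrate why the buffer size matters for `shutil.copyfileobj()`, the sketch below counts how many `read()` calls a copy makes for a given chunk size. It is an illustration only: `CountingReader` and `count_reads` are hypothetical names, and an in-memory `io.BytesIO` stands in for a real file, so it shows the number of Python-level calls rather than actual syscalls.

```python
import io
import shutil


class CountingReader(io.BytesIO):
    """In-memory file object that counts how often read() is called."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def count_reads(total_size, bufsize):
    """Copy total_size bytes with the given chunk size; return the read() count."""
    src = CountingReader(b"\0" * total_size)
    dst = io.BytesIO()
    # Pass the chunk size explicitly instead of relying on the default.
    shutil.copyfileobj(src, dst, bufsize)
    assert len(dst.getvalue()) == total_size
    return src.read_calls


if __name__ == "__main__":
    for kib in (16, 64, 128):
        print(kib, "KiB ->", count_reads(1024 * 1024, kib * 1024), "read() calls")
```

A larger chunk size means fewer iterations of the copy loop, which is the per-call overhead the discussion is about; whether that translates into a measurable speedup has to be checked with a benchmark.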

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-28 Thread Inada Naoki
Change by Inada Naoki: keywords: +patch; pull_requests: +12121; stage: -> patch review

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-26 Thread Inada Naoki
Inada Naoki added the comment: Read this file too. http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ioblksize.h coreutils chooses 128 KiB as the *minimal* buffer size to reduce syscall overhead. In the case of shutil, we have Python interpreter overhead in addition to syscall overhead. Who has

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-26 Thread Inada Naoki
Inada Naoki added the comment:
> Also on Linux "echo 3 | sudo tee /proc/sys/vm/drop_caches" is supposed to disable the cache.
As I said already, shutil is not used only with a cold cache. If the cache is cold, disk speed will be the upper bound in most cases. But when the cache is hot, or using very

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-26 Thread Giampaolo Rodola'
Giampaolo Rodola' added the comment: @Inada: having played with this in the past, I seem to remember that on Linux the bigger bufsize doesn't make a reasonable difference (but I may be wrong), which is why I suggest trying some benchmarks. In issue33671 I pasted some one-liners you can use

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-25 Thread Inada Naoki
Inada Naoki added the comment:
> desbma added the comment:
>
> If you do a benchmark by reading from a file, and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead:
> * input data is read from the Linux page cache, not the

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-25 Thread desbma
desbma added the comment: If you do a benchmark by reading from a file, and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead:
* input data is read from the Linux page cache, not the file on your SSD itself
* no data is written

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-25 Thread Inada Naoki
Inada Naoki added the comment:
> Your first link explains why 128kB buffer size is faster in the context of cp: it's due to fadvise and kernel read ahead.
>
> None of the shutil functions call fadvise, so the benchmark and conclusions are irrelevant to the Python buffer size IMO.
Even

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-25 Thread desbma
desbma added the comment: Your first link explains why 128kB buffer size is faster in the context of cp: it's due to fadvise and kernel read ahead. None of the shutil functions call fadvise, so the benchmark and conclusions are irrelevant to the Python buffer size IMO. In general, the

[issue36103] Increase shutil.COPY_BUFSIZE

2019-02-25 Thread Inada Naoki
New submission from Inada Naoki: shutil.COPY_BUFSIZE is 16 KiB on non-Windows platforms, which seems a bit small for performance. According to this article [1], 128 KiB gives the best performance on common systems. [1]: https://eklitzke.org/efficient-file-copying-on-linux Another resource: EBS document [2]