[issue20962] Rather modest chunk size in gzip.GzipFile

2016-04-22 Thread Martin Panter
Martin Panter added the comment: Since there doesn’t seem to be much interest here any more, and the current code has changed and now uses 8 KiB buffering, I am closing this. In theory, a buffer or chunk size parameter could still be added to the new code if there were a need.

[issue20962] Rather modest chunk size in gzip.GzipFile

2016-04-22 Thread Martin Panter
Changes by Martin Panter: -- Removed message: http://bugs.python.org/msg263972

[issue20962] Rather modest chunk size in gzip.GzipFile

2016-04-22 Thread lissacoffey
lissacoffey added the comment: I measured the total time of the run, the time to process each input record, and the time to execute just the seek() call for each record. The bulk of the per-record time was in the call to seek(), so by reducing that time, I sped up my run-times

[issue20962] Rather modest chunk size in gzip.GzipFile

2015-04-18 Thread Martin Panter
Martin Panter added the comment: The gzip (as well as LZMA and bzip) modules should now use buffer and chunk sizes of 8 KiB (= io.DEFAULT_BUFFER_SIZE) for most read() and seek() type operations. I have a patch that adds a buffer_size parameter to the three compression modules if anyone is

[issue20962] Rather modest chunk size in gzip.GzipFile

2015-04-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: Martin, do you think this is still an issue or has it been fixed by the compression refactor? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20962 ___

[issue20962] Rather modest chunk size in gzip.GzipFile

2015-03-15 Thread Martin Panter
Martin Panter added the comment: See also the patch for Issue 23529, which changes over to using BufferedReader for GzipFile, BZ2File and LZMAFile. The current patch there also passes a buffer_size parameter through to BufferedReader, although it currently defaults to io.DEFAULT_BUFFER_SIZE.
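
For readers who want a larger read buffer without waiting for a library change, here is a minimal user-side sketch. It is not the patch from Issue 23529. It only wraps a read-mode GzipFile in io.BufferedReader, and the helper name open_gzip_buffered is hypothetical.

```python
import gzip
import io


def open_gzip_buffered(fileobj, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Wrap a read-mode GzipFile in a BufferedReader with a chosen buffer size.

    Hypothetical helper for illustration; `fileobj` is any binary file object
    that contains gzip-compressed data.
    """
    gz = gzip.GzipFile(fileobj=fileobj, mode="rb")
    # BufferedReader pulls decompressed data from `gz` in buffer_size pieces.
    return io.BufferedReader(gz, buffer_size=buffer_size)
```

Passing a larger value such as 64 * 1024 trades memory for fewer calls into the decompressor. Whether that helps depends on the workload, so it is worth measuring.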

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread William Tisäter
William Tisäter added the comment: That makes sense. I proceeded and updated `Lib/gzip.py` to use `io.DEFAULT_BUFFER_SIZE` instead. This will change the existing behaviour in two ways: * Start using 1024 * 8 as buffer size instead of 1024. * Add one more kwarg (`buffer_size`) to

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: So I'd suggest, instead of using a hardcoded value, to simply reuse io.DEFAULT_BUFFER_SIZE. That way, if some day we decide to change it, all user code will benefit from the change. I don't think io.DEFAULT_BUFFER_SIZE makes much sense as a heuristic for

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Charles-François Natali
Charles-François Natali added the comment: I don't think io.DEFAULT_BUFFER_SIZE makes much sense as a heuristic for the gzip module (or compressed files in general). Perhaps gzip should get its own DEFAULT_BUFFER_SIZE? Do you mean from a namespace point of view, or from a performance point

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: Sure, it might not be optimal for compressed files, but I guess that the optimal value is a function of the compression-level block size and many other factors which are just too varied to come up with a reasonable heuristic. Well, I think that compressed

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Skip Montanaro
Skip Montanaro added the comment: On Mon, Apr 28, 2014 at 1:59 PM, Antoine Pitrou rep...@bugs.python.org wrote: Well, I think that compressed files in general would benefit from a larger buffer size than plain binary I/O, but that's just a hunch. I agree. When writing my patch, my (perhaps

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Charles-François Natali
Charles-François Natali added the comment: That could make sense, dunno. Note that the bz2 module uses a hardcoded 8K value. Note that the buffer size should probably be passed to the open() call. Also, the allocation is quite peculiar: it uses an exponential buffer size, starting at a tiny

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Skip Montanaro
Skip Montanaro added the comment: On Mon, Apr 28, 2014 at 3:08 PM, Charles-François Natali rep...@bugs.python.org wrote: In short, I think the overall buffering should be rewritten :-) Perhaps so, but I think we should open a separate ticket for that instead of instituting some feature creep

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-28 Thread Charles-François Natali
Charles-François Natali added the comment: Perhaps so, but I think we should open a separate ticket for that instead of instituting some feature creep here (no matter how reasonable the concept or its changes would be). Agreed. The patch looks good to me, so feel free to commit! (FWIW, gzip

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-27 Thread Charles-François Natali
Charles-François Natali added the comment: William, thanks for the benchmarks. Unfortunately this type of benchmark depends on the hardware (disk, SSD, memory bandwidth, etc). So I'd suggest, instead of using a hardcoded value, to simply reuse io.DEFAULT_BUFFER_SIZE. That way, if some day
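
The suggestion to reuse io.DEFAULT_BUFFER_SIZE instead of a literal can be sketched as follows. This is an illustration only and is not code from the patch. The names _CHUNK_SIZE and read_in_chunks are hypothetical.

```python
import io

# Hypothetical module-level default: follow the io module's value instead of
# hardcoding a number, so a future change to io.DEFAULT_BUFFER_SIZE is picked
# up automatically.
_CHUNK_SIZE = io.DEFAULT_BUFFER_SIZE


def read_in_chunks(fileobj, chunk_size=_CHUNK_SIZE):
    """Yield successive chunks of at most `chunk_size` bytes until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Callers who have measured their own workload can still pass an explicit chunk_size, while everyone else follows the library default.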

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-24 Thread William Tisäter
William Tisäter added the comment: I played around with different file and chunk sizes using attached benchmark script. After several test runs I think 1024 * 16 would be the biggest win without losing too many μs on small seeks. You can find my benchmark output here:
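
The attached benchmark script is not reproduced in this listing. As a rough stand-in, the sketch below times skipping forward through a gzip stream with different read sizes. It assumes that a forward seek on a read-mode stream amounts to decompressing and discarding data in chunks. All function names are hypothetical, and the sample data is synthetic.

```python
import gzip
import io
import time


def make_gzip_bytes(size):
    """Return gzip-compressed bytes for `size` bytes of repetitive sample data."""
    raw = (b"0123456789abcdef" * (size // 16 + 1))[:size]
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(raw)
    return buf.getvalue()


def time_forward_skip(compressed, offset, chunk_size):
    """Time discarding `offset` decompressed bytes, `chunk_size` at a time.

    Returns (elapsed_seconds, final_position).
    """
    start = time.perf_counter()
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
        remaining = offset
        while remaining > 0:
            data = gz.read(min(chunk_size, remaining))
            if not data:  # reached EOF before the requested offset
                break
            remaining -= len(data)
        position = gz.tell()
    return time.perf_counter() - start, position


def run_benchmark(chunk_sizes=(1024, 8 * 1024, 16 * 1024), size=1 << 20):
    """Return {chunk_size: elapsed_seconds} for skipping to the midpoint."""
    compressed = make_gzip_bytes(size)
    return {
        cs: time_forward_skip(compressed, size // 2, cs)[0]
        for cs in chunk_sizes
    }
```

Calling run_benchmark() and printing the result gives one timing per chunk size. As noted elsewhere in the thread, such numbers depend on the hardware, so they are best treated as relative rather than absolute.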

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-21 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +nadeem.vawda, serhiy.storchaka

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-20 Thread Ezio Melotti
Ezio Melotti added the comment: You should try with different chunk and file sizes and see what is the best compromise. Tagging as easy in case someone wants to put together a small script to benchmark this (maybe it could even be added to http://hg.python.org/benchmarks/), or even a patch.

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-04-20 Thread Skip Montanaro
Skip Montanaro added the comment: Here's a straightforward patch. I didn't want to change the public API of the module, so just defined the chunk size with a leading underscore. Gzip tests continue to pass. -- keywords: +patch stage: needs patch -> patch review Added file:

[issue20962] Rather modest chunk size in gzip.GzipFile

2014-03-17 Thread Skip Montanaro
New submission from Skip Montanaro: I've had the opportunity to use the seek() method of the gzip.GzipFile class for the first time in the past few days. Wondering why it seemed my processing times were so slow, I took a look at the code for seek() and read(). It seems like the chunk size for