Hi!

On Tue, Nov 29, 2011 at 04:54:33PM +0200, Lasse Collin wrote:
> On 2011-11-29 Stefan Westerfeld wrote:
> > Of course it's your code base, and you can use mmap() or not; there
> > are some performance gains, which can be bought with additional
> > code complexity.
> 
> Your mmap test isn't very realistic because it doesn't touch the
> allocated memory at all. Below is a modified version of your test
> program. The 5000 simulates the input file size in bytes. x is there
> just to make sure that the memory reads aren't optimized away.
> ...
> 
> The mmap version will still be faster, but the difference isn't so
> enormous anymore. If the mmap in the test program is replaced with
> calloc, it's as slow as malloc+memset on GNU/Linux x86-64, but it may
> very well be as fast as mmap on some other OS.

Yes, I agree that if xz -9 automatically uses xz -6 for files smaller than
8 MiB, the mmap optimization is probably not worth the implementation
complexity, because there are two cases:

a) input file is > 8 MiB: mmap doesn't help much, because the actual
compression will take most of the CPU cycles anyway (and all of the memory
will be written to)

b) input file is <= 8 MiB: mmap doesn't help much either, because the hash
table initialization will be quite fast (due to xz -6).

> Creating a separate xz process for every file wastes even more time
> than the hash table initialization. I tested with this on tmpfs:
>
> My results (times are in seconds):
> 
>                             -1e       -6        -9
>     Separate processes:    39.3     49.8       146
>     Single process:         2.6     14.7        57

Thanks for that hint. I've now rewritten my code to use the Python lzma
module instead of a system ("xz ...") call. That way I can reuse the same
compressor object for all files. And yes, that is a lot faster than
calling xz for each file.
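
In case it helps anyone else, here is a minimal sketch of the approach.
I'm assuming an lzma module with a one-shot compress(data, preset=...)
call, as in the standard library's lzma module; the exact API of the
module you have may differ, and preset_for_size / compress_file are just
illustrative names. The one-shot call shown here avoids the per-file
process creation, which is where the time went; how to actually reuse a
compressor object depends on the module's API.

import lzma

def preset_for_size(size):
    # Mirrors the preset dictionary sizes from the xz documentation:
    # -6 = 8 MiB, -7 = 16 MiB, -8 = 32 MiB, -9 = 64 MiB.  A dictionary
    # at least as large as the input is all a higher preset would buy.
    if size <= 8 * 1024 * 1024:
        return 6
    if size <= 16 * 1024 * 1024:
        return 7
    if size <= 32 * 1024 * 1024:
        return 8
    return 9

def compress_file(filename):
    with open(filename, "rb") as f:
        data = f.read()
    # One in-process library call per file instead of fork/exec'ing xz.
    return lzma.compress(data, preset=preset_for_size(len(data)))

The thresholds are the same as in the xz_level_for_file function quoted
below, just returning preset integers instead of command-line flags.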

Now, with all the optimizations in place, xz compression is no longer a
problem, so I'm quite happy with the final result. Below is a table with
per-file compression cost in milliseconds.

                            |   use xz -9 only |  use xz -6 for small files
============================+==================+=============================
  call xz via os.system()   |     60.23 ms     |      17.13 ms
  use python lzma module    |      2.31 ms     |       2.24 ms

> > But I think maybe it's better to take a step back and look at what I
> > was trying to do in the first place: compressing files which vary in
> > size. From the documentation I've found that using -7, -8 or -9
> > doesn't change anything over a lower level if the file is small
> > enough. So I can do this:
> > 
> > import os
> > 
> > def xz_level_for_file(filename):
> >   size = os.path.getsize(filename)
> >   if size <= 8 * 1024 * 1024:
> >     return "-6"
> >   if size <= 16 * 1024 * 1024:
> >     return "-7"
> >   if size <= 32 * 1024 * 1024:
> >     return "-8"
> >   return "-9"
> > 
> > in my code before calling xz. This avoids initializing the 64 MiB of
> > memory for small files, and results in quite a bit of a performance
> > improvement in my test (probably even more than using mmap would).
> > 
> > It would still be cool if xz could do this automatically (or do it
> > with a special option) so that not every xz user needs to adapt the
> > compression settings according to the file size. Basically, it could
> > detect the file size and adjust the compression level downwards if
> > that will not produce worse results.
> 
> Adding an option to do this shouldn't be too hard. I added it to my
> to-do list.
> 
> At one point it was considered to enable such a feature by default. I'm
> not sure if it is good as a default, because then compressing the same
> data from stdin will produce different output than when the input size
> is known. Usually this doesn't matter, but sometimes it does, so if it
> is a default, there probably needs to be an option to disable it.

Yes, I agree about the stdin problem; you could try to read 32 MiB from
stdin before even starting to compress (because then you can decide on the
compression level), but if the data is being produced slowly, this will
reduce overall performance. In any case, whether it's the default or not
is not so important, as long as it's there.
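
Just to sketch the idea (not a proposal for how xz should implement it;
this reuses the hypothetical preset_for_size helper from the sketch
above): read up to 32 MiB + 1 bytes first, and only fall back to
streaming with -9 if the input turns out to be bigger.

import lzma
import sys

LIMIT = 32 * 1024 * 1024

def compress_stdin_to(out):
    head = sys.stdin.buffer.read(LIMIT + 1)
    if len(head) <= LIMIT:
        # The whole input fits in the buffer, so the size is known and
        # the preset can be chosen just as for a regular file.
        out.write(lzma.compress(head, preset=preset_for_size(len(head))))
        return
    # Input is larger than 32 MiB: stream the rest with -9.
    compressor = lzma.LZMACompressor(preset=9)
    out.write(compressor.compress(head))
    while True:
        chunk = sys.stdin.buffer.read(1 << 20)
        if not chunk:
            break
        out.write(compressor.compress(chunk))
    out.write(compressor.flush())

And of course this is exactly where the slow-producer problem bites: the
first 32 MiB have to arrive before any output is produced.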

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
