Hi!

On Tue, Nov 29, 2011 at 12:01:39AM +0200, Lasse Collin wrote:
> On 2011-11-28 Thorsten Glaser wrote:
> > Lasse Collin dixit:
> > If xz does indeed know it needs a zero'd allocation and
> > can express that in page sizes (pretty non-portable),
> > _and_ has fallback code for mmap-less architectures (e.g.
> > several POSIX-for-Windows systems or ancient OSes) then
> > sure. But I'd say, leave malloc speedups to the OS. Or
> > the porter; they should know what they do.
> 
> I'm not interested in playing with mmap in liblzma.
> 
> > (calloc is indeed faster than malloc+memset here for
> > large allocations. About 1750 vs. 20 milliseconds.)
> 
> Add a few thousand random reads and writes, which liblzma will do even
> with small files. Maybe the calloc is so much faster because it just
> mmaps memory and doesn't touch it, so the kernel doesn't physically
> allocate and initialize it either.
> 
> I know that using calloc is the right way to get zeroed allocation. In
> liblzma I have allocations and initializations separated, because it
> allows reusing the existing allocations when (de)compressing many
> streams. I could still use calloc and skip memset as a special case,
> but currently I think it's not worth it at all.

Of course it's your code base, and you can use mmap() or not; there are some
performance gains, which can be bought with additional code complexity.

But I think maybe it's better to take a step back and see what I was trying to
do in the first place: compressing files which vary in size. From the
documentation I've found that using the higher levels -7, -8 and -9 doesn't
change anything if the file is small enough. So I can do this:

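import os

def xz_level_for_file(filename):
    # Pick the lowest level whose dictionary (8/16/32/64 MiB for
    # -6/-7/-8/-9) still covers the whole file.
    size = os.path.getsize(filename)
    if size <= 8 * 1024 * 1024:
        return "-6"
    if size <= 16 * 1024 * 1024:
        return "-7"
    if size <= 32 * 1024 * 1024:
        return "-8"
    return "-9"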

in my code before calling xz. This will get around initializing the 64M
of memory for small files, and results in quite a bit of a performance
improvement in my test (probably even more than using mmap).
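
As a rough sketch of how that could be wired up (not a tested tool): the
helper below picks the level from a byte count and builds the xz command
line. The names PRESET_LIMITS, xz_level_for_size, xz_command and
compress_file are made up for this illustration; the 8/16/32 MiB limits are
the same ones as in the function above.

```python
import os
import subprocess

MIB = 1024 * 1024

# (largest file size in bytes, xz level), same limits as in the function above.
PRESET_LIMITS = [(8 * MIB, "-6"), (16 * MIB, "-7"), (32 * MIB, "-8")]


def xz_level_for_size(size):
    """Return the lowest level whose limit still covers `size` bytes."""
    for limit, level in PRESET_LIMITS:
        if size <= limit:
            return level
    return "-9"


def xz_command(filename):
    """Build the argument list for compressing `filename` with xz."""
    level = xz_level_for_size(os.path.getsize(filename))
    # "--" ends option parsing, so file names starting with "-" are safe.
    return ["xz", level, "--", filename]


def compress_file(filename):
    """Run xz on `filename`; raises CalledProcessError if xz fails."""
    subprocess.run(xz_command(filename), check=True)
```

Keeping the limits in one table means a different cut-off only needs a
change in a single place.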

It would still be cool if xz could do this automatically (or do it with
a special option) so that not every xz user needs to adapt the compression
settings according to the file size. Basically, it could detect the file size
and adjust the compression level downwards if that will not produce worse
results.
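
For illustration only, here is roughly what such an automatic adjustment
might look like when done by the caller through Python's lzma module (a
binding to liblzma, Python 3.3+) rather than inside xz itself. It keeps the
preset's other settings and only caps the LZMA2 dictionary at the input
size. compress_auto, MIN_DICT and max_dict are hypothetical names, and
whether this gives the same savings as lowering the level is an assumption,
not something measured here.

```python
import lzma

# Smallest LZMA2 dictionary size that liblzma accepts (4 KiB).
MIN_DICT = 4096


def compress_auto(data, preset=9, max_dict=64 * 1024 * 1024):
    """Compress `data` to .xz with a dictionary no larger than the input."""
    # A dictionary bigger than the input cannot find more matches, so cap it.
    dict_size = max(MIN_DICT, min(len(data), max_dict))
    filters = [
        {"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
```

The output is an ordinary .xz stream, so lzma.decompress() or plain
"xz -d" should read it without any special handling.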

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
