Hi!

On Mon, Nov 28, 2011 at 05:03:03PM +0200, Lasse Collin wrote:
> On 2011-11-28 Stefan Westerfeld wrote:
> > Now the problem is that for those files I cannot predict the size.
> > Often they will be quite small, but they also could be 100 MB in size
> > or more. So I use xz -9 to get the best compression.
> > 
> > The problem is now that xz takes a lot of time to start:
> > 
> > stefan@ubuntu:/tmp$ time echo "foo" | xz -9 >/dev/null
> > 
> > real    0m0.155s
> > user    0m0.052s
> > sys     0m0.096s
> 
> The match finder hash table has to be initialized. It cannot be avoided.
> The bigger the dictionary, the bigger the hash table. It's about 64 MiB
> when using 64 MiB dictionary (xz -9). With 8 MiB dictionary (xz -6)
> it's about 16 MiB. So at a lower setting the initialization is faster.
> 
> xz allocates much more memory for other things. Most of that memory
> isn't initialized beforehand. Uninitialized memory doesn't cause a
> significant speed penalty because many kernels don't physically allocate
> large allocations before the memory will actually be used.

Just a thought: could performance be improved if xz requested the memory
via mmap(), like

  char *buffer = (char *) mmap (NULL, 64 * 1024 * 1024,
                                PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

I wrote a little test program which seems to indicate that mmap() is much
faster for getting zero-initialized memory than malloc() + memset(). But that's
only for the case where the application does not access the memory. For xz the
question is how much of the memory will actually be accessed, and how much time
not having to zero-initialize the memory will save.

   Cu... Stefan

------ (call with malloc or mmap as argument) -----
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <assert.h>
#include <stdio.h>

int
main (int argc, char **argv)
{
  assert (argc == 2);
  if (strcmp (argv[1], "malloc") == 0)
    {
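      /* Allocate, then explicitly zero every byte. */
      void *buffer = malloc (64 * 1024 * 1024);
      if (buffer == NULL)
        {
          perror ("malloc");
          return 1;
        }
      memset (buffer, 0, 64 * 1024 * 1024);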
    }
  else
    {
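      /* Anonymous mappings read as zero; no explicit memset is needed. */
      char *buffer = (char *) mmap (NULL, 64 * 1024 * 1024,
                                    PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buffer == MAP_FAILED)
        {
          perror ("mmap");
          return 1;
        }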
    }
}
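
The program above never touches the memory it gets, so it only covers one end
of the question. What follows is a rough sketch of how the other end could be
measured: it writes one byte per page in a chosen fraction of the buffer and
times allocation plus touching. touch_pages, now_seconds and time_alloc are
hypothetical helper names, not anything from xz, and touching the first N pages
is only an assumed stand-in for the match finder's real access pattern.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and clock_gettime on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

/* Write one byte per page in the first `fraction` (0.0 .. 1.0) of the
   buffer, forcing the kernel to back those pages with real memory.
   Returns the number of pages touched. */
static size_t
touch_pages (volatile char *buffer, size_t size, double fraction)
{
  assert (fraction >= 0.0 && fraction <= 1.0);
  long page = sysconf (_SC_PAGESIZE);
  size_t page_size = page > 0 ? (size_t) page : 4096;
  size_t limit = (size_t) ((double) size * fraction);
  size_t touched = 0;

  for (size_t off = 0; off < limit; off += page_size)
    {
      buffer[off] = 1;
      touched++;
    }
  return touched;
}

/* Monotonic wall-clock time in seconds. */
static double
now_seconds (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Time "allocate zeroed memory, then touch part of it".
   use_mmap == 0: malloc() + memset(); otherwise anonymous mmap().
   Returns elapsed seconds, or -1.0 if the allocation failed. */
static double
time_alloc (int use_mmap, size_t size, double fraction)
{
  double start = now_seconds ();
  char *buffer;

  if (use_mmap)
    {
      buffer = mmap (NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buffer == MAP_FAILED)
        return -1.0;
    }
  else
    {
      buffer = malloc (size);
      if (buffer == NULL)
        return -1.0;
      memset (buffer, 0, size);
    }

  touch_pages (buffer, size, fraction);
  double elapsed = now_seconds () - start;

  if (use_mmap)
    munmap (buffer, size);
  else
    free (buffer);
  return elapsed;
}
```

Comparing time_alloc (0, 64 * 1024 * 1024, f) with time_alloc (1, 64 * 1024 *
1024, f) for a few values of f between 0.0 and 1.0 should give a first
impression of where the advantage of mmap() disappears. It is probably best
built without optimization, since an optimizing compiler may merge malloc() +
memset() into a single calloc() call and blur the comparison.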

-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
