On 2011-11-28 Stefan Westerfeld wrote:
> Just a thought: could performance be improved if xz requested the
> memory via mmap(), like
>
> char *buffer = (char *) mmap (NULL, 64 * 1024 * 1024,
> PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> I wrote a little test program which seems to indicate that mmap() is
> much faster for getting zero initialized memory than malloc() +
> memset(). But thats for the case where the application does not
> access the memory. For xz the question is how much of the memory will
> be accessed, and how much not having to zero-initialize the memory
> will save.

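To make the comparison concrete, here is a rough sketch of such a test
with random 32-bit reads and writes added after the allocation. It is
not your test program and not code from xz. The names alloc_malloc(),
alloc_mmap(), touch_random(), the DEMO_MAIN switch, and the access
count of 4000 are all made up for illustration, and the simple LCG only
stands in for hash-like random positions.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS and clock_gettime() on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define TABLE_SIZE (64u * 1024 * 1024)  /* 64 MiB, as in the quoted example */

/* Zeroed memory via malloc() + memset(); writes every page up front. */
static uint32_t *alloc_malloc(size_t size)
{
    uint32_t *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Zeroed memory via an anonymous private mapping; pages are only
 * faulted in when they are first accessed. */
static uint32_t *alloc_mmap(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Do `count` read+write pairs of 32-bit values at pseudo-random
 * positions. Returns the sum of the values read so the loop cannot
 * be optimized away. */
static uint64_t touch_random(uint32_t *table, size_t words, size_t count)
{
    assert(words > 0);
    uint32_t state = 12345;
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        state = state * 1664525u + 1013904223u;  /* simple LCG step */
        size_t pos = (state >> 8) % words;
        sum += table[pos];              /* one 32-bit read */
        table[pos] = (uint32_t)i + 1;   /* one 32-bit write */
    }
    return sum;
}

#ifdef DEMO_MAIN
static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

int main(void)
{
    const size_t words = TABLE_SIZE / sizeof(uint32_t);
    const size_t count = 4000;  /* hypothetical "few thousand" accesses */

    double t0 = now();
    uint32_t *a = alloc_malloc(TABLE_SIZE);
    if (a == NULL)
        return 1;
    uint64_t s1 = touch_random(a, words, count);
    double t1 = now();

    uint32_t *b = alloc_mmap(TABLE_SIZE);
    if (b == NULL)
        return 1;
    uint64_t s2 = touch_random(b, words, count);
    double t2 = now();

    printf("malloc+memset: %.6f s\n", t1 - t0);
    printf("mmap:          %.6f s\n", t2 - t1);

    free(a);
    munmap(b, TABLE_SIZE);
    /* Both tables start zeroed and see the same access sequence. */
    return s1 == s2 ? 0 : 1;
}
#endif
```

Building with something like "cc -O2 -DDEMO_MAIN test.c" gives a
standalone timing run. Changing count shows how the gap between the two
methods changes as more of the table gets touched.
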
With tiny input the memory won't be accessed much. With the BT4 match
finder, it's one read and one write per uncompressed input byte. Each
read and write is a 32-bit integer. Since it's a hash table, the
accesses are random. There are actually three hash tables in BT4, which
are allocated at the same time, but the other two tables are small.

If you do a few thousand random 32-bit reads and writes, the mmap
method can still be faster, but the difference is not as huge as your
test makes it look.

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode