On 2024-02-19 Sebastian Andrzej Siewior wrote:
> Okay, so the input matters, too. I tried 1GiB urandom (so it does not
> compress so well) but that went quicker than expected…

urandom should be incompressible. When LZMA2 cannot compress a chunk it
stores it in uncompressed form. Decompressing such data is then like
"cat with CRC".
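
A quick way to see this is sketched below. It assumes xz and the usual
coreutils (mktemp, head, wc, cmp) are installed; the file names are
placeholders. The .xz file should come out only slightly larger than the
input, since the data is stored in uncompressed chunks with a little
container overhead on top.

```shell
# Hypothetical file names; assumes xz and coreutils are available.
tmpdir=$(mktemp -d)
head -c 1048576 /dev/urandom > "$tmpdir/random.bin"

# -k keeps the input file so that both sizes can be compared.
xz -k "$tmpdir/random.bin"

in_size=$(wc -c < "$tmpdir/random.bin")
out_size=$(wc -c < "$tmpdir/random.bin.xz")
echo "input: $in_size bytes, .xz: $out_size bytes"

# Decompress to stdout and compare with the original data.
xz -dc "$tmpdir/random.bin.xz" | cmp - "$tmpdir/random.bin" && echo "round trip OK"

rm -r "$tmpdir"
```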

> I found 3 idle x86 boxes and re-run a test with linux' perf on them
> and the arm64 box. I all flavours for the two archives. On RiscV I
> did the 'xz -t' thing because perf seems not to be supported well or
> I lack access.

Great work! Thanks!

On IRC one person ran a bunch of tests too. On ARM64 the results were
mixed. A variant that was better with GCC could be worse with Clang. So
those weren't as clear as your results but they too made me think that
using 0 for non-x86-64 is the way to go for 5.6.0.

Your x86-64 asm variant results were interesting too. It seems that the
0x100 bit isn't good with GCC, although the difference is small. I
confirmed this in the tests I did on a Celeron G1620 (Ivy Bridge). So I
wonder if 0x0F0 should be the x86-64 variant to use in xz 5.6.0 with GCC.

On another machine (a Phenom II X4 920) with Clang 16, 0x100 is 8 %
faster with the Linux kernel source, so there the difference is fairly
big. It's still slightly slower than the GCC version.

Since 0x100 is only a little worse with GCC, using it for both GCC and
Clang could be OK. An #ifdef __clang__ could be used too but perhaps
it's not great in the long term. Something has to be chosen for 5.6.0;
further tweaks can be made later.

By the way, the "time" command gives more precise results than "xz -v".
I use

    TIMEFORMAT=$'\nreal\t%3R\nuser\t%3U\nsys\t%3S\ncpu%%\t%P'

in bash to keep the output as seconds instead of minutes and seconds.
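
As a rough usage sketch (bash only; "sleep 1" is just a stand-in for the
command being measured, for example "xz -t" on some file): the report
from the "time" keyword is written to the shell's stderr, so the command
has to be grouped if the report is to be captured.

```shell
#!/bin/bash
# Same format string as above; bash's "time" keyword reads TIMEFORMAT.
TIMEFORMAT=$'\nreal\t%3R\nuser\t%3U\nsys\t%3S\ncpu%%\t%P'

# "sleep 1" is a placeholder for the real workload.
# Group the command so that the timing report on stderr can be captured.
report=$( { time sleep 1; } 2>&1 )
echo "$report"
```

With this format the "real", "user" and "sys" lines are plain seconds
with three decimals, which is easier to paste into a table than the
default minutes-and-seconds form.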

-- 
Lasse Collin
