On 2024-02-19 Sebastian Andrzej Siewior wrote:
> Okay, so the input matters, too. I tried 1GiB urandom (so it does not
> compress so well) but that went quicker than expected…

urandom should be incompressible. When LZMA2 cannot compress a chunk it
stores it in uncompressed form. Decompression is like "cat with CRC".

> I found 3 idle x86 boxes and re-run a test with linux' perf on them
> and the arm64 box. I all flavours for the two archives. On RiscV I
> did the 'xz -t' thing because perf seems not to be supported well or
> I lack access.

Great work! Thanks!

On IRC one person ran a bunch of tests too. On ARM64 the results were
mixed. A variant that was better with GCC could be worse with Clang. So
those weren't as clear as your results but they too made me think that
using 0 for non-x86-64 is the way to go for 5.6.0.

Your x86-64 asm variant results were interesting too. Seems that the
bit 0x100 isn't good with GCC although the difference is small. I
confirmed this on the tests I did on Celeron G1620 (Ivy Bridge). So I
wonder if 0x0F0 should be the x86-64 variant to use in xz 5.6.0 with
GCC.

On another machine with Clang 16, 0x100 is 8 % faster with Linux kernel
source. So the difference is somewhat big. It's still slightly slower
than the GCC version. This is on Phenom II X4 920.

Since 0x100 is only a little worse with GCC, using it for both GCC and
Clang could be OK. An #ifdef __clang__ could be used too but perhaps
it's not great in the long term. Something has to be chosen for 5.6.0;
further tweaks can be made later.

By the way, the "time" command gives more precise results than "xz -v".
I use

    TIMEFORMAT=$'\nreal\t%3R\nuser\t%3U\nsys\t%3S\ncpu%%\t%P'

in bash to keep the output as seconds instead of minutes and seconds.

-- 
Lasse Collin
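
The remark above about urandom being stored uncompressed can be checked with a small sketch. This is not part of the original message: it assumes Python's standard `lzma` module (which wraps liblzma) and uses a 1 MiB input instead of the 1 GiB from the thread, purely to keep it quick. If the incompressible chunks are stored as-is, the .xz output should be only marginally larger than the input.

```python
import lzma
import os

# 1 MiB of random bytes stands in for the urandom test input. The size is
# an arbitrary choice for this sketch, far smaller than the 1 GiB file
# mentioned in the thread.
data = os.urandom(1 << 20)

# Compress into the .xz container with the default preset.
packed = lzma.compress(data, format=lzma.FORMAT_XZ)

# Random data cannot be compressed, so the output should be the input
# plus a small amount of container and chunk-header overhead.
overhead = len(packed) - len(data)
print(f"input:    {len(data)} bytes")
print(f"output:   {len(packed)} bytes")
print(f"overhead: {overhead} bytes ({overhead / len(data):.4%})")

# Round-trip to confirm the data comes back unchanged.
assert lzma.decompress(packed) == data
```

The exact overhead depends on the liblzma version and preset, so no specific figure is claimed here; the point is only that the output tracks the input size closely, which fits the observation that decompressing such a file goes quicker than expected.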