On 2024-02-17 Sebastian Andrzej Siewior wrote:
> I did some testing on !x86. I changed LZMA_RANGE_DECODER_CONFIG to
> different values run a test and looked at the MiB/s value. xz_0 means
> LZMA_RANGE_DECODER_CONFIG was 0, xz_1 means the define was set to 1. I
> touched src/liblzma/lzma/lzma_decoder.c and rebuilt xz. I pinned the
> shell to a single CPU and run test for archive (-tv) for one file
> three times.

Great to see testing! The testing method is fine. If pinning to a
single core, I assume --threads=1 was set as well because
multithreading is the default now.

Branchless code can help when branch prediction penalties are high. So
it will depend on the processor (not just the instruction set).
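
For readers less familiar with the idea, here is a minimal sketch of what
"branchless" means for decoding one range-coded bit. It is not the liblzma
code: the rc_state struct and both function names are made up for this
illustration, normalization is left out, and the constants 11 and 5 are the
usual LZMA probability-model parameters. The first function picks the
update with an if/else; the second computes both outcomes and selects one
with a mask. A compiler may still decide how to lower either version.

```c
#include <stdint.h>

/* Hypothetical, simplified decoder state (not the liblzma struct). */
typedef struct {
    uint32_t range;
    uint32_t code;
} rc_state;

/* Classic form: one conditional branch per decoded bit. */
static uint32_t decode_bit_branchy(rc_state *rc, uint16_t *prob)
{
    uint32_t p = *prob;
    uint32_t bound = (rc->range >> 11) * p;

    if (rc->code < bound) {
        rc->range = bound;
        *prob = (uint16_t)(p + ((2048 - p) >> 5));
        return 0;
    }

    rc->range -= bound;
    rc->code -= bound;
    *prob = (uint16_t)(p - (p >> 5));
    return 1;
}

/* Branchless form: compute both updates, then select with a mask. */
static uint32_t decode_bit_branchless(rc_state *rc, uint16_t *prob)
{
    uint32_t p = *prob;
    uint32_t bound = (rc->range >> 11) * p;

    /* mask is all ones when the decoded bit is 1, otherwise zero. */
    uint32_t mask = 0U - (uint32_t)(rc->code >= bound);

    uint32_t p0 = p + ((2048 - p) >> 5);   /* new prob if bit == 0 */
    uint32_t p1 = p - (p >> 5);            /* new prob if bit == 1 */

    /* Select "bound" (bit 0) or "range - bound" (bit 1). */
    rc->range = bound ^ ((bound ^ (rc->range - bound)) & mask);
    rc->code -= bound & mask;
    *prob = (uint16_t)(p0 ^ ((p0 ^ p1) & mask));

    return mask & 1;
}
```

The branchless version always does a little more arithmetic, so it only
wins when mispredicted branches cost more than that extra work.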

On x86-64, there was a clear improvement with the branchless C code. It
was a little more with Clang than GCC. So if easily possible, testing
with Clang as well could be useful. Running your script on x86-64 could
be worth it too, to check that at least there you get an improvement
with =1 and =3 compared to =0. (Bit 1 makes the main difference; 2
should have a small effect, and 4 and 8 are questionable and perhaps
not worth benchmarking until the usefulness of =1 or =3 is clear.)
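
To spell out how the values combine: the setting is a bitmask, so =3 is
just bits 1 and 2 set together. The sketch below only illustrates that
arithmetic. The helper name config_has_bit is made up for this example,
and nothing here says what each bit enables inside lzma_decoder.c.

```c
#include <stdbool.h>

/* Hypothetical helper: true if "bit" is set in a bitmask config value. */
static bool config_has_bit(unsigned config, unsigned bit)
{
    return (config & bit) != 0;
}

/* The three values suggested for benchmarking: 0, 1, and 3 (= 1 | 2). */
static const unsigned configs_to_benchmark[] = { 0, 1, 1 | 2 };
```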

If the branchless C code is not consistently faster outside x86-64, then
5.6.0 likely should stick to =0. From your results it seems that the
other tweaks to the code (the ones that LZMA_RANGE_DECODER_CONFIG doesn't
affect) still provided a minor improvement on non-x86-64.

Thanks!

-- 
Lasse Collin
