On 2024-03-20 Brett Okken wrote:
> The jdk8 changes show nice improvements over head. My assumption is
> that with less math going on in the offsets of the while loop allowed
> the jvm to better optimize.

Sounds good, thanks! :-)

> I am surprised with the binary math behind your handling of long
> comparisons here:

I had to refresh my memory because I hadn't commented it in
memcmplen.h. Now it is commented (based on Agner Fog's
microarchitecture.pdf):

  - On some x86-64 processors (Intel Sandy Bridge to Tiger Lake),
    sub+jz and sub+jnz can be fused but xor+jz or xor+jnz cannot.
    Thus using subtraction has potential to be a tiny amount faster
    since the code checks if the difference is non-zero.

  - Some processors (Intel Pentium 4) used to have more ALU resources
    for add/sub instructions than for and/or/xor.

So even in the C code it's not a big deal, and in Java it probably
makes no difference at all. But there is no real downside to using
subtraction.

I understand why xor seems the more obvious choice. However, when
looking for the lowest differing bit, subtraction makes that bit 1 and
the bits below it 0. Only the bits above that 1 can differ between
subtraction and xor, and those bits are irrelevant here.

I created a new branch, bytearrayview, which combines the CRC64 edits
with the encoder speed changes since both use the ByteArrayView class
(formerly ArrayUtil).

> > I still need to check a few of your edits if some of them should be
> > included. :-)
>
> I think the changes to LZMAEncoderNormal as part of this PR to avoid
> the negative length comparison would be good to carry forward.

Done, I hope.

> 1. Use an interface with implementation chosen statically to separate
> out the implementation options.

I had an early version that used separate implementation classes, but
I must have done something wrong because that version was *clearly*
slower. So I tried it again and it's as you say: no speed difference.
:-)

> 2. Allow specifying the implementation to use with a system property.

Done. I hope it's done in a sensible enough way. The Java < 9 code is
completely separate, so it cannot be selected with the property. The
property needs to be documented somewhere too.
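
For anyone following the long-comparison discussion, here is a minimal
Java sketch that ties the points above together. It is not the actual
XZ for Java code: the `MatchLenSketch` class, the `MatchLen` interface,
and the property name `example.matchlen.impl` are all hypothetical. It
shows an interface whose implementation is picked once at class
initialization, a system property that can override that choice, and a
little-endian long-at-a-time comparison that uses subtraction where xor
might be expected. It needs Java 9 or later for
`MethodHandles.byteArrayViewVarHandle`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch only: every name here is hypothetical, not the XZ for Java API.
public final class MatchLenSketch {

    /** Counts how many bytes starting at buf[a] and buf[b] are equal. */
    interface MatchLen {
        // Checks at most max bytes; caller ensures a + max and b + max
        // do not exceed buf.length.
        int get(byte[] buf, int a, int b, int max);
    }

    /** Portable byte-by-byte comparison. */
    static final class ByteLoop implements MatchLen {
        @Override
        public int get(byte[] buf, int a, int b, int max) {
            int len = 0;
            while (len < max && buf[a + len] == buf[b + len])
                ++len;
            return len;
        }
    }

    /** Compares eight bytes at a time using little-endian long reads. */
    static final class LongLE implements MatchLen {
        private static final VarHandle LONG_LE =
                MethodHandles.byteArrayViewVarHandle(long[].class,
                                                     ByteOrder.LITTLE_ENDIAN);

        @Override
        public int get(byte[] buf, int a, int b, int max) {
            int len = 0;
            while (max - len >= 8) {
                long x = (long) LONG_LE.get(buf, a + len);
                long y = (long) LONG_LE.get(buf, b + len);
                // Subtraction instead of xor: the lowest differing bit
                // becomes 1 and the bits below it stay 0, so the
                // trailing-zero count equals that of x ^ y.
                long diff = x - y;
                if (diff != 0)
                    return len + (Long.numberOfTrailingZeros(diff) >>> 3);
                len += 8;
            }
            // Compare the remaining 0-7 bytes one at a time.
            while (len < max && buf[a + len] == buf[b + len])
                ++len;
            return len;
        }
    }

    // Hypothetical property name, used only in this sketch.
    static final String PROPERTY = "example.matchlen.impl";

    // Chosen once during class initialization; only one implementation
    // is ever instantiated.
    static final MatchLen IMPL = select(System.getProperty(PROPERTY));

    static MatchLen select(String name) {
        if ("bytes".equals(name))
            return new ByteLoop();
        return new LongLE(); // default, also used for unknown values
    }

    public static void main(String[] args) {
        byte[] buf = "abcdefghijkXabcdefghijkY"
                .getBytes(StandardCharsets.US_ASCII);
        // Offsets 0 and 12 share the 11-byte prefix "abcdefghijk".
        System.out.println(IMPL.get(buf, 0, 12, 12));
    }
}
```

Both `java MatchLenSketch` and
`java -Dexample.matchlen.impl=bytes MatchLenSketch` print 11 for this
input. Because only one implementation is instantiated, the call site
stays monomorphic and the JIT can usually inline it, which is one
plausible reason the interface version need not be slower.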
I suppose the ARM64 speed is still to be determined by you or someone
else.

-- 
Lasse Collin