On 2024-03-20 Brett Okken wrote:
> The jdk8 changes show nice improvements over head. My assumption is
> that having less math going on in the offsets of the while loop
> allowed the JVM to optimize better.

Sounds good, thanks! :-)

> I am surprised by the binary math behind your handling of long
> comparisons here:

I had to refresh my memory because I hadn't documented it in
memcmplen.h. Now there is a comment (based on Agner Fog's
microarchitecture.pdf):

  - On some x86-64 processors (Intel Sandy Bridge to Tiger Lake),
    sub+jz and sub+jnz can be fused but xor+jz or xor+jnz cannot.
    Thus using subtraction has potential to be a tiny amount faster
    since the code checks if the difference is non-zero.

  - Some processors (Intel Pentium 4) used to have more ALU
    resources for add/sub instructions than and/or/xor.

So in the C code the gain is small, and in Java it is probably
negligible. But there is no real downside to using subtraction.
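For readers following along in Java, here is a minimal sketch of the idea: compare eight bytes at a time and check whether the difference is non-zero. It assumes Java 9+ (for `MethodHandles.byteArrayViewVarHandle`) and little-endian loads. The class name, the `memcmplen` method, and its parameters are hypothetical and loosely modeled on the C memcmplen.h idea; this is not the actual xz-java code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public final class MemCmpLenSketch {
    // View a byte[] as little-endian longs. Plain get() accepts unaligned
    // byte offsets. Requires Java 9 or later.
    private static final VarHandle LONG_LE = MethodHandles.byteArrayViewVarHandle(
            long[].class, ByteOrder.LITTLE_ENDIAN);

    private MemCmpLenSketch() {}

    /**
     * Hypothetical helper: starting from len, counts how many bytes at
     * buf[off1 + len] and buf[off2 + len] are equal, stopping at limit.
     * Caller must ensure len <= limit and that off1 + limit and
     * off2 + limit do not exceed buf.length.
     */
    public static int memcmplen(byte[] buf, int off1, int off2,
                                int len, int limit) {
        // Compare eight bytes per iteration while a whole long fits.
        while (limit - len >= 8) {
            long a = (long) LONG_LE.get(buf, off1 + len);
            long b = (long) LONG_LE.get(buf, off2 + len);

            // Subtract instead of xor: the result is non-zero exactly
            // when a != b, and its lowest set bit is the lowest
            // differing bit.
            long diff = a - b;
            if (diff != 0) {
                // Little-endian load: lower bits come from lower addresses,
                // so (trailing zero bits / 8) is the first differing byte.
                return len + (Long.numberOfTrailingZeros(diff) >>> 3);
            }
            len += 8;
        }

        // Handle the remaining 0-7 bytes one at a time.
        while (len < limit && buf[off1 + len] == buf[off2 + len])
            ++len;

        return len;
    }
}
```

The byte order matters here: with a big-endian view, the first differing byte would have to be located with `Long.numberOfLeadingZeros` instead.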

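I understand why xor seems the more obvious choice. However, when
looking for the lowest differing bit, subtraction will make that bit 1
and the bits below it 0: the equal low bits subtract to zero without a
borrow, and at the first differing bit the result is 1 whether it is
1 - 0 or 0 - 1. Only the bits above that 1 can differ between the
subtraction and xor results, and those bits are irrelevant here.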
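The claim can be checked with a small sketch that compares `Long.numberOfTrailingZeros` of `a - b` and `a ^ b`. The class and method names are illustrative only. The random test flips only bits at or above a random position, so the two operands often share many low bits, which is the interesting case.

```java
import java.util.Random;

public final class SubVsXorLowestBit {
    private SubVsXorLowestBit() {}

    // Index of the lowest set bit of a - b (64 if a == b).
    public static int viaSub(long a, long b) {
        return Long.numberOfTrailingZeros(a - b);
    }

    // Index of the lowest set bit of a ^ b (64 if a == b).
    public static int viaXor(long a, long b) {
        return Long.numberOfTrailingZeros(a ^ b);
    }

    // Checks the claim on n pseudo-random pairs; returns false on the
    // first pair where the two methods disagree.
    public static boolean agreeOnRandomPairs(long seed, int n) {
        Random rnd = new Random(seed);
        for (int i = 0; i < n; ++i) {
            long a = rnd.nextLong();
            long b = a ^ (rnd.nextLong() << rnd.nextInt(64));
            if (viaSub(a, b) != viaXor(a, b))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 8 - 4 = 0b100 while 8 ^ 4 = 0b1100: the bits above the lowest
        // set bit differ, but the lowest set bit (index 2) is the same.
        System.out.println(Long.toBinaryString(8L - 4L)); // 100
        System.out.println(Long.toBinaryString(8L ^ 4L)); // 1100
        System.out.println(viaSub(8L, 4L) + " " + viaXor(8L, 4L)); // 2 2
        System.out.println(agreeOnRandomPairs(1, 1_000_000)); // true
    }
}
```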

I created a new branch, bytearrayview, which combines the CRC64 edits
with the encoder speed changes as they share the ByteArrayView class
(formerly ArrayUtil).

> > I still need to check whether some of your edits should be
> > included. :-)
> 
> I think the changes to LZMAEncoderNormal as part of this PR to avoid
> the negative length comparison would be good to carry forward.

Done, I hope.

> 1. Use an interface with implementation chosen statically to separate
> out the implementation options.

I had an early version that used separate implementation classes, but
I must have done something wrong because that version was *clearly*
slower. So I tried it again, and it's as you say: no speed difference.
:-)
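A minimal sketch of the "interface with implementation chosen statically" pattern, assuming a hypothetical `MatchLen` interface (not the actual xz-java API). The second implementation swaps in `java.util.Arrays.mismatch` (Java 9+) purely as an illustration, and the `os.arch` heuristic in `chooseDefault()` is an assumption. Because the choice is stored once in a `static final` field, HotSpot can typically treat it as a constant and inline the call, which would be consistent with seeing no speed difference.

```java
import java.util.Arrays;

public final class MatchLenSelection {
    /** Hypothetical interface; not the actual xz-java API. */
    public interface MatchLen {
        // Starting from len, returns how many bytes at buf[off1 + len]
        // and buf[off2 + len] are equal, stopping at limit (len <= limit).
        int getLen(byte[] buf, int off1, int off2, int len, int limit);
    }

    /** Portable byte-at-a-time loop. */
    public static final class BasicMatchLen implements MatchLen {
        @Override
        public int getLen(byte[] buf, int off1, int off2, int len, int limit) {
            while (len < limit && buf[off1 + len] == buf[off2 + len])
                ++len;
            return len;
        }
    }

    /** Delegates to Arrays.mismatch (Java 9+). */
    public static final class MismatchMatchLen implements MatchLen {
        @Override
        public int getLen(byte[] buf, int off1, int off2, int len, int limit) {
            // Returns the relative index of the first mismatch, or -1 if
            // the two equal-length ranges are identical.
            int i = Arrays.mismatch(buf, off1 + len, off1 + limit,
                                    buf, off2 + len, off2 + limit);
            return i < 0 ? limit : len + i;
        }
    }

    // Chosen once during class initialization and never changed.
    public static final MatchLen IMPL = chooseDefault();

    private static MatchLen chooseDefault() {
        // Assumed heuristic for illustration: prefer the wide version on
        // common 64-bit architectures.
        String arch = System.getProperty("os.arch", "");
        return arch.equals("amd64") || arch.equals("aarch64")
                ? new MismatchMatchLen() : new BasicMatchLen();
    }

    private MatchLenSelection() {}
}
```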

> 2. Allow specifying the implementation to use with a system property.

Done. I hope it's done in a sensible enough way. The Java < 9 code is
completely separate, so it cannot be chosen with the property. The
property needs to be documented somewhere too.
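One way to read such a property is sketched below. The property name `org.example.xz.MatchLen` and the `Impl` values are hypothetical placeholders, not the names used in the branch. In line with the Java < 9 code being separate, the enum is assumed to list only the Java 9+ variants. Unset or blank values fall back to the default, and the lookup is case-insensitive.

```java
import java.util.Locale;

public final class ImplProperty {
    /** Hypothetical implementation names (Java 9+ variants only). */
    public enum Impl { BASIC, MISMATCH }

    /** Hypothetical property name, for illustration only. */
    public static final String PROPERTY = "org.example.xz.MatchLen";

    private ImplProperty() {}

    /**
     * Returns the implementation named by value, or defaultImpl when the
     * value is null or blank. Unknown values throw so that a typo is not
     * silently ignored.
     */
    public static Impl resolve(String value, Impl defaultImpl) {
        String v = value == null ? "" : value.trim();
        if (v.isEmpty())
            return defaultImpl;

        try {
            return Impl.valueOf(v.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unsupported value for " + PROPERTY + ": " + value, e);
        }
    }

    /** Reads the system property once, e.g. from a static initializer. */
    public static Impl fromSystemProperty(Impl defaultImpl) {
        return resolve(System.getProperty(PROPERTY), defaultImpl);
    }
}
```

With this sketch, `java -Dorg.example.xz.MatchLen=basic ...` would select `BASIC`. Throwing on unknown values is a design choice; falling back to the default with a warning would be the more forgiving alternative.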

I suppose the speed on ARM64 still needs to be measured by you or
someone else.

-- 
Lasse Collin
