On 2021-01-11 Brett Okken wrote:
> I threw together a quick jmh test, and there is no value in the
> changes to Hash234.

OK, let's forget that then.

On 2021-01-16 Brett Okken wrote:
> I have found a way to use VarHandle byte array access at runtime in
> code which is compile time compatible with jdk 7. So here is an
> updated ArrayUtil class which will use a VarHandle to read long values
> in jdk 9+. If that is not available, it will attempt to use
> sun.misc.Unsafe. If that cannot be found, it falls back to standard
> byte by byte comparison.

Sounds promising. :-) You have already done quite a bit of work in both
writing code and benchmarking. Thank you!

The method you ended up is similar to src/liblzma/common/memcmplen.h
in XZ Utils. There 8-byte version is used on 64-bit systems and 4-byte
version on 32-bit systems. In XZ Utils, SSE2 version (16-byte
comparison) is faster than 4-byte compare on 32-bit x86, but on x86-64
the 8-byte version has similar speed or is faster than the SSE2 version
(it depends on the CPU).

Have you tested with 32-bit Java too? It's quite possible that it's
better to use ints than longs on 32-bit system. If so, that should be
detected at runtime too, I guess.

In XZ Utils the arrays have extra room at the end so that memcmplen.h
can always read 4/8/16 bytes at a time. Since this is easy to do, I
think it should be done in XZ for Java too to avoid special handling of
the last bytes.

> I did add an index bounds check for the unsafe implementation and
> found it had minimal impact on over all performance.

Since Java in general is memory safe, having bound checks with Unsafe is
nice as long as it doesn't hurt performance too much. This

        if (aFromIndex < 0 || aFromIndex + length > a.length ||
            bFromIndex < 0 || bFromIndex + length > b.length) {

is a bit relaxed though since it doesn't catch integer overflows.
Something like this would be more strict:

        if (length < 0 ||
            aFromIndex < 0 || aFromIndex > a.length - length ||
            bFromIndex < 0 || bFromIndex > b.length - length) {

> Using VarHandle (at least on jdk 11) offers very similar performance
> to Unsafe across all 3 files I used for benchmarking.

OK. I cannot comment the details much because I'm not familiar with
either API for now.

Comparing byte arrays as ints or longs results in unaligned/misaligned
memory access. MethodHandles.byteArrayViewVarHandle docs say that this
is OK. A quick web search gave me an impression that it might not be
safe with Unsafe though. Can you verify how it is with Unsafe? If it
isn't allowed, dropping support for Unsafe may be fine. It's just the
older Java versions that would use it anyway.

It is *essential* that the code works well also on archs that don't
have fast unaligned access. Even if the VarHandle method is safe, it's
not clear how the performance is on archs that don't support fast
unaligned access. It would be bad to add an optimization that is good
on x86-64 but counter-productive on some other archs. One may need
arch-specific code just like there is in XZ Utils, although on the
other hand it would be nice to keep the Java code less complicated.

Do you have a way to check how these methods behave on Android and ARM?
(I understand that this might be too much work to check. This may be
skipped.)

I wish to add module-info.java in the next release. Do these new
methods affect what should be in module-info.java? With the current
code this seems to be enough:

    module org.tukaani.xz {
        exports org.tukaani.xz;
    }

>     final int leadingZeros = (int)LEADING_ZEROS.invokeExact(diff);
>     return i + (leadingZeros / Byte.SIZE);

Seems that Java might not optimize that division to a right shift. It
could be better to use "leadingZeros >>> 3".

> I know you said you were not going to be able to work on xz-java for
> awhile, but given these benchmark results, which really exceeded my
> expectations, could this get some priority to release?

I understood that it's 9-18 % faster. That is significant but it's
still a performance optimization only, not an important bug fix, and to
me the code doesn't feel completely ready yet (for example, the
unaligned access is important to get right).

(Compare to the threaded decompression support that is coming to XZ
Utils. It will speed things up a few hundred percent.)

Can you provide a complete patch to make testing easier (or if not
possible, complete copies of modified files)? Also, please try to wrap
the lines so that they stay within 80 columns (with some long
unbreakable strings this may not be possible, then those lines can be
overlong instead of messing up the indentation).

I think your patch will find its way into XZ for Java in some form
but once again I repeat that it will take some time. These XZ projects
are only a hobby for me and currently I don't even turn on my computer
every day.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Reply via email to