> I still need to check a few of your edits if some of them should be
> included. :-)

I think the changes to LZMAEncoderNormal as part of this PR to avoid the
negative length comparison would be good to carry forward. It basically
compares against MATCH_LEN_MIN instead of 0, which avoids some (short)
calls to getMatchLen whose results would be ignored on the very next line
anyway.
https://github.com/tukaani-project/xz-java/pull/13/commits/544e449446a3d652de2a5f170d197aef695f12ec#diff-e6858bec0a168955b2ad68bbe89af8ab9ca7b9b1ebf2d9b8bc362fb80dab2967
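
The shape of the change is roughly this (just a sketch to show the idea;
the names are made up and this is not the actual LZMAEncoderNormal code):

// Sketch only; not the actual LZMAEncoderNormal code.
static final int MATCH_LEN_MIN = 2;

// Comparing against MATCH_LEN_MIN instead of 0 means a match that is
// guaranteed to be too short never reaches getMatchLen at all; its
// result would have been discarded on the very next line anyway.
static int tryMatch(byte[] buf, int cur, int matchPos, int avail, int best) {
    if (avail < MATCH_LEN_MIN)
        return best;
    int len = getMatchLen(buf, cur, matchPos, avail);
    return Math.max(best, len);
}

// Plain byte-by-byte match length, included only so the sketch is
// self-contained.
static int getMatchLen(byte[] buf, int cur, int matchPos, int limit) {
    int len = 0;
    while (len < limit && buf[cur + len] == buf[matchPos + len])
        ++len;
    return len;
}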

> I pushed basic docs for getMatchLen.

Thanks - that is very helpful.

> I can wait for the summary, thanks.

The jdk8 changes show nice improvements over head. My assumption is that
having less math in the offsets of the while loop allowed the JVM to
optimize better.
Benchmark                              (file)  (preset)  Mode  Cnt     Score      Error  Units
XZCompressionBenchmark.head   ihe_ovly_pr.dcm         3  avgt    3     0.617 ±    0.159  ms/op
XZCompressionBenchmark.lasse  ihe_ovly_pr.dcm         3  avgt    3     0.556 ±    0.163  ms/op
XZCompressionBenchmark.head   ihe_ovly_pr.dcm         6  avgt    3     2.908 ±    0.346  ms/op
XZCompressionBenchmark.lasse  ihe_ovly_pr.dcm         6  avgt    3     2.437 ±    0.160  ms/op
XZCompressionBenchmark.head        image1.dcm         3  avgt    3  2106.954 ± 1295.185  ms/op
XZCompressionBenchmark.lasse       image1.dcm         3  avgt    3  1703.705 ±  482.628  ms/op
XZCompressionBenchmark.head        image1.dcm         6  avgt    3  4304.648 ± 1650.114  ms/op
XZCompressionBenchmark.lasse       image1.dcm         6  avgt    3  3430.697 ±  129.481  ms/op
XZCompressionBenchmark.head         large.xml         3  avgt    3   805.220 ± 1094.696  ms/op
XZCompressionBenchmark.lasse        large.xml         3  avgt    3   658.586 ±   31.645  ms/op
XZCompressionBenchmark.head         large.xml         6  avgt    3  6743.478 ± 1634.641  ms/op
XZCompressionBenchmark.lasse        large.xml         6  avgt    3  5880.570 ±  587.226  ms/op


Defining an interface and deferring to an implementation behind it has no
impact on performance. Here are the results for LZUtil using an
implementation that matches what you have for jdk 8.

XZCompressionBenchmark.compress_legacy_loop  ihe_ovly_pr.dcm         3  avgt    3     0.548 ±    0.056  ms/op
XZCompressionBenchmark.compress_legacy_loop  ihe_ovly_pr.dcm         6  avgt    3     2.493 ±    0.097  ms/op
XZCompressionBenchmark.compress_legacy_loop       image1.dcm         3  avgt    3  1720.038 ±  237.015  ms/op
XZCompressionBenchmark.compress_legacy_loop       image1.dcm         6  avgt    3  3671.539 ± 2016.282  ms/op
XZCompressionBenchmark.compress_legacy_loop        large.xml         3  avgt    3   667.045 ±  108.601  ms/op
XZCompressionBenchmark.compress_legacy_loop        large.xml         6  avgt    3  5842.107 ±  552.634  ms/op

> Thanks. I was already tilted towards not using Unsafe and now I'm even
> more. The speed benefit of Unsafe over VarHandle should be tiny enough.
> It feels better that memory safety isn't ignored on any JDK version.

I have no problem with that. The performance differences between Unsafe and
VarHandle are minimal (and sometimes reverse once bounds checks are added
to the Unsafe path).
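
For context, the VarHandle access pattern being compared against Unsafe
looks roughly like this (a sketch with an illustrative class name, not the
actual xz-java code):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Illustrative sketch, not the actual xz-java code.
final class LongReads {
    private static final VarHandle LONG_LE = MethodHandles
            .byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Memory-safe alternative to Unsafe.getLong: the VarHandle performs
    // the array bounds check for us.
    static long readLongLE(byte[] buf, int off) {
        return (long) LONG_LE.get(buf, off);
    }
}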


I am surprised by the binary math behind your handling of long
comparisons here:
https://github.com/tukaani-project/xz-java/compare/master...array_compare#diff-1c6fd3fbd64728f8d99a692827015a1bd7341a0dc651cf6205cc72024e90b065R141-R147
Specifically, you are using subtraction instead of XOR (as I did here):
https://github.com/tukaani-project/xz-java/pull/13/files#diff-2fc691ea3e96cf4821f4eceac43919cb659e7ae91b4e6919e35fb25f37439d3dR118-R127

Is there an advantage? By not supporting Unsafe you do not have to deal
with big-endian byte order, which is what makes this approach possible. I
personally find XOR to answer the question being asked (which byte differs
first) more clearly. My first reaction was that subtraction would not
produce the correct results.
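
For reference, the XOR form boils down to something like this (a sketch
with an illustrative method name, assuming the two longs were read in
little-endian byte order):

// Returns the index of the first differing byte (8 if a == b).
static int firstDifferingByte(long a, long b) {
    long diff = a ^ b;  // set bits mark the positions where a and b differ
    // With little-endian reads, the lowest-order differing bit belongs to
    // the first differing byte, so count trailing zeros and divide by 8.
    return Long.numberOfTrailingZeros(diff) >>> 3;
}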


Generally, I really like what you have. I would propose 2 changes:
1. Use an interface, with the implementation chosen statically, to separate
out the implementation options. This makes it much easier to unit test all
the implementations, and I find it makes the code easier to read and reason
about by being more modular.
2. Allow specifying the implementation to use with a system property. This
would be unlikely to be used outside of benchmarking, but it would provide
options for users on unusual hardware. A rough sketch of both ideas follows
below.
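
Here is what I mean, as a minimal sketch; the interface, class, and
property names (ArrayMatcher, ByteByByteMatcher, "xz.matchImpl") are made
up for illustration and are not existing xz-java identifiers:

// Illustrative names only; not existing xz-java identifiers.
interface ArrayMatcher {
    // Length of the common prefix of buf[off1..] and buf[off2..], capped at limit.
    int getMatchLen(byte[] buf, int off1, int off2, int limit);
}

final class ByteByByteMatcher implements ArrayMatcher {
    public int getMatchLen(byte[] buf, int off1, int off2, int limit) {
        int len = 0;
        while (len < limit && buf[off1 + len] == buf[off2 + len])
            ++len;
        return len;
    }
}

final class ArrayMatchers {
    // Chosen once, statically, so the JIT sees a single concrete type.
    static final ArrayMatcher INSTANCE = choose();

    private static ArrayMatcher choose() {
        // Proposal 2: a system property override, mainly for benchmarking
        // and for users on unusual hardware.
        String impl = System.getProperty("xz.matchImpl", "");
        if ("byteByByte".equals(impl))
            return new ByteByByteMatcher();
        // Other implementations (VarHandle, etc.) would be selected here,
        // falling back to the byte-by-byte loop.
        return new ByteByByteMatcher();
    }

    private ArrayMatchers() {}
}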

Brett

On Tue, Mar 12, 2024 at 12:55 PM Lasse Collin <lasse.col...@tukaani.org>
wrote:

> On 2024-03-12 Brett Okken wrote:
> > I am still working on digesting your branch.
>
> I still need to check a few of your edits if some of them should be
> included. :-)
>
> > The difference in method signature is subtle, but I think a key part
> > of the improvements you are getting. Could you add javadoc to more
> > clearly describe how the args are to be interpreted and what the
> > return value means?
>
> I pushed basic docs for getMatchLen.
>
> Once crc64_varhandle2 is merged then array_compare should use ArrayUtil
> too. It doesn't make a difference in speed.
>
> > I am playing with manually unrolling the java 8 byte-by-byte impl
> > along with tests comparing unsafe, var handle, and vector approaches.
> > These tests take a long time to run, so it will be a couple days
> > before I have complete results. Do you want data as I have it (and it
> > is interesting), or wait for summary?
>
> I can wait for the summary, thanks.
>
> > I am not sure when I will get opportunity to test out arm64.
>
> If someone has, for example, a Raspberry Pi, the compression of zeros
> test is simple enough to do and at least on x86-64 has clear enough
> difference. It's an over-simplified test but it's a data point still.
>
> > I do have some things still on jdk 8, but only decompression. Surveys
> > seem to indicate quite a bit of jdk 8 still in use, but I have no
> > personal need.
>
> Thanks. I was already tilted towards not using Unsafe and now I'm even
> more. The speed benefit of Unsafe over VarHandle should be tiny enough.
> It feels better that memory safety isn't ignored on any JDK version. If
> a bug was found, it's nicer to not wonder if Unsafe had a role in it.
> This is better for security too.
>
> In my previous email I wondered if using Unsafe only with Java 8 would
> make upgrading to newer JDK look bad if newer JDK used VarHandle
> instead of Unsafe. Perhaps that worry was overblown. But the other
> reasons and keeping the code simpler make me want to avoid Unsafe.
>
> (C code via JNI wouldn't be memory safe but then the speed benefits
> should be much more significant too.)
>
> --
> Lasse Collin
>
