> I still need to check a few of your edits if some of them should be > included. :-)
I think the changes to LZMAEncoderNormal as part of this PR to avoid the negative length comparison would be good to carry forward. It basically compares to the MATCH_LEN_MIN instead of 0. This can avoid some (short) calls to getMatchLen whose results are going to be ignored anyway on the very next line. https://github.com/tukaani-project/xz-java/pull/13/commits/544e449446a3d652de2a5f170d197aef695f12ec#diff-e6858bec0a168955b2ad68bbe89af8ab9ca7b9b1ebf2d9b8bc362fb80dab2967 >I pushed basic docs for getMatchLen. Thanks - that is very helpful. > I can wait for the summary, thanks. The jdk8 changes show nice improvements over head. My assumption is that with less math going on in the offsets of the while loop allowed the jvm to better optimize. Benchmark (file) (preset) Mode Cnt Score Error Units XZCompressionBenchmark.head ihe_ovly_pr.dcm 3 avgt 3 0.617 ± 0.159 ms/op XZCompressionBenchmark.lasse ihe_ovly_pr.dcm 3 avgt 3 0.556 ± 0.163 ms/op XZCompressionBenchmark.head ihe_ovly_pr.dcm 6 avgt 3 2.908 ± 0.346 ms/op XZCompressionBenchmark.lasse ihe_ovly_pr.dcm 6 avgt 3 2.437 ± 0.160 ms/op XZCompressionBenchmark.head image1.dcm 3 avgt 3 2106.954 ± 1295.185 ms/op XZCompressionBenchmark.lasse image1.dcm 3 avgt 3 1703.705 ± 482.628 ms/op XZCompressionBenchmark.head image1.dcm 6 avgt 3 4304.648 ± 1650.114 ms/op XZCompressionBenchmark.lasse image1.dcm 6 avgt 3 3430.697 ± 129.481 ms/op XZCompressionBenchmark.head large.xml 3 avgt 3 805.220 ± 1094.696 ms/op XZCompressionBenchmark.lasse large.xml 3 avgt 3 658.586 ± 31.645 ms/op XZCompressionBenchmark.head large.xml 6 avgt 3 6743.478 ± 1634.641 ms/op XZCompressionBenchmark.lasse large.xml 6 avgt 3 5880.570 ± 587.226 ms/op Defining an interface to defer the implementation to has no impact on performance. Here are the results of LZUtil using an implementation which matches what you have for jdk 8. XZCompressionBenchmark.compress_legacy_loop ihe_ovly_pr.dcm 3 avgt 3 0.548 ± 0.056 ms/op XZCompressionBenchmark.compress_legacy_loop ihe_ovly_pr.dcm 6 avgt 3 2.493 ± 0.097 ms/op XZCompressionBenchmark.compress_legacy_loop image1.dcm 3 avgt 3 1720.038 ± 237.015 ms/op XZCompressionBenchmark.compress_legacy_loop image1.dcm 6 avgt 3 3671.539 ± 2016.282 ms/op XZCompressionBenchmark.compress_legacy_loop large.xml 3 avgt 3 667.045 ± 108.601 ms/op XZCompressionBenchmark.compress_legacy_loop large.xml 6 avgt 3 5842.107 ± 552.634 ms/op > Thanks. I was already tilted towards not using Unsafe and now I'm even > more. The speed benefit of Unsafe over VarHandle should be tiny enough. > It feels better that memory safety isn't ignored on any JDK version. I have no problem with that. The performance differences between Unsafe and VarHandle are very minimal (and sometimes reverse when bounds checks are introduced to use of Unsafe). I am surprised with the binary math behind your handling of long comparisons here: https://github.com/tukaani-project/xz-java/compare/master...array_compare#diff-1c6fd3fbd64728f8d99a692827015a1bd7341a0dc651cf6205cc72024e90b065R141-R147 Specifically you are using subtraction instead of XOR (like I did here) https://github.com/tukaani-project/xz-java/pull/13/files#diff-2fc691ea3e96cf4821f4eceac43919cb659e7ae91b4e6919e35fb25f37439d3dR118-R127 Is there an advantage? By not supporting Unsafe, you do not have to deal with BigEndian, so that makes this approach possible. I personally find XOR to more clearly answer the question being asked (which is first byte with difference). My first reaction was subtraction would not produce the correct results. Generally, I really like what you have. I would propose 2 changes: 1. Use an interface with implementation chosen statically to separate out the implementation options. This makes it much easier to unit test all the implementations. I also find that it makes the code easier to read/reason about by being more modular. 2. Allow specifying the implementation to use with a system property. This would be unlikely to be used outside of benchmarking, but would provide options for users on unusual hardware. Brett On Tue, Mar 12, 2024 at 12:55 PM Lasse Collin <lasse.col...@tukaani.org> wrote: > On 2024-03-12 Brett Okken wrote: > > I am still working on digesting your branch. > > I still need to check a few of your edits if some of them should be > included. :-) > > > The difference in method signature is subtle, but I think a key part > > of the improvements you are getting. Could you add javadoc to more > > clearly describe how the args are to be interpreted and what the > > return value means? > > I pushed basic docs for getMatchLen. > > Once crc64_varhandle2 is merged then array_compare should use ArrayUtil > too. It doesn't make a difference in speed. > > > I am playing with manually unrolling the java 8 byte-by-byte impl > > along with tests comparing unsafe, var handle, and vector approaches. > > These tests take a long time to run, so it will be a couple days > > before I have complete results. Do you want data as I have it (and it > > is interesting), or wait for summary? > > I can wait for the summary, thanks. > > > I am not sure when I will get opportunity to test out arm64. > > If someone has, for example, a Raspberry Pi, the compression of zeros > test is simple enough to do and at least on x86-64 has clear enough > difference. It's an over-simplified test but it's a data point still. > > > I do have some things still on jdk 8, but only decompression. Surveys > > seem to indicate quite a bit of jdk 8 still in use, but I have no > > personal need. > > Thanks. I was already tilted towards not using Unsafe and now I'm even > more. The speed benefit of Unsafe over VarHandle should be tiny enough. > It feels better that memory safety isn't ignored on any JDK version. If > a bug was found, it's nicer to not wonder if Unsafe had a role in it. > This is better for security too. > > In my previous email I wondered if using Unsafe only with Java 8 would > make upgrading to newer JDK look bad if newer JDK used VarHandle > instead of Unsafe. Perhaps that worry was overblown. But the other > reasons and keeping the code simpler make me want to avoid Unsafe. > > (C code via JNI wouldn't be memory safe but then the speed benefits > should be much more significant too.) > > -- > Lasse Collin >