On 2024-02-29 Brett Okken wrote:
> > Thanks! Ideally there would be one commit to add the minimal
> > portable version, then separate commits for each optimized variant.
> 
> Would you like me to remove the Unsafe based impl from
> https://github.com/tukaani-project/xz-java/pull/13?

There are new commits in master now and those might slightly conflict
with your PR (@Override additions). I'm still playing around a bit
and learning about the faster methods. Right now I don't have any
requests for changes; I don't want to ask for anything while there is
a chance that some other approach ends up looking preferable.

In general, I would prefer splitting this into more commits. Using
your PR as an example:

  1. Adding the changes to lz/*.java and the portable *Array*.java
     code required by those changes.

  2. Adding one advanced implementation that affects only the
     *Array*.java files.

  3. Repeat step 2 until all implementations are added.

When reasonably possible, the line length should be under 80 chars.

> > So far I have given it only a quick try. array_comp_incremental
> > seems faster than xz-java.git master. Compression time was reduced
> > by about 10 %. :-) This is with OpenJDK 21.0.2, only a quick test,
> > and my computer is old so I don't doubt your higher numbers.  
> 
> How are you testing? I am using jmh, so it has a warm up period before
> actually measuring, giving the jvm plenty of opportunity to perform
> optimizations. If you are doing single shot executions to compress a
> file, that could provide pretty different results.

I was simply timing XZEncDemo at the default preset (6). I had hoped
that big files (binary and source packages) that take tens of seconds
to compress, repeating each test a few times, would work well enough.
But perhaps the difference is big enough only with certain types of
files.
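
To be clear, by timing I mean a single-shot run with no warm-up at
all, roughly like the sketch below (hypothetical code, not the actual
demo; XZEncDemo itself reads standard input, and the file name and
buffer size here are made up):

    import java.io.*;
    import org.tukaani.xz.*;

    public class TimeCompress {
        public static void main(String[] args) throws Exception {
            long start = System.nanoTime();
            byte[] buf = new byte[8192];
            try (InputStream in = new FileInputStream("input.bin");
                 XZOutputStream out = new XZOutputStream(
                         new FileOutputStream("input.bin.xz"),
                         new LZMA2Options(6))) {
                int n;
                while ((n = in.read(buf)) != -1)
                    out.write(buf, 0, n);
            }
            System.out.println("Elapsed: "
                    + (System.nanoTime() - start) / 1e9 + " s");
        }
    }

So the JVM gets no warm-up before the measurement, which alone could
explain part of the difference from the JMH numbers.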

On 2024-03-05 Brett Okken wrote:
> I have added a comment to the PR with updated benchmark results:
> https://github.com/tukaani-project/xz-java/pull/13#issuecomment-1977705691

Thanks! I'm not sure I'm reading the results correctly. The "Error"
column seems to have oddly high values on several lines. If the same
test set is run again, are the "Score" values similar enough between
the two runs to retain the speed order of the implementations being
tested?

If the first file is only ~66 KB, I wonder if other factors, like
initializing large arrays in the classes, take so much time that
differences in array comparison speed become hard to measure.

When the benchmarking framework repeats each test, every run has to
allocate the classes again, which might trigger garbage collection.
Did you have ArrayCache enabled?

    ArrayCache.setDefaultCache(BasicArrayCache.getInstance());
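
With JMH, one place to set it would be a trial-level setup method.
A hypothetical sketch (the state class and the benchmark body are
made up):

    import org.openjdk.jmh.annotations.*;
    import org.tukaani.xz.ArrayCache;
    import org.tukaani.xz.BasicArrayCache;

    @State(Scope.Benchmark)
    public class CompressBench {
        @Setup(Level.Trial)
        public void setup() {
            // Reuse the large internal arrays across iterations so
            // that allocation and garbage collection don't dominate
            // the timings of small inputs.
            ArrayCache.setDefaultCache(BasicArrayCache.getInstance());
        }

        // ... @Benchmark methods doing the compression ...
    }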

I suppose optimizing only for new JDK version(s) would be fine if it
makes things easier. That is, it could be enough that performance
doesn't get worse on Java 8.

If the indirection adds overhead, would it make sense to have a
preprocessing step that creates .java file variants that directly use
the optimized methods? Then LZMAEncoder.getInstance could choose at
runtime whether to use LZMAEncoderNormalPortable,
LZMAEncoderNormalUnsafe, or some other implementation. That is, if
this cannot be done with a multi-release JAR. It's not a pretty
solution, but if it is faster it could be one option, maybe.
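
Roughly the dispatch I have in mind, as a hypothetical sketch (the
real getInstance takes more parameters, and the variant class names
and the capability check here are made up):

    public static LZMAEncoder getInstance(LZMA2Options options) {
        // Check once whether the Unsafe-based variant is usable;
        // otherwise fall back to the portable variant.
        boolean unsafeOk;
        try {
            Class.forName("sun.misc.Unsafe");
            unsafeOk = true;
        } catch (ClassNotFoundException e) {
            unsafeOk = false;
        }

        return unsafeOk ? new LZMAEncoderNormalUnsafe(options)
                        : new LZMAEncoderNormalPortable(options);
    }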

Negative lenLimit currently occurs in two places (at least). Perhaps
it should be handled in those places instead of requiring the array
comparison to support it (that is how the C code in liblzma handles
it).
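
That is, something along these lines at those call sites (a
hypothetical sketch; the variable names and the surrounding logic
depend on the match finder):

    // Guard at the call site so that the array comparison helper
    // can assume lenLimit >= 1, similar to the C code in liblzma.
    if (lenLimit < 1) {
        // No match is possible this close to the end of the
        // buffer, so skip the comparison entirely.
        return matches; // hypothetical early return
    }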

-- 
Lasse Collin
