On 2021-02-03 Brett Okken wrote:
> On Wed, Feb 3, 2021 at 2:56 PM Lasse Collin
> <lasse.col...@tukaani.org> wrote:
> > It seems to regress horribly if dist is zero. A file with a very
> > long sequence of the same byte is good for testing.
>
> Would this be a valid test of what you are describing?
[...]
> The source is effectively 160MB of the same byte value.
Yes, it's fine.

> I found a strange bit of behavior with this case in the compression.
> In LZMAEncoderNormal.calcLongRepPrices, I am seeing a case where
>
> int len2Limit = Math.min(niceLen, avail - len - 1);
>
> results in -1, (avail and len are both 8). This results in calling
> LZEncoder.getMatchLen with a lenLimit of -1. Is that expected?

I didn't check in detail now, but I think it's expected. There are two
such places. Because of this detail, a speed optimization was forgotten
in liblzma in these two places. I finally remembered to add the
optimization in 5.2.5.

On 2021-02-03 Brett Okken wrote:
> I still need to do more testing across jdk 8 and 15, but initial
> returns on this are pretty positive. The repeating byte file is
> meaningfully faster than baseline. One of my test files (image1.dcm)
> does not improve much from baseline, but the other 2 files do.

The repeating byte file is indeed much faster than the baseline. With
normal files the speed seems to be about the same as the version I
posted, so a minor improvement over the baseline.

With a file containing a two-byte repeat ("ababababababab"...), it's
50 % slower than the baseline. Calling arraycopy in a loop, copying two
bytes at a time, is not efficient. I didn't try to find out how big the
copy needs to be before the benefit of arraycopy outweighs its
overhead, but clearly it needs to be bigger than two bytes.

The use of Arrays.fill to optimize the case of one repeating byte looks
useful, especially if it won't hurt performance in other situations.
Still, I'm not sure yet if the LZDecoder optimizations should go in 1.9.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
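[Editorial note: a minimal sketch of why a negative lenLimit is harmless. This is a simplified, hypothetical version of a match-length helper in the spirit of LZEncoder.getMatchLen, not the actual XZ for Java code; the name and signature are assumptions for illustration.]

```java
public class MatchLenSketch {
    // Simplified sketch of a match-length helper: counts how many bytes
    // at pos match the earlier data dist+1 positions back, up to lenLimit.
    static int getMatchLen(byte[] buf, int pos, int dist, int lenLimit) {
        int backPos = pos - dist - 1;
        int len = 0;
        // If lenLimit is negative (e.g. -1, as in the calcLongRepPrices
        // case above), the condition is false on the very first check,
        // so the loop body never runs and the result is simply 0.
        while (len < lenLimit && buf[pos + len] == buf[backPos + len])
            ++len;
        return len;
    }
}
```

So the call with lenLimit == -1 is safe; it just returns a zero-length match. The speed optimization mentioned above would be skipping such calls entirely rather than changing their result.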
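[Editorial note: a sketch of the copy strategies discussed above, assuming the usual LZ convention that dist == 0 means a distance of one byte. This is not XZ for Java's actual LZDecoder; the method name and layout are hypothetical. Note that System.arraycopy copies as if through a temporary array, so it cannot be used when the regions overlap and LZ semantics require the repeat to propagate.]

```java
import java.util.Arrays;

public class CopySketch {
    // buf holds already-decoded data; copy len bytes starting dist+1
    // positions behind pos.
    static void copyMatch(byte[] buf, int pos, int dist, int len) {
        int back = pos - dist - 1;
        if (dist == 0) {
            // Distance of one byte: the match is a run of a single
            // byte, so Arrays.fill replaces the copy loop entirely.
            Arrays.fill(buf, pos, pos + len, buf[back]);
        } else if (dist + 1 >= len) {
            // Source and destination do not overlap: one arraycopy call
            // amortizes its per-call overhead over the whole match.
            System.arraycopy(buf, back, buf, pos, len);
        } else {
            // Overlapping copy (e.g. the two-byte "abab..." repeat):
            // a plain byte loop propagates the pattern correctly and
            // avoids calling arraycopy on tiny two-byte chunks.
            do {
                buf[pos++] = buf[back++];
            } while (--len > 0);
        }
    }
}
```

The two-byte-repeat slowdown above corresponds to the third branch: calling arraycopy there, two bytes per call, costs more than the loop it replaces.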