On Thu, Feb 11, 2021 at 12:51 PM Lasse Collin <lasse.col...@tukaani.org> wrote: > > On 2021-02-05 Brett Okken wrote: > > I worked this out last night. We need to double how much we copy each > > time by not advancing "back". This actually works even better than > > Arrays.fill for the single byte case also. > > This clearly is a good idea in a Java implementation. :-) > > I still worry about short copies. If the file is full of tiny > matches/repeats of 1-3 bytes or so, arraycopy can be slower. Such files > aren't typical at all but I don't want to add a corner case where the > performance drops too much.
Do you have examples of such files, or code on how to generate one? The case of a 1 byte match/repeat is optimized for in my latest patch, as providing that optimization did provide noticeable improvement in the (real) files I have been testing with. While I did observe some 2 and 3 byte repeats, it appears to not be common enough to negatively impact overall throughput. It would be quite helpful to have some examples so that we can at least quantify the impact. > I came up with the following. I haven't decided yet if I like it. On the 3 files I have been testing with, this change is a mixed bag. Compared to trunk 1 regresses by ~8%. While the other 2 do improve, neither are better than my last patch. jdk 11 64 bit TRUNK (file) Mode Cnt Score Error Units ihe_ovly_pr.dcm avgt 3 0.662 ± 0.012 ms/op image1.dcm avgt 3 391.644 ± 13.871 ms/op large.xml avgt 3 225.456 ± 16.265 ms/op (okken last) ihe_ovly_pr.dcm avgt 3 0.607 ± 0.187 ms/op image1.dcm avgt 3 369.419 ± 32.400 ms/op large.xml avgt 3 190.580 ± 7.856 ms/op (lasse new) (file) Mode Cnt Score Error Units ihe_ovly_pr.dcm avgt 3 0.628 ± 0.066 ms/op image1.dcm avgt 3 424.159 ± 14.823 ms/op large.xml avgt 3 192.763 ± 6.831 ms/op I was able to improve this a bit by pulling the handling of small copies outside of the while loop. This eliminates the regressions compared to trunk, but still does not feel like an improvement over my last patch. (lasse + outer switch) (file) Mode Cnt Score Error Units ihe_ovly_pr.dcm avgt 3 0.633 ± 0.032 ms/op image1.dcm avgt 3 390.868 ± 40.598 ms/op large.xml avgt 3 190.030 ± 2.619 ms/op int back = pos - dist - 1; if (back < 0) { // We won't get here if the dictionary isn't full. assert full == bufSize; // The distance wraps around to the end of the cyclic dictionary // buffer. Here we will never copy more than dist + 1 bytes // and so the copying won't repeat from its own output. Thus, // we can always use arraycopy safely. back += bufSize; int copySize = Math.min(bufSize - back, left); assert copySize <= dist + 1; System.arraycopy(buf, back, buf, pos, copySize); pos += copySize; back = 0; left -= copySize; if (left == 0) return; } assert back < pos; assert left > 0; int copySize = Math.min(left, pos - back); switch(copySize) { case 3: buf[pos + 2] = buf[back + 2]; case 2: buf[pos + 1] = buf[back + 1]; case 1: buf[pos] = buf[back]; break; default: System.arraycopy(buf, back, buf, pos, copySize); } pos += copySize; left -= copySize; while (left > 0) { copySize = Math.min(left, copySize << 1); System.arraycopy(buf, back, buf, pos, copySize); pos += copySize; left -= copySize; }