Re: [xz-devel] java crc64 implementation

Lasse Collin Tue, 02 Feb 2021 08:59:51 -0800

Hello!

I need to make a new release in the near future so that a minor problem
can be fixed in .7z support in Apache Commons Compress. I thought I
could include simpler and safer changes from your long list of patches
and the CRC64 improvement might be such.

On 2021-01-21 Brett Okken wrote:
> Here is a slice by 4 implementation. It goes byte by byte to easily be
> compatible with older jdks. Performance wise, it is pretty comparable
> to the java port of Adler's stackoverflow implementation:
> 
> Benchmark                     Mode  Cnt      Score     Error  Units
> Hash64Benchmark.adler         avgt    5   6850.172 ± 251.528  ns/op
> Hash64Benchmark.crc64         avgt    5  16347.986 ±  53.702  ns/op
> Hash64Benchmark.slice4        avgt    5   6842.010 ± 393.149  ns/op

Thank you!

I played around a bit. Seems that the code is *really* sensitive to tiny
changes. It's possible that it depends on the computer and such things
too; I only tried on one machine.

I timed decompression of gigabyte of null bytes using XZDecDemo and
OpenJDK 15 on x86-64. This isn't very accurate but it's enough to sort
them:

    Original                    6.8 s
    Modified original           6.2 s
    Your slicing-by-4           5.8 s
    Modified slicing-by-4       5.6 s
    Misaligned slicing-by-4     5.2 s
    xz -t                       3.6 s

Modified original:

--- a/src/org/tukaani/xz/check/CRC64.java
+++ b/src/org/tukaani/xz/check/CRC64.java
@@ -38,7 +38,8 @@ public class CRC64 extends Check {
         int end = off + len;

         while (off < end)
-            crc = crcTable[(buf[off++] ^ (int)crc) & 0xFF] ^ (crc >>> 8);
+            crc = crcTable[(buf[off++] & 0xFF) ^ ((int)crc & 0xFF)]
+                  ^ (crc >>> 8);
     }

     public byte[] finish() {

Modified slicing-by-4:

    public void update(byte[] buf, int off, int len) {
        final int end = off + len;
        int i = off;

        for (int end4 = end - 3; i < end4; i += 4) {
            final int tmp = (int)crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)] ^
                  TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0XFF)] ^
                  (crc >>> 32) ^
                  TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0XFF)] ^
                  TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0XFF)];
        }

        while (i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int)crc & 0xFF)] ^
                  (crc >>> 8);
    }

Misaligned slicing-by-4 adds an extra while-loop to the beginning:

    public void update(byte[] buf, int off, int len) {
        final int end = off + len;
        int i = off;

        while ((i & 3) != 1 && i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int)crc & 0xFF)] ^
                  (crc >>> 8);

        for (int end4 = end - 3; i < end4; i += 4) {
            final int tmp = (int)crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)] ^
                  TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0XFF)] ^
                  (crc >>> 32) ^
                  TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0XFF)] ^
                  TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0XFF)];
        }

        while (i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int)crc & 0xFF)] ^
                  (crc >>> 8);
    }

If I change the buffer size from 8192 to 8191 in XZDecDemo.java, then
"Modified slicing-by-4" somehow becomes as fast as the "Misaligned
slicing-by-4". On the surface it sounds weird because the buffer still
has the same alignment, it's just one byte smaller at the end.

The same thing happens too if the buffer size is kept at 8192 but first
byte isn't used (making the beginning of the buffer misaligned).

Moving the "(crc32 >> 32)" to a different position in the xor sequence
can affect things too... it's almost spooky. ;-)

It would be nice if you could compare these too and suggest what should
be committed. Maybe you can figure out an even better version.
Different CPU or 32-bit Java or other things may give quite different
results.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Re: [xz-devel] java crc64 implementation

Reply via email to