On 2021-01-13 Brett Okken wrote:
> Mark Adler has posted an optimized crc64 implementation on
> stackoverflow[1]. This can be reasonably easily ported to java (that
> post has a link to java impl on github[2] which warrants a little
> clean up, but gives a decent idea).
> 
> I did a quick benchmark calculating the crc64 over 8KB and the results
> were impressive:
> 
> Benchmark              Mode  Cnt      Score    Error  Units
> Hash64Benchmark.adler  avgt    5   6908.677 ± 47.790  ns/op
> Hash64Benchmark.crc64  avgt    5  16343.091 ± 64.089  ns/op

The CRC64 implementation in XZ for Java is indeed a basic version. I
wanted to keep things simple in the beginning and didn't think about it
much later since the Java version of XZ is slower than C version for
other reasons anyway.

In XZ Utils, slicing-by-4 method is used for CRC64 and slicing-by-8
for CRC32. A reason for not using by-8 for CRC64 is to reduce CPU L1
cache usage: by-4 with CRC64 needs 8 KiB lookup table, by-8 needs 16
KiB. Micro-benchmarking with big table can look good but when the CRC
is just a small part of the application the results are more
complicated (more cache misses to load the bigger table, more other data
pushed out of cache). It is essential to note that the decisions about
table sizes were made over a decade ago with 32-bit CPUs and it's very
much possible that different decisions would be better nowadays.

The version by Mark Adler [1] uses slicing-by-8 with CRC64. It also
includes a method to combine the CRC values of two blocks which is
great if one uses threads to compute a CRC. Threaded CRC doesn't sound
useful with XZ since LZMA isn't that fast anyway.

A side note: GNU gzip uses the basic method for CRC32 [3] while zlib
uses slicing-by-8. Since Deflate is fast to decode, replacing the CRC32
in GNU gzip would make a clear difference in decompression speed.

[3] http://git.savannah.gnu.org/cgit/gzip.git/tree/util.c#n126

> [1] -
> https://stackoverflow.com/questions/20562546/how-to-get-crc64-distributed-calculation-use-its-linearity-property/20579405#20579405
> 
> [2] -
> https://github.com/MrBuddyCasino/crc-64/blob/master/crc-64/src/main/java/net/boeckling/crc/CRC64.java

I didn't find license information from the [2] repository. XZ for Java
is public domain so the license likely wouldn't match anyway.

Porting from XZ Utils shouldn't be too hard, depending on how much one
wishes to optimize it.
  - src/liblzma/check/crc64_fast.c
  - src/liblzma/check/crc_macros.h
  - src/liblzma/check/crc64_tablegen.c (or should it just include
    pre-computed tables like liblzma and zlib do?)

Unlike the C version in [1], the Java version in [2] reads the input
byte[] array byte-by-byte. Using a fast method to read 8 *aligned*
bytes at a time in native byte order should give more speed; after all,
it's one of the benefits of this method that one can read multiple
input bytes at a time.

A public domain patch for a faster CRC64 to XZ for Java is welcome.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Reply via email to