On 2021-02-02 Brett Okken wrote:
> Thus far I have only tested on jdk 11 64bit windows, but the fairly
> clear winner is:
>
>     public void update(byte[] buf, int off, int len) {
>         final int end = off + len;
>         int i=off;
>         if (len > 3) {
>             switch (i & 3) {
>                 case 3:
>                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
>                             (crc >>> 8);
>                 case 2:
>                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
>                             (crc >>> 8);
>                 case 1:
>                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
>                             (crc >>> 8);
>             }

To ensure (i & 3) == 0 when entering the main loop, the case-labels
should be 1-2-3, not 3-2-1. This may have messed up your tests. :-(
With a very quick test I didn't see much difference if I changed the
case-label order.

On 2021-02-02 Brett Okken wrote:
> I tested jdk 15 64bit and jdk 11 32bit, client and server and the
> above implementation is consistently quite good.
> The alternate in running does not do the leading alignment. This
> version is really close in 64 bit testing and slightly faster for 32
> bit. The differences are pretty small, and both are noticeably better
> than my original proposal (and all 3 are significantly faster than
> current). I think I would lead towards the simplicity of not doing the
> leading alignment, but I do not have a strong opinion.

Let's go with the simpler option.

>             switch (len & 3) {
>                 case 3:
>                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
>                             (crc >>> 8);

I suppose this should use the same (faster) array indexing style as
the main loop:

    crc = TABLE[0][(buf[off++] & 0xFF) ^ ((int)crc & 0xFF)]
            ^ (crc >>> 8);

Also, does it really help to unroll the loop? With 8191-byte buffers I
see no significant difference (in a quick not-very-accurate test) if
the switch-statement is replaced with a while-loop.

With these two changes the code becomes functionally identical to the
version I posted with the name "Modified slicing-by-4". Is that an OK
version to commit?

Is the following fine to you as the file header? Your email address
can be omitted if you prefer that. I will mention in the commit
message that you adapted the code from XZ Utils and benchmarked it.

/*
 * CRC64
 *
 * Authors: Brett Okken <EMAIL>
 *          Lasse Collin <EMAIL>
 *
 * This file has been put into the public domain.
 * You can do whatever you want with this file.
 */

Thanks!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
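
For reference, here is a minimal sketch of what the simpler variant discussed above might look like: a slicing-by-4 main loop without leading alignment, followed by a plain while-loop for the last 0-3 bytes. The class name Crc64Sketch, the bytewise() helper, and the exact loop shape are assumptions made for illustration and are not taken from the committed code. The polynomial is the reflected ECMA-182 one used for CRC64 in .xz, and the table layout is assumed to be TABLE[s][b] = the CRC state after byte b followed by s zero bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical illustration class; not the actual XZ for Java CRC64.
final class Crc64Sketch {
    // Reflected ECMA-182 polynomial (CRC64 as used in the .xz format).
    private static final long POLY = 0xC96C5795D7870F42L;
    private static final long[][] TABLE = new long[4][256];

    static {
        for (int s = 0; s < 4; ++s) {
            for (int b = 0; b < 256; ++b) {
                // TABLE[s] is TABLE[s - 1] advanced by one zero byte.
                long r = (s == 0) ? b : TABLE[s - 1][b];
                for (int i = 0; i < 8; ++i) {
                    r = ((r & 1) == 1) ? (r >>> 1) ^ POLY : r >>> 1;
                }
                TABLE[s][b] = r;
            }
        }
    }

    private long crc = -1;

    void update(byte[] buf, int off, int len) {
        final int end = off + len;
        int i = off;

        // Main loop: four bytes per iteration, no leading alignment.
        for (int end4 = end - 3; i < end4; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)]
                    ^ TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)]
                    ^ (crc >>> 32)
                    ^ TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)]
                    ^ TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)];
        }

        // Tail: a while-loop instead of an unrolled switch.
        while (i < end) {
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int) crc & 0xFF)]
                    ^ (crc >>> 8);
        }
    }

    long getValue() {
        return ~crc;
    }

    // Plain byte-at-a-time reference, used only for cross-checking.
    static long bytewise(byte[] buf, int off, int len) {
        long c = -1;
        for (int i = off; i < off + len; ++i) {
            c = TABLE[0][(buf[i] & 0xFF) ^ ((int) c & 0xFF)] ^ (c >>> 8);
        }
        return ~c;
    }

    public static void main(String[] args) {
        byte[] data = new byte[8191];
        new Random(1).nextBytes(data);

        Crc64Sketch sliced = new Crc64Sketch();
        sliced.update(data, 0, data.length);
        System.out.println("sliced == bytewise: "
                + (sliced.getValue() == bytewise(data, 0, data.length)));

        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        Crc64Sketch c = new Crc64Sketch();
        c.update(check, 0, check.length);
        System.out.println(Long.toHexString(c.getValue()));
    }
}
```

Comparing against the bytewise loop for odd offsets and lengths is a cheap way to confirm that skipping the leading alignment does not change the result; whether it changes the speed would still need a proper benchmark.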