I assume you didn't post to the list by accident, so I'm quoting your
email in full.

On 2021-02-02 Brett Okken wrote:
> > while ((i & 3) != 1 && i < end)  
> 
> Shouldn't that be (i & 3) != 0?
> An offset of 0 should not enter this loop, but 0 & 3 does not equal 1.

The idea really is that an offset of 1 doesn't enter the loop, so the
main slicing-by-4 loop runs misaligned. I don't know why it makes a
difference, and I'm no longer even sure why I decided to try it. You
can try the other variants too: (i & 3) != n for n in { 0, 1, 2, 3 }.
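
For anyone who wants to try the variants, here is a minimal,
self-contained sketch of the idea. It is not the code from the patch:
the class name Crc64AlignSketch and the skip parameter are made up for
illustration, and the table setup is the usual slicing-by-4
construction for the CRC64 polynomial used in .xz. With skip = 0 the
main loop starts at an index that is a multiple of four; with skip = 1
it is the misaligned variant quoted above. All four values must give
the same CRC; only the split of work between the loops changes.

```java
final class Crc64AlignSketch {
    // Reflected CRC64 polynomial used by the .xz format.
    private static final long POLY = 0xC96C5795D7870F42L;
    private static final long[][] TABLE = new long[4][256];

    static {
        // TABLE[0] is the ordinary byte-at-a-time table.
        for (int b = 0; b < 256; ++b) {
            long r = b;
            for (int k = 0; k < 8; ++k)
                r = (r & 1) != 0 ? (r >>> 1) ^ POLY : r >>> 1;
            TABLE[0][b] = r;
        }
        // TABLE[s][b] is TABLE[s - 1][b] advanced by one more zero byte.
        for (int s = 1; s < 4; ++s) {
            for (int b = 0; b < 256; ++b) {
                long p = TABLE[s - 1][b];
                TABLE[s][b] = (p >>> 8) ^ TABLE[0][(int) (p & 0xFF)];
            }
        }
    }

    // skip (0..3) is a hypothetical knob: the value of (i & 3) at which
    // the byte-at-a-time prologue stops and the main loop takes over.
    static long update(long crc, byte[] buf, int off, int len, int skip) {
        final int end = off + len;
        int i = off;

        // Prologue: one byte at a time until (i & 3) == skip.
        while ((i & 3) != skip && i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int) crc & 0xFF)]
                    ^ (crc >>> 8);

        // Main loop: four bytes per iteration. The low 32 bits of crc
        // are truncated to an int once and reused for all four lookups.
        for (int end4 = end - 3; i < end4; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)]
                    ^ TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)]
                    ^ (crc >>> 32)
                    ^ TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)]
                    ^ TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)];
        }

        // Tail: whatever is left, one byte at a time.
        while (i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int) crc & 0xFF)]
                    ^ (crc >>> 8);

        return crc;
    }

    // One-shot helper: initial value and final XOR are both all ones.
    static long crc64(byte[] buf, int off, int len, int skip) {
        return ~update(-1L, buf, off, len, skip);
    }
}
```

Calling crc64(buf, off, len, skip) with each skip in 0..3 on the same
input is a quick sanity check before timing anything. Note that i is
an array index here, so "alignment" is relative to the start of the
array, not necessarily to a memory address.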

> > If I change the buffer size from 8192 to 8191 in XZDecDemo.java,
> > then "Modified slicing-by-4" somehow becomes as fast as the
> > "Misaligned slicing-by-4". On the surface it sounds weird because
> > the buffer still has the same alignment, it's just one byte smaller
> > at the end.  
> 
> My guess is that this has to do with how many while loops need to be
> executed/optimized.
> Making it one byte smaller guarantees one of the additional while
> loops actually has to execute. Depending on the initial offset,
> potentially both need to execute.

Maybe you are right, but the confusing thing is that those while-loops
are supposedly slower than the for-loop. :-)
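
To make the guess easier to check, here is a small sketch that only
counts how many bytes each loop would handle. It assumes the usual
prologue / four-bytes-at-a-time / tail structure and that the whole
buffer is passed in one call, which may not match what the decoder
actually does. LoopSplitSketch, split and skip are made-up names; skip
is the value of (i & 3) at which the prologue stops.

```java
final class LoopSplitSketch {
    // Returns { prologue bytes, main-loop bytes, tail bytes } for one
    // call covering buf[off .. off + len - 1]. No CRC is computed.
    static int[] split(int off, int len, int skip) {
        final int end = off + len;
        int i = off;

        // Same condition as the byte-at-a-time prologue.
        int prologue = 0;
        while ((i & 3) != skip && i < end) {
            ++i;
            ++prologue;
        }

        // Same bounds as the four-bytes-per-iteration main loop.
        int sliced = 0;
        for (int end4 = end - 3; i < end4; i += 4)
            sliced += 4;

        // Everything that remains goes to the tail loop.
        final int tail = end - i;
        return new int[] { prologue, sliced, tail };
    }
}
```

With off = 0 and skip = 0, a length of 8192 puts every byte in the main
loop, while 8191 leaves a three-byte tail, so under these assumptions
the smaller buffer does force the tail loop to run.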

> > It would be nice if you could compare these too and suggest what
> > should be committed. Maybe you can figure out an even better
> > version. Different CPU or 32-bit Java or other things may give
> > quite different results.  
> 
> Truncating the crc to an int 1 time in the loop seems like a clear
> winner. I will play with this in my benchmark.
> My benchmark is calculating the crc64 of 8k of random bytes. I will
> change it to include misaligned read as well.

Thanks.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
