On Sun, Mar 29, 2009 at 03:32:20PM -0700, Mark Butler wrote:
> > a0 += ip[0] + (ip[0] >> 32);
That has certain weaknesses too. What we really want is a 128-bit add
across two 64-bit registers. If you write the code like this:

	value = ip[0];
	a0 += value;
	if (a0 < value)		/* 64-bit overflow implies need to carry */
		a1++;
	b0 += a0;
	if (b0 < a0)
		b1++;

then you get the desired effect. The pair a1:a0 is the 128-bit sum
of the 64-bit ip[] values, and the pair b1:b0 is the 128-bit sum of
the a1:a0 values.

Best of all, the compiler (at least, our compiler) is smart enough
to detect the carry-detection construct and turn it into branchless
add-with-carry instructions. Very efficient.

We've been meaning to introduce this "fletcher2c" for some time --
it just got lost in the sea of things to do. Thanks for the reminder.

Jeff
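A minimal, self-contained sketch of the carry-detecting 128-bit accumulation described above. The names `fletcher2c_sketch` and `u128_pair` are hypothetical and do not come from the original code. The sketch loops over a whole buffer of 64-bit words rather than a single ip[0]. It also adds a1 into b1 on each step so that b1:b0 holds the full 128-bit sum of the a1:a0 values, as the prose describes. Whether a particular compiler turns the `if (x < y)` carry test into branchless add-with-carry instructions depends on that compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for a 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_pair;

/*
 * Hypothetical sketch of the 128-bit carry-detecting accumulation:
 *   a1:a0 accumulates the 64-bit words of ip[],
 *   b1:b0 accumulates the successive a1:a0 values.
 */
static void fletcher2c_sketch(const uint64_t *ip, size_t nwords,
                              u128_pair *a, u128_pair *b)
{
    uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

    assert(ip != NULL || nwords == 0);
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < nwords; i++) {
        uint64_t value = ip[i];

        a0 += value;
        if (a0 < value)     /* unsigned wraparound means a carry out of a0 */
            a1++;

        b0 += a0;
        if (b0 < a0)        /* carry out of the low half of b */
            b1++;
        b1 += a1;           /* high half: completes the 128-bit add of a1:a0 */
    }

    a->hi = a1;
    a->lo = a0;
    b->hi = b1;
    b->lo = b0;
}
```

For example, with the input {UINT64_MAX, 1} the low word of a wraps to 0 and the carry leaves a1:a0 = 1:0 (that is, 2^64). b1:b0 ends up as 1:UINT64_MAX, which is the exact sum (2^64 - 1) + 2^64.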