On Sun, Mar 29, 2009 at 03:32:20PM -0700, Mark Butler wrote:
> 
>         a0 += ip[0] + (ip[0] >> 32);

That has certain weaknesses too.  What we really want is a 128-bit add
across two 64-bit registers.  If you write the code like this:

        value = ip[0];
        a0 += value;
        if (a0 < value) /* 64-bit overflow implies need to carry */
                a1++;
        b0 += a0;
        b1 += a1;       /* high word of a must feed b1 too */
        if (b0 < a0)    /* same carry test for the low word of b */
                b1++;

then you get the desired effect.  The pair a1:a0 is the 128-bit sum
of the 64-bit ip[] values, and the pair b1:b0 is the 128-bit sum of
the a1:a0 values.  Best of all, the compiler (at least, our compiler)
is smart enough to detect the carry-detection construct and turn it
into branchless add-with-carry instructions.  Very efficient.
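For reference, here is the same idea wrapped up as a complete loop over a
buffer of 64-bit words. This is only a sketch, not the actual ZFS code: the
function name fletcher2c_sketch(), the fletcher2c_sketch_t struct, and the
single-lane loop over ip[i] are hypothetical choices made for illustration.
It uses the comparison-based carry test described above; whether a given
compiler turns it into add-with-carry instructions is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two 128-bit accumulators. */
typedef struct {
	uint64_t a0, a1;	/* a1:a0 = 128-bit running sum of the words */
	uint64_t b0, b1;	/* b1:b0 = 128-bit running sum of a1:a0 */
} fletcher2c_sketch_t;

/* Hypothetical helper: 128-bit Fletcher-style sums over nwords words. */
void
fletcher2c_sketch(const uint64_t *ip, size_t nwords, fletcher2c_sketch_t *zc)
{
	uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t value = ip[i];

		/* a1:a0 += value; unsigned wraparound means a carry */
		a0 += value;
		if (a0 < value)
			a1++;

		/* b1:b0 += a1:a0; carry out of the low word goes to b1 */
		b0 += a0;
		b1 += a1 + (b0 < a0);
	}

	zc->a0 = a0;
	zc->a1 = a1;
	zc->b0 = b0;
	zc->b1 = b1;
}
```

With two words of UINT64_MAX, a1:a0 ends up as 2 * (2^64 - 1) (a1 = 1,
a0 = UINT64_MAX - 1) and b1:b0 as 3 * (2^64 - 1) (b1 = 2,
b0 = UINT64_MAX - 2); a 64-bit-only accumulator would have silently
discarded those carries.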

We've been meaning to introduce this "fletcher2c" for some time --
it just got lost in the sea of things to do.  Thanks for the reminder.

Jeff
