ni...@lysator.liu.se (Niels Möller) writes:

> If this works,
> FOLD would turn into something like
>
>       sldi    F0, $1, 32
>       srdi    F1, $1, 32
>       subfc   F2, $1, F0
>       addme   F3, F1

I'm looking at a different approach (experimenting on ARM64, which is
quite similar to powerpc, but I don't yet have working code). To
understand what the redc code is doing, keep in mind that one folding
step computes

   <U4,U3,U2,U1,U0> + U0*p 

which cancels the low limb, since p = -1 (mod 2^64). Since the low
limb always cancels, what we actually need is

   <U4,U3,U2,U1> + U0*((p+1)/2^64) 
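
As a sanity check of the two formulas above, here is a small Python
sketch. It assumes p is the secp256r1 (NIST P-256) prime, which is
consistent with the constants in this message; the function names are
made up for illustration and are not taken from the nettle sources.

```python
# Assumption: p is the P-256 prime, B is the 64-bit limb base.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
B = 2**64


def fold_full(u):
    """One folding step in full: compute u + U0*p, then drop the low limb."""
    u0 = u % B
    t = u + u0 * P
    # The low limb cancels because p = -1 (mod 2^64).
    assert t % B == 0
    return t // B


def fold_short(u):
    """The same step without forming the low limb:
    <U4,U3,U2,U1> + U0*((p+1)/2^64)."""
    u0 = u % B
    return (u // B) + u0 * ((P + 1) // B)
```

Both functions should return the same value for any non-negative u,
since u + U0*p = (u - U0) + U0*(p+1) and both terms are divisible by
2^64.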
 
The x86_64 code does this by splitting U0*p into 2^{256} U0 - (2^{256} -
p) * U0, subtracting in the folding step, and adding in the high part
later. But one doesn't have to do it that way. One could instead use a
FOLD macro that computes

  (2^{192} - 2^{160} + 2^{128} + 2^{32}) U0
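
To make that concrete, here is a rough Python model of how such a FOLD
could produce four 64-bit limbs using only shifts and a
subtract-with-borrow. This is a sketch under my own assumptions, not
working assembly: the names F0..F3 are borrowed from the quoted macro,
but the decomposition below is only one possible way to arrange it.

```python
MASK = 2**64 - 1


def fold_limbs(u0):
    """Return (F0, F1, F2, F3), low limb first, such that
    F3*2^192 + F2*2^128 + F1*2^64 + F0
        == (2^192 - 2^160 + 2^128 + 2^32) * u0
    for a 64-bit u0."""
    # 2^32 * u0 straddles limbs 0 and 1.
    f0 = (u0 << 32) & MASK
    f1 = u0 >> 32
    # 2^128 * u0 - 2^160 * u0 puts u0 - f0 in limb 2, with a possible borrow.
    t = u0 - f0
    f2 = t & MASK
    borrow = 1 if t < 0 else 0
    # 2^192 * u0 minus the high half of 2^160 * u0, minus the borrow.
    f3 = u0 - f1 - borrow
    return f0, f1, f2, f3
```

The product is always less than 2^256, so it fits in four limbs with no
carry out of F3. In assembly this would be roughly two shifts, one
subtract that sets the borrow, and one subtract that consumes it.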

I also wonder if there's some way to use the carry out from one fold
step and apply it at the right place while preparing F0, F1, F2, F3 for
the next step.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
