Hi Niels,

On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> 
> > ni...@lysator.liu.se (Niels Möller) writes:
> > 
> > > I think it should be possible to reduce number of needed
> > > registers, and
> > > completely avoid using callee-save registers (load the values now
> > > in
> > > U4-U7 one at a time a bit closer to the place where they are
> > > needed in),
> > > and replace F3 with $1 in the FOLD and FOLDC macros.
> > 
> > Attaching a variant to do this. Passes tests with qemu, but I
> > haven't
> > benchmarked it on any real hardware.
> 
> Would you like to test and benchmark this on relevant real hardware,
> before I merged this version?
> 
> Code still below, and committed to the branch ppc-secp256-tweaks.

Compared to the current version in master branch, this version
definitely improves the performance of the reduction code.

On POWER9, the reduction code shows a 7% speedup when tested separately.

The improvement in P256 sign/verify is marginal.  Here are the numbers
from hogweed-benchmark on POWER9.

 
            name size   sign/ms verify/ms
           ecdsa  256   11.1013    3.5713  (master)
           ecdsa  256   11.1527    3.6011  (this patch)
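
To put "marginal" in numbers, here is a small sketch that computes the
relative change from the table above. The dictionary and function names
are only illustrative and not part of hogweed-benchmark; the figures are
copied from the table. It works out to roughly +0.5% for sign and +0.8%
for verify.

```python
# Throughput figures (operations per millisecond) copied from the table above.
master = {"sign": 11.1013, "verify": 3.5713}
patch = {"sign": 11.1527, "verify": 3.6011}


def relative_gain_percent(before, after):
    """Percentage increase in throughput going from `before` to `after`."""
    return (after - before) / before * 100.0


# Relative gain of the patched branch over master, per operation.
gains = {op: relative_gain_percent(master[op], patch[op]) for op in master}

for op, gain in gains.items():
    print(f"{op}: {gain:+.2f}%")
```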


Amitay.
-- 

People on the net are always telling other people to "get a life." It
would be so much simpler if there were one available under GPL. "If you
use this life, you must tell other people where to get a life of their
own."  - Christopher Davis
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
