Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-10 Thread Niels Möller
Amitay Isaacs writes:
> Compared to the current version in master branch, this version
> definitely improves the performance of the reduction code.
>
> On POWER9, the reduction code shows 7% speed up when tested separately.
>
> The improvement in P256 sign/verify is marginal. Here are the

Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-09 Thread Amitay Isaacs
Hi Niels,
On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > ni...@lysator.liu.se (Niels Möller) writes:
> >
> > > I think it should be possible to reduce number of needed registers, and
> > > completely avoid using callee-save

Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-04 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I think it should be possible to reduce number of needed registers, and
>> completely avoid using callee-save registers (load the values now in
>> U4-U7 one at a time a bit closer to the place where

Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-09 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> Thanks! Merged to master-updates for ci testing.
And now merged to the master branch.
> I think it should be possible to reduce number of needed registers, and
> completely avoid using callee-save registers (load the values now in
> U4-U7 one at a

Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-07 Thread Niels Möller
Amitay Isaacs writes:
> On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in
> isolation, and ~1% speedup for ecdsa sign and verify over the earlier
> assembly version.
Thanks! Merged to master-updates for ci testing. I think it should be possible to reduce number of needed

Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-06 Thread Amitay Isaacs
Hi Niels,
On Mon, 2021-12-06 at 22:29 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > I think the approach should apply to other 64-bit archs (should probably
> > work also on x86_64, where it's sometimes tricky to avoid x86_64
> > instructions clobbering the