Amitay Isaacs writes:
> Compared to the current version in master branch, this version
> definitely improves the performance of the reduction code.
>
> On POWER9, the reduction code shows a 7% speedup when tested separately.
>
> The improvement in P256 sign/verify is marginal. Here are the
Hi Niels,
On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > ni...@lysator.liu.se (Niels Möller) writes:
> >
> > > I think it should be possible to reduce number of needed registers, and
> > > completely avoid using callee-save
ni...@lysator.liu.se (Niels Möller) writes:
> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I think it should be possible to reduce number of needed registers, and
>> completely avoid using callee-save registers (load the values now in
>> U4-U7 one at a time a bit closer to the place where
ni...@lysator.liu.se (Niels Möller) writes:
> Thanks! Merged to master-updates for ci testing.
And now merged to the master branch.
> I think it should be possible to reduce number of needed registers, and
> completely avoid using callee-save registers (load the values now in
> U4-U7 one at a
Amitay Isaacs writes:
> On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in
> isolation, and ~1% speedup for ecdsa sign and verify over the earlier
> assembly version.
Thanks! Merged to master-updates for ci testing.
I think it should be possible to reduce number of needed
Hi Niels,
On Mon, 2021-12-06 at 22:29 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > I think the approach should apply to other 64-bit archs (should probably
> > work also on x86_64, where it's sometimes tricky to avoid x86_64
> > instructions clobbering the