Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-01-24 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> This is the speed I get for C implementations of poly1305_update on my
> x86_64 laptop:
>
> * Radix 26: 1.2 GByte/s (old code)
> * Radix 32: 1.3 GByte/s
> * Radix 64: 2.2 GByte/s
>
> It would be interesting with benchmarks on actual 32-bit

Re: [PATCH v2 0/6] Add powerpc64 assembly for elliptic curves

2022-01-24 Thread Niels Möller
Amitay Isaacs writes:
> I posted the modified codes in the earlier email thread, but I think
> posting them as a separate series will make them easier to cherry pick.

Thanks!

> V2 changes:
> - Use actual register names when storing/restoring from stack
> - Drop m4 definitions which are not

Re: [PATCH v2 5/6] ecc: Add powerpc64 assembly for ecc_25519_modp

2022-01-24 Thread Niels Möller
Amitay Isaacs writes:
> --- /dev/null
> +++ b/powerpc64/ecc-curve25519-modp.asm
> @@ -0,0 +1,101 @@
> +C powerpc64/ecc-25519-modp.asm
> +define(`RP', `r4')
> +define(`XP', `r5')
> +
> +define(`U0', `r6') C Overlaps unused modulo input
> +define(`U1', `r7')
> +define(`U2', `r8')
> +define(`U3',

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-01-24 Thread Niels Möller
Maamoun TK writes:
> I made a performance test of this patch on the available architectures I
> have access to.
>
> Arm64 (gcc117 gfarm):
> * Radix 26: 0.65 GByte/s
> * Radix 26 (2-way interleaved): 0.92 GByte/s
> * Radix 32: 0.55 GByte/s
> * Radix 64: 0.58 GByte/s
> POWER9:
> * Radix 26: 0.47