ni...@lysator.liu.se (Niels Möller) writes:
> This is the speed I get for C implementations of poly1305_update on my
> x86_64 laptop:
>
> * Radix 26: 1.2 GByte/s (old code)
>
> * Radix 32: 1.3 GByte/s
>
> * Radix 64: 2.2 GByte/s
>
> It would be interesting with benchmarks on actual 32-bit
Amitay Isaacs writes:
> I posted the modified code in the earlier email thread, but I think
> posting it as a separate series will make it easier to cherry-pick.
Thanks!
> V2 changes:
> - Use actual register names when storing/restoring from stack
> - Drop m4 definitions which are not
Amitay Isaacs writes:
> --- /dev/null
> +++ b/powerpc64/ecc-curve25519-modp.asm
> @@ -0,0 +1,101 @@
> +C powerpc64/ecc-curve25519-modp.asm
> +define(`RP', `r4')
> +define(`XP', `r5')
> +
> +define(`U0', `r6') C Overlaps unused modulo input
> +define(`U1', `r7')
> +define(`U2', `r8')
> +define(`U3',
Maamoun TK writes:
> I made a performance test of this patch on the available architectures I
> have access to.
>
> Arm64 (gcc117 gfarm):
> * Radix 26: 0.65 GByte/s
> * Radix 26 (2-way interleaved): 0.92 GByte/s
> * Radix 32: 0.55 GByte/s
> * Radix 64: 0.58 GByte/s
>
> POWER9:
> * Radix 26: 0.47