Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-01-25 Thread Maamoun TK
On Tue, Jan 25, 2022 at 10:24 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > It looks like wider multiplication would achieve higher speed on different
> > aarch64 instance on gfarm. Here are the numbers on gcc185 instance:
> >
> > * Radix 26: 0.83 GByte/s
> > * Radix 26 (2-way

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-01-25 Thread Niels Möller
Maamoun TK writes:
> It looks like wider multiplication would achieve higher speed on different
> aarch64 instance on gfarm. Here are the numbers on gcc185 instance:
>
> * Radix 26: 0.83 GByte/s
> * Radix 26 (2-way interleaved): 0.70 GByte/s
> * Radix 64 (Latest version): 1.25 GByte/s
>
> These

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-01-25 Thread Maamoun TK
On Mon, Jan 24, 2022 at 12:58 AM David Edelsohn wrote:
> On Sun, Jan 23, 2022 at 4:41 PM Maamoun TK wrote:
> >
> > On Sun, Jan 23, 2022 at 9:10 PM Niels Möller wrote:
> >
> > > ni...@lysator.liu.se (Niels Möller) writes:
> > >
> > > > The current C implementation uses radix 26, and 25