Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-05-03 Thread Maamoun TK
On Tue, May 3, 2022 at 9:26 AM Niels Möller wrote:
> Maamoun TK writes:
>
> > hmm right, didn't cross my mind. I'll add 2^64 -> 2^26 procedure at
> > prologue of _nettle_poly1305_4core() and 2^26 -> 2^64 at epilogue to
> > workaround this.
>
> If possible, I think it would be nice to let

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-05-03 Thread Niels Möller
Maamoun TK writes:
> hmm right, didn't cross my mind. I'll add 2^64 -> 2^26 procedure at
> prologue of _nettle_poly1305_4core() and 2^26 -> 2^64 at epilogue to
> workaround this.

If possible, I think it would be nice to let subkeys stored in struct poly1305_ctx stay in radix-2^64, and compute
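
For readers following the thread, the 2^64 -> 2^26 and 2^26 -> 2^64 conversions discussed here can be sketched roughly as follows. This is an illustration only, not code from the merge request: the names radix64_to_26 and radix26_to_64 are hypothetical, and the layout is an assumption (a 130-bit value held as two 64-bit words plus a top word below 4, converted to five fully reduced 26-bit limbs).

```c
#include <assert.h>
#include <stdint.h>

#define MASK26 0x3ffffffULL

/* Hypothetical helper: split a 130-bit value, given as h0 (bits 0..63),
   h1 (bits 64..127) and h2 (bits 128..129, assumed < 4), into five
   26-bit limbs r[0..4], least significant first. */
static void
radix64_to_26 (uint64_t h0, uint64_t h1, uint64_t h2, uint32_t r[5])
{
  r[0] = (uint32_t) (h0 & MASK26);                         /* bits   0..25  */
  r[1] = (uint32_t) ((h0 >> 26) & MASK26);                 /* bits  26..51  */
  r[2] = (uint32_t) (((h0 >> 52) | (h1 << 12)) & MASK26);  /* bits  52..77  */
  r[3] = (uint32_t) ((h1 >> 14) & MASK26);                 /* bits  78..103 */
  r[4] = (uint32_t) ((h1 >> 40) | (h2 << 24));             /* bits 104..129 */
}

/* Hypothetical inverse: recombine five limbs, each assumed < 2^26,
   into h[0] (low 64 bits), h[1] (next 64 bits) and h[2] (top 2 bits). */
static void
radix26_to_64 (const uint32_t r[5], uint64_t h[3])
{
  h[0] = (uint64_t) r[0] | ((uint64_t) r[1] << 26) | ((uint64_t) r[2] << 52);
  h[1] = ((uint64_t) r[2] >> 12) | ((uint64_t) r[3] << 14)
    | ((uint64_t) r[4] << 40);
  h[2] = (uint64_t) r[4] >> 24;
}

/* Round-trip self-check on an arbitrary value. */
static void
radix_demo (void)
{
  uint32_t r[5];
  uint64_t h[3];

  radix64_to_26 (0x0123456789abcdefULL, 0xfedcba9876543210ULL, 2, r);
  radix26_to_64 (r, h);
  assert (h[0] == 0x0123456789abcdefULL);
  assert (h[1] == 0xfedcba9876543210ULL);
  assert (h[2] == 2);
}
```

The inverse is only exact when every limb is below 2^26, so any carries left over from the radix-2^26 arithmetic would have to be propagated before converting back.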

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-05-03 Thread Maamoun TK
On Tue, May 3, 2022 at 8:43 AM Niels Möller wrote:
> Maamoun TK writes:
>
> > I've added Poly1305 optimization based on radix 26 using AVX2 extension for
> > x86_64 architecture with fat build support, the patch yields significant
> > speedup compared to upstream.
> >

Re: [Arm64, PowerPC64, S390x] Optimize Poly1305

2022-05-03 Thread Niels Möller
Maamoun TK writes:
> I've added Poly1305 optimization based on radix 26 using AVX2 extension for
> x86_64 architecture with fat build support, the patch yields significant
> speedup compared to upstream.
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/46

Cool. Do I get it right,