Re: Multi-block poly1303 code

2023-04-07 Thread Niels Möller
Maamoun TK writes:

> Yes, this is exactly how I do it. Four messages arranged vertically in YMM
> registers.

Could you add comments explaining the register layout in a bit more detail?
From this, I take it you use 5 message registers, each one holding 26 bits
from each of 4 messages (in which
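The vertical layout under discussion can be sketched like this (an illustrative Python sketch only; the helper names `to_limbs26` and `vertical_layout` are hypothetical, not taken from the merge request). Each 16-byte message block, with its pad bit set, is split into five 26-bit limbs, and "register" i then collects limb i from every one of the 4 messages, mirroring the 5-YMM-register arrangement:

```python
# Sketch of a 4-way vertical radix-26 layout (hypothetical helper names).

MASK26 = (1 << 26) - 1

def to_limbs26(block_le):
    """Split a 16-byte block plus its 0x01 pad bit (129 bits total)
    into five 26-bit limbs, least significant first."""
    x = int.from_bytes(block_le, "little") | (1 << 128)  # pad bit at bit 128
    return [(x >> (26 * i)) & MASK26 for i in range(5)]

def vertical_layout(blocks):
    """Arrange 4 message blocks 'vertically': register i holds limb i
    of every message, as in the YMM layout described above."""
    limbs = [to_limbs26(b) for b in blocks]                      # 4 x 5
    return [[limbs[m][i] for m in range(4)] for i in range(5)]   # 5 x 4

blocks = [bytes([m * 16 + j for j in range(16)]) for m in range(4)]
regs = vertical_layout(blocks)

# Recombining each limb column must give back the padded block value.
for m, b in enumerate(blocks):
    x = sum(regs[i][m] << (26 * i) for i in range(5))
    assert x == int.from_bytes(b, "little") | (1 << 128)
```

The point of the vertical arrangement is that one vector multiply-add then advances all 4 message streams by one limb position at once.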

Re: Multi-block poly1303 code

2023-04-03 Thread Maamoun TK
On Sun, Apr 2, 2023 at 3:52 PM Niels Möller wrote:

> Maamoun TK writes:
>
> > I apologize for the delays. I pushed a patch that implements 4-way block
> > processing of poly1305 using AVX2 instructions based on radix 26.
> >
> > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58
> >

Re: Multi-block poly1303 code

2023-04-02 Thread Niels Möller
Maamoun TK writes:

> I apologize for the delays. I pushed a patch that implements 4-way block
> processing of poly1305 using AVX2 instructions based on radix 26.
>
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58

Let me see if I understand the main idea. In radix 26 (or rather,
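For readers following along, the radix-26 idea referenced here can be sketched as follows (a minimal Python model of the arithmetic, not the MR's code). A value mod p = 2^130 - 5 is held in five 26-bit limbs; products that land above limb 4 fold back into the low limbs multiplied by 5, because 2^130 ≡ 5 (mod p):

```python
# Radix-26 schoolbook multiply mod 2^130 - 5 (illustrative sketch).

P = (1 << 130) - 5
MASK26 = (1 << 26) - 1

def mul_r26(h, r):
    """h, r: five 26-bit limbs each; returns five limbs of h*r mod p,
    partially reduced (limb 0 may slightly exceed 26 bits)."""
    t = [0] * 5
    for i in range(5):
        for j in range(5):
            k = i + j
            if k < 5:
                t[k] += h[i] * r[j]
            else:
                t[k - 5] += 5 * h[i] * r[j]   # fold: 2^130 ≡ 5 (mod p)
    # Carry propagation, 26 bits per limb.
    c = 0
    for i in range(5):
        t[i] += c
        c = t[i] >> 26
        t[i] &= MASK26
    t[0] += 5 * c  # fold the final carry back in
    return t

def from_limbs(limbs):
    return sum(v << (26 * i) for i, v in enumerate(limbs)) % P

h = [(1234567 * (i + 1)) & MASK26 for i in range(5)]
r = [(7654321 * (i + 2)) & MASK26 for i in range(5)]
assert from_limbs(mul_r26(h, r)) == (from_limbs(h) * from_limbs(r)) % P
```

The attraction for SIMD is that 26x26-bit products fit comfortably in 64-bit lanes with headroom for the folded terms, so carries only need resolving once per multiplication.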

Re: Multi-block poly1303 code

2023-03-23 Thread Maamoun TK
I apologize for the delays. I pushed a patch that implements 4-way block
processing of poly1305 using AVX2 instructions based on radix 26.

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58

regards,
Mamone

On Sun, Nov 6, 2022 at 8:08 AM Maamoun TK wrote:

> On Fri, Nov 4, 2022 at

Re: Multi-block poly1303 code

2022-11-06 Thread Maamoun TK
On Fri, Nov 4, 2022 at 9:00 AM Niels Möller wrote:

> Maamoun TK writes:
>
> > I got your point. Small macros are easy to handle but maybe adding some
> > in-between comments would also make it easier to go through by reader.
> > please check out my last edit on poly1305.m4
>
> Thanks, I've had

Re: Multi-block poly1303 code

2022-11-04 Thread Niels Möller
Maamoun TK writes:

> I got your point. Small macros are easy to handle but maybe adding some
> in-between comments would also make it easier to go through by reader.
> please check out my last edit on poly1305.m4

Thanks, I've had a look at

Re: Multi-block poly1303 code

2022-11-01 Thread Maamoun TK
On Mon, Oct 31, 2022 at 10:58 PM Niels Möller wrote:

> Maamoun TK writes:
>
> > Sounds good. I modified the MR to have its own poly1305-blocks.asm with
> > poly1305.m4 file for shared macros.
>
> Nice. I wonder if it's possible to organize the macro as smaller easier
> to understand pieces. I'm

Re: Multi-block poly1303 code

2022-10-31 Thread Niels Möller
Maamoun TK writes:

> Sounds good. I modified the MR to have its own poly1305-blocks.asm with
> poly1305.m4 file for shared macros.

Nice. I wonder if it's possible to organize the macro as smaller, easier
to understand pieces. I'm afraid I have no concrete suggestion; maybe I'll
try something

Re: Multi-block poly1303 code

2022-10-31 Thread Maamoun TK
On Sun, Oct 30, 2022 at 11:26 AM Niels Möller wrote:

> ni...@lysator.liu.se (Niels Möller) writes:
>
> > But I still want to find a way to merge the refactoring branch without
> > breaking the ppc build (in the current state, the branch fails with link
> > errors on ppc).
>
> I think the

Re: Multi-block poly1303 code

2022-10-30 Thread Maamoun TK
On Sun, Oct 30, 2022 at 10:46 AM Niels Möller wrote:

> Maamoun TK writes:
>
> > On Sat, Oct 29, 2022 at 11:31 AM Niels Möller wrote:
> >
> >> I think I'd like to merge the multi-block refactoring branch
> >> (refactor-poly1305) before your radix 2^44 code. But that breaks current
> >> power

Re: Multi-block poly1303 code

2022-10-30 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> But I still want to find a way to merge the refactoring branch without
> breaking the ppc build (in the current state, the branch fails with link
> errors on ppc).

I think the simplest way is to just move _nettle_poly1305_blocks to its own
optional

Re: Multi-block poly1303 code

2022-10-30 Thread Niels Möller
Maamoun TK writes:

> On Sat, Oct 29, 2022 at 11:31 AM Niels Möller wrote:
>
>> I think I'd like to merge the multi-block refactoring branch
>> (refactor-poly1305) before your radix 2^44 code. But that breaks current
>> power assembly, since that branch currently requires that any assembly
>>

Re: Multi-block poly1303 code

2022-10-29 Thread Maamoun TK
On Sat, Oct 29, 2022 at 11:31 AM Niels Möller wrote:

> Maamoun TK writes:
>
> > I will give multiblock radix-2^64 a try on ppc to examine the result.
> > For now, I'm trying to apply your previous note on radix 44 for ppc
> > to improve the speed of reduction phase.
>
> I think I'd like to

Re: Multi-block poly1303 code

2022-10-29 Thread Niels Möller
Maamoun TK writes:

> I will give multiblock radix-2^64 a try on ppc to examine the result. For
> now, I'm trying to apply your previous note on radix 44 for ppc to improve
> the speed of reduction phase.

I think I'd like to merge the multi-block refactoring branch
(refactor-poly1305) before

Re: Multi-block poly1303 code

2022-10-29 Thread Niels Möller
David Edelsohn writes:

> Why is it early to use POWER10 instructions? Various BLAS libraries
> have been updated to use POWER10 instructions.

I'm not so familiar with the family of power processors, and I don't know
which versions are the most common in today's running machines. But in

Re: Multi-block poly1303 code

2022-10-28 Thread David Edelsohn
On Fri, Oct 28, 2022 at 1:30 PM Maamoun TK wrote:

> On Fri, Oct 28, 2022 at 3:58 PM David Edelsohn wrote:
>
>> On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK wrote:
>>
>>> I created a new merge request for poly1305 multi block implementation
>>>

Re: Multi-block poly1303 code

2022-10-28 Thread Maamoun TK
On Fri, Oct 28, 2022 at 3:58 PM David Edelsohn wrote:

> On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK wrote:
>
>> I created a new merge request for poly1305 multi block implementation
>> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
>> The patch is based on radix 2^44 with linear

Re: Multi-block poly1303 code

2022-10-28 Thread David Edelsohn
On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK wrote:

> I created a new merge request for poly1305 multi block implementation
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
> The patch is based on radix 2^44 with linear carry addition for reduction.
> I tried multiple scenarios to

Re: Multi-block poly1303 code

2022-10-28 Thread Maamoun TK
I created a new merge request for poly1305 multi block implementation
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
The patch is based on radix 2^44 with linear carry addition for reduction.
I tried multiple scenarios to find the best-performing
_nettle_poly1305_blocks on ppc and got
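For context on the radix-2^44 representation mentioned here, a 130-bit accumulator splits into three limbs of 44, 44 and 42 bits, which fit 64-bit lanes with headroom for deferred carries. A minimal sketch (hypothetical helper names, not code from the merge request):

```python
# Radix-2^44 limb split of a value below 2^130 (illustrative sketch).

M44 = (1 << 44) - 1

def to_limbs44(x):
    """Split x (< 2^130) into limbs of 44, 44 and 42 bits."""
    return (x & M44, (x >> 44) & M44, x >> 88)  # top limb holds 42 bits

def from_limbs44(limbs):
    return limbs[0] | (limbs[1] << 44) | (limbs[2] << 88)

x = (1 << 130) - 6  # largest value below the modulus 2^130 - 5
assert from_limbs44(to_limbs44(x)) == x
assert to_limbs44(x)[2] < (1 << 42)  # top limb stays within 42 bits
```

Fewer limbs than radix 26 means fewer partial products per multiplication, at the cost of needing 64x64-bit multiply support; the "linear carry addition" the patch refers to is about how those per-limb carries are then folded.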

Re: Multi-block poly1303 code

2022-10-25 Thread Niels Möller
Maamoun TK writes:

> I did the benchmark on my laptop too. I got a speed of 3964.37 GB/s on
> upstream and 5054.32 GB/s benchmarking poly1305 update on the new branch.
> I wonder if the result numbers are truncated on your end because that
> would keep the improvement on context with my test

Re: Multi-block poly1303 code (was: Re: Fwd: [Arm64, PowerPC64, S390x] Optimize Poly1305)

2022-10-25 Thread Maamoun TK
On Tue, Oct 25, 2022 at 7:04 PM Maamoun TK wrote:

> On Mon, Oct 24, 2022 at 9:44 PM Niels Möller wrote:
>
>> Maamoun TK writes:
>>
>> > I think the design could be as simple as always padding each block
>> > with 0x01 in _nettle_poly1305_update while keeping
>> > _nettle_poly1305_block that
>>

Re: Multi-block poly1303 code (was: Re: Fwd: [Arm64, PowerPC64, S390x] Optimize Poly1305)

2022-10-25 Thread Maamoun TK
On Mon, Oct 24, 2022 at 9:44 PM Niels Möller wrote:

> Maamoun TK writes:
>
> > I think the design could be as simple as always padding each block with
> > 0x01 in _nettle_poly1305_update while keeping _nettle_poly1305_block
> > that is responsible for processing last block takes variable

Multi-block poly1303 code (was: Re: Fwd: [Arm64, PowerPC64, S390x] Optimize Poly1305)

2022-10-24 Thread Niels Möller
Maamoun TK writes:

> I think the design could be as simple as always padding each block with
> 0x01 in _nettle_poly1305_update while keeping _nettle_poly1305_block that
> is responsible for processing last block takes variable padding values (0
> or 1).

I committed an update in
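The split being proposed, bulk update with a fixed 0x01 pad versus a single-block routine taking a variable pad, can be modelled like this (an illustrative Python sketch using reference poly1305 arithmetic; it omits the clamping of r, and the function names only echo, not reproduce, the nettle internals):

```python
# Sketch of the proposed padding split (simplified; r-clamping omitted).

P = (1 << 130) - 5

def poly1305_update(h, r, msg):
    """Bulk path: process full 16-byte blocks, each unconditionally
    padded with the 0x01 bit above its top byte."""
    for i in range(0, len(msg) - len(msg) % 16, 16):
        m = int.from_bytes(msg[i:i + 16], "little") | (1 << 128)
        h = (h + m) * r % P
    return h

def poly1305_block(h, r, block, pad):
    """Single-block path: the pad bit (0 or 1) is a parameter, so the
    last, possibly short, block can be handled here."""
    m = int.from_bytes(block, "little") | (pad << (8 * len(block)))
    return (h + m) * r % P

# The two paths agree on full blocks when pad=1.
msg = bytes(range(32))
h1 = poly1305_update(0, 12345, msg)
h2 = poly1305_block(poly1305_block(0, 12345, msg[:16], 1), 12345, msg[16:], 1)
assert h1 == h2
```

With this split, the bulk routine never needs to branch on block length or pad value, which is what makes it friendly to multi-block assembly implementations.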