Maamoun TK writes:
> Yes, this is exactly how I do it. Four messages arranged vertically in YMM
> registers.
Could you add comments explaining the register layout in a bit more
detail? From this, I take it you use 5 message registers, each one
holding 26 bits from each of 4 messages (in which
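
[Editorial sketch, not taken from the merge request.] The layout described above can be illustrated in plain C. This is only a guess at the arrangement, under these assumptions: each 16-byte block (plus the padding bit at 2^128) is split into five 26-bit limbs, and limb i of message j is stored in 64-bit lane j of register i. The names `split_radix26` and `layout_4way` are hypothetical and do not appear in Nettle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 26
#define LIMB_MASK ((UINT32_C(1) << LIMB_BITS) - 1)

/* Read a 32-bit little-endian word. */
static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
    | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Split one 16-byte block into five 26-bit limbs (hypothetical helper).
   hibit is the padding bit at 2^128: 1 for a full block, 0 otherwise. */
static void
split_radix26 (const uint8_t *block, uint32_t hibit, uint32_t limb[5])
{
  uint32_t t0 = load_le32 (block);
  uint32_t t1 = load_le32 (block + 4);
  uint32_t t2 = load_le32 (block + 8);
  uint32_t t3 = load_le32 (block + 12);

  assert (hibit <= 1);
  limb[0] = t0 & LIMB_MASK;                        /* bits   0..25  */
  limb[1] = ((t0 >> 26) | (t1 << 6)) & LIMB_MASK;  /* bits  26..51  */
  limb[2] = ((t1 >> 20) | (t2 << 12)) & LIMB_MASK; /* bits  52..77  */
  limb[3] = ((t2 >> 14) | (t3 << 18)) & LIMB_MASK; /* bits  78..103 */
  limb[4] = (t3 >> 8) | (hibit << 24);             /* bits 104..128 */
}

/* Arrange four consecutive blocks (64 bytes at msg) "vertically":
   lanes[i][j] holds limb i of block j, modelling five 256-bit registers
   with four 64-bit lanes each (assumed layout). */
static void
layout_4way (const uint8_t *msg, uint32_t hibit, uint64_t lanes[5][4])
{
  for (size_t j = 0; j < 4; j++)
    {
      uint32_t limb[5];
      split_radix26 (msg + 16 * j, hibit, limb);
      for (size_t i = 0; i < 5; i++)
        lanes[i][j] = limb[i];
    }
}
```

Keeping each 26-bit limb in a 64-bit lane leaves headroom for the 32x32->64 bit products that AVX2 multiplies produce, which is the usual motivation for this radix; whether the patch uses exactly this arrangement is what the question above asks.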
On Sun, Apr 2, 2023 at 3:52 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > I apologize for the delays. I pushed a patch that implements 4-way block
> > processing of poly1305 using AVX2 instructions based on radix 26.
> >
> > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58
>
>
Maamoun TK writes:
> I apologize for the delays. I pushed a patch that implements 4-way block
> processing of poly1305 using AVX2 instructions based on radix 26.
>
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58
Let me see if I understand the main idea.
In radix 26 (or rather,
I apologize for the delays. I pushed a patch that implements 4-way block
processing of poly1305 using AVX2 instructions based on radix 26.
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58
regards.
Mamone
On Sun, Nov 6, 2022 at 8:08 AM Maamoun TK wrote:
> On Fri, Nov 4, 2022 at
On Fri, Nov 4, 2022 at 9:00 AM Niels Möller wrote:
> Maamoun TK writes:
>
> > I got your point. Small macros are easy to handle, but maybe adding some
> > in-between comments would also make it easier for the reader to go
> > through. Please check out my last edit on poly1305.m4
>
> Thanks, I've had
Maamoun TK writes:
> I got your point. Small macros are easy to handle, but maybe adding some
> in-between comments would also make it easier for the reader to go
> through. Please check out my last edit on poly1305.m4
Thanks, I've had a look at
On Mon, Oct 31, 2022 at 10:58 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > Sounds good. I modified the MR to have its own poly1305-blocks.asm with
> > poly1305.m4 file for shared macros.
>
> Nice. I wonder if it's possible to organize the macro as smaller, easier
> to understand pieces. I'm
Maamoun TK writes:
> Sounds good. I modified the MR to have its own poly1305-blocks.asm with
> poly1305.m4 file for shared macros.
Nice. I wonder if it's possible to organize the macro as smaller, easier
to understand pieces. I'm afraid I have no concrete suggestion; maybe
I'll try something
On Sun, Oct 30, 2022 at 11:26 AM Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > But I still want to find a way to merge the refactoring branch without
> > breaking the ppc build (in the current state, the branch fails with link
> > errors on ppc).
>
> I think the
On Sun, Oct 30, 2022 at 10:46 AM Niels Möller wrote:
> Maamoun TK writes:
>
> > On Sat, Oct 29, 2022 at 11:31 AM Niels Möller
> wrote:
> >
> >> I think I'd like to merge the multi-block refactoring branch
> >> (refactor-poly1305) before your radix 2^44 code. But that breaks current
> >> power
ni...@lysator.liu.se (Niels Möller) writes:
> But I still want to find a way to merge the refactoring branch without
> breaking the ppc build (in the current state, the branch fails with link
> errors on ppc).
I think the simplest way is to just move _nettle_poly1305_blocks to its
own optional
Maamoun TK writes:
> On Sat, Oct 29, 2022 at 11:31 AM Niels Möller wrote:
>
>> I think I'd like to merge the multi-block refactoring branch
>> (refactor-poly1305) before your radix 2^44 code. But that breaks current
>> power assembly, since that branch currently requires that any assembly
>>
On Sat, Oct 29, 2022 at 11:31 AM Niels Möller wrote:
> Maamoun TK writes:
>
> > I will give multiblock radix-2^64 a try on ppc to examine the result. For
> > now, I'm trying to apply your previous note on radix 44 for ppc to improve
> > the speed of reduction phase.
>
> I think I'd like to
Maamoun TK writes:
> I will give multiblock radix-2^64 a try on ppc to examine the result. For
> now, I'm trying to apply your previous note on radix 44 for ppc to improve
> the speed of reduction phase.
I think I'd like to merge the multi-block refactoring branch
(refactor-poly1305) before
David Edelsohn writes:
> Why is it early to use POWER10 instructions? Various BLAS libraries
> have been updated to use POWER10 instructions.
I'm not so familiar with the family of power processors, and I don't
know which versions are the most common in today's running machines.
But in
On Fri, Oct 28, 2022 at 1:30 PM Maamoun TK
wrote:
> On Fri, Oct 28, 2022 at 3:58 PM David Edelsohn wrote:
>
>> On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK
>> wrote:
>>
>>> I created a new merge request for poly1305 multi block implementation
>>>
On Fri, Oct 28, 2022 at 3:58 PM David Edelsohn wrote:
> On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK
> wrote:
>
>> I created a new merge request for poly1305 multi block implementation
>> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
>> The patch is based on radix 2^44 with linear
On Fri, Oct 28, 2022 at 7:34 AM Maamoun TK
wrote:
> I created a new merge request for poly1305 multi block implementation
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
> The patch is based on radix 2^44 with linear carry addition for reduction.
> I tried multiple scenarios to
I created a new merge request for poly1305 multi block implementation
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/55
The patch is based on radix 2^44 with linear carry addition for reduction.
I tried multiple scenarios to find the best-performing _nettle_poly1305_blocks on
ppc and got
Maamoun TK writes:
> I did the benchmark on my laptop too. I got a speed of 3964.37 GB/s on
> upstream and 5054.32 GB/s benchmarking poly1305 update on the new branch. I
> wonder if the result numbers are truncated on your end because that would
> keep the improvement in context with my test
On Tue, Oct 25, 2022 at 7:04 PM Maamoun TK
wrote:
> On Mon, Oct 24, 2022 at 9:44 PM Niels Möller wrote:
>
>> Maamoun TK writes:
>>
>> > I think the design could be as simple as always padding each block with
>> > 0x01 in _nettle_poly1305_update while keeping _nettle_poly1305_block that
>> >
On Mon, Oct 24, 2022 at 9:44 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > I think the design could be as simple as always padding each block with
> > 0x01 in _nettle_poly1305_update while keeping _nettle_poly1305_block that
> > is responsible for processing last block takes variable
Maamoun TK writes:
> I think the design could be as simple as always padding each block with
> 0x01 in _nettle_poly1305_update, while _nettle_poly1305_block, which is
> responsible for processing the last block, still takes a variable padding
> value (0 or 1). I committed an update in
>