On Wed, Nov 22, 2023 at 1:50 PM Niels Möller wrote:
> David Edelsohn writes:
>
> > Calls impose a lot of overhead on Power.
>
> Thanks, that's good to know.
>
> > And both the efficient loop instruction and the preferred indirect call
> > instruction use the CTR register.
>
> That's one thing I
David Edelsohn writes:
> Calls impose a lot of overhead on Power.
Thanks, that's good to know.
> And both the efficient loop instruction and the preferred indirect call
> instruction use the CTR register.
That's one thing I wonder after having a closer look at the AES loops.
One rather common
On Wed, Nov 22, 2023 at 10:37 AM Danny Tsen wrote:
>
>
> > On Nov 22, 2023, at 2:27 AM, Niels Möller wrote:
> >
> > Danny Tsen writes:
> >
> >> Interleaving at the instructions level may be a good option but due to
> >> PPC instruction pipeline this may need to have sufficient
> >> registers/ve
> On Nov 22, 2023, at 2:27 AM, Niels Möller wrote:
>
> Danny Tsen writes:
>
>> Interleaving at the instructions level may be a good option but due to
>> PPC instruction pipeline this may need to have sufficient
>> registers/vectors. Use same vectors to change contents in successive
>> instruc
Danny Tsen writes:
> Interleaving at the instructions level may be a good option but due to
> PPC instruction pipeline this may need to have sufficient
> registers/vectors. Use same vectors to change contents in successive
> instructions may require more cycles. In that case, more
> vectors/scala
> From: Niels Möller
> Sent: Tuesday, November 21, 2023 1:07 PM
> To: Danny Tsen
> Cc: nettle-bugs@lists.lysator.liu.se ;
> George Wilson
> Subject: [EXTERNAL] Re: Fw: ppc64: AES/GCM Performance improvement with
> stitched im
de doesn't call
_ghash_update. But I guess I can use m4 macro instead.
Thanks.
-Danny
From: Niels Möller
Sent: Tuesday, November 21, 2023 1:07 PM
To: Danny Tsen
Cc: nettle-bugs@lists.lysator.liu.se ; George
Wilson
Subject: [EXTERNAL] Re: Fw: ppc64: AES/GCM Performa
Danny Tsen writes:
> This patch provides a performance improvement over AES/GCM with stitched
> implementation for ppc64. The code is a wrapper in assembly to handle
> multiple 8 blocks and handle big and little endian.
>
> The overall improvement is based on the nettle-benchmark with ~80%
>
To Whom It May Concern,
This patch provides a performance improvement over AES/GCM with stitched
implementation for ppc64. The code is a wrapper in assembly to handle multiple
8 blocks and handle big and little endian.
The overall improvement is based on the nettle-benchmark with ~80% impro