Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-20 Thread Danny Tsen
To Whom It May Concern, This patch provides a performance improvement over AES/GCM with stitched implementation for ppc64. The code is a wrapper in assembly to handle multiple 8 blocks and handle big and little endian. The overall improvement is based on th

Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
To Whom It May Concern, This patch provides a performance improvement over AES/GCM with stitched implementation for ppc64. The code is a wrapper in assembly to handle multiple 8 blocks and handle big and little endian. The overall improvement is based on the nettle-benchmark with ~80% impro

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
> From: Niels Möller > Sent: Tuesday, November 21, 2023 1:07 PM > To: Danny Tsen > Cc: nettle-bugs@lists.lysator.liu.se ; > George Wilson > Subject: [EXTERNAL] Re: Fw: ppc64: AES/GCM Performance improvement with > stitched im

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread Niels Möller
Danny Tsen writes: > Interleaving at the instructions level may be a good option but due to > PPC instruction pipeline this may need to have sufficient > registers/vectors. Use same vectors to change contents in successive > instructions may require more cycles. In that case, more > vectors/scala

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread Danny Tsen
> On Nov 22, 2023, at 2:27 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Interleaving at the instructions level may be a good option but due to >> PPC instruction pipeline this may need to have sufficient >> registers/vectors. Use same vectors to change contents in successive >> instruc

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread David Edelsohn
On Wed, Nov 22, 2023 at 10:37 AM Danny Tsen wrote: > > > > On Nov 22, 2023, at 2:27 AM, Niels Möller wrote: > > > > Danny Tsen writes: > > > >> Interleaving at the instructions level may be a good option but due to > >> PPC instruction pipeline this may need to have sufficient > >> registers/ve

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread Niels Möller
David Edelsohn writes: > Calls impose a lot of overhead on Power. Thanks, that's good to know. > And both the efficient loop instruction and the preferred indirect call > instruction use the CTR register. That's one thing I wonder after having a closer look at the AES loops. One rather common

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread David Edelsohn
On Wed, Nov 22, 2023 at 1:50 PM Niels Möller wrote: > David Edelsohn writes: > > > Calls impose a lot of overhead on Power. > > Thanks, that's good to know. > > > And both the efficient loop instruction and the preferred indirect call > > instruction use the CTR register. > > That's one thing I

Re: Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Niels Möller
Danny Tsen writes: > This patch provides a performance improvement over AES/GCM with stitched > implementation for ppc64. The code is a wrapper in assembly to handle > multiple 8 > blocks and handle big and little endian. > > The overall improvement is based on the nettle-benchmark with ~80% >

RE: Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
de doesn't call _ghash_update. But I guess I can use an m4 macro instead. Thanks. -Danny From: Niels Möller Sent: Tuesday, November 21, 2023 1:07 PM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: Fw: ppc64: AES/GCM Performa