Re: ppc64 micro optimization

2024-03-05 Thread Danny Tsen
Hi Niels, My fault. I did not include the gcm-aes-crypt.c in the patch. Here is the updated patch. Please apply this one and we can work from there. Thanks. -Danny > On Mar 5, 2024, at 1:08 PM, Niels Möller wrote: > > Danny Tsen writes: > >> Please let me know when

RE: ppc64 micro optimization

2024-02-26 Thread Danny Tsen
Hi Niels, Please let me know when you merge the code and we can work from there. Thanks. -Danny From: Niels Möller Sent: Friday, February 23, 2024 1:07 AM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: ppc64 micro

Re: ppc64 micro optimization

2024-02-20 Thread Danny Tsen
Hi Niels, Here is the v5 patch from your comments. Please review. Thanks. -Danny > On Feb 14, 2024, at 8:46 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Here is the new patch v4 for AES/GCM stitched implementation and >> benchmark based on the current repo.

Re: ppc64 micro optimization

2024-02-03 Thread Danny Tsen
Hi Niels, Here is the new patch v4 for AES/GCM stitched implementation and benchmark based on the current repo. Thanks. -Danny > On Jan 31, 2024, at 4:35 AM, Niels Möller wrote: > > Niels Möller writes: > >> While the powerpc64 vncipher instruction really wants the original >> subkeys, not

RE: ppc64 micro optimization

2024-01-24 Thread Danny Tsen
, January 25, 2024 3:58 AM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: ppc64 micro optimization Danny Tsen writes: > Thanks for merging the stitched implementation for PPC64 with your > detailed information and efforts We're not quite the

Re: ppc64 micro optimization

2024-01-22 Thread Danny Tsen
Hi Niels, Thanks for merging the stitched implementation for PPC64 with your detailed information and efforts. Thanks. -Danny > On Jan 21, 2024, at 11:27 PM, Niels Möller wrote: > > In preparing for merging the gcm-aes "stitched" implementation, I'm > reviewing the existing ghash code. WIP

Re: ppc64: v3: AES/GCM Performance improvement with stitched implementation

2023-12-18 Thread Danny Tsen
, at 9:01 AM, Danny Tsen wrote: On Dec 11, 2023, at 10:32 AM, Niels Möller wrote: Danny Tsen writes: Here is the version 2 for AES/GCM stitched patch. The stitched code is in all assembly and m4 macros are used. The overall performance improved around ~110% and 120% for encrypt and decrypt

Re: ppc64: v2, AES/GCM Performance improvement with stitched implementation

2023-12-18 Thread Danny Tsen
Correction, 719 bytes test vector. On Dec 12, 2023, at 9:01 AM, Danny Tsen wrote: On Dec 11, 2023, at 10:32 AM, Niels Möller wrote: Danny Tsen writes: Here is the version 2 for AES/GCM stitched patch. The stitched code is in all assembly and m4 macros are used. The overall performance

Re: ppc64: v2, AES/GCM Performance improvement with stitched implementation

2023-12-12 Thread Danny Tsen
> On Dec 11, 2023, at 10:32 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Here is the version 2 for AES/GCM stitched patch. The stitched code is >> in all assembly and m4 macros are used. The overall performance >> improved around ~110% and 120% for e

Re: ppc64: v2, AES/GCM Performance improvement with stitched implementation

2023-12-07 Thread Danny Tsen
Nov 22, 2023, at 2:27 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Interleaving at the instructions level may be a good option but due to >> PPC instruction pipeline this may need to have sufficient >> registers/vectors. Use same vectors to change contents in

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-22 Thread Danny Tsen
> On Nov 22, 2023, at 2:27 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Interleaving at the instructions level may be a good option but due to >> PPC instruction pipeline this may need to have sufficient >> registers/vectors. Use same vectors to

Re: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
Hi Niels, More comments. Please see inline. > On Nov 21, 2023, at 1:46 PM, Danny Tsen wrote: > > Hi Niels, > > Thanks for the quick response. > > I'll think more thru your comments here and it may take some more time to get > an update. And just a quick answe

RE: Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
_ghash_update. But I guess I can use m4 macro instead. Thanks. -Danny From: Niels Möller Sent: Tuesday, November 21, 2023 1:07 PM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: Fw: ppc64: AES/GCM Performance improvement

Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-21 Thread Danny Tsen
To Whom It May Concern, This patch provides a performance improvement for AES/GCM with a stitched implementation for ppc64. The code is a wrapper in assembly to handle multiples of 8 blocks and handle big and little endian. The overall improvement is based on the nettle-benchmark with ~80%

Fw: ppc64: AES/GCM Performance improvement with stitched implementation

2023-11-20 Thread Danny Tsen
To Whom It May Concern, This patch provides a performance improvement for AES/GCM with a stitched implementation for ppc64. The code is a wrapper in assembly to handle multiples of 8 blocks and handle big and little endian. The overall improvement is based on