Re: Fwd: [PowerPC] GCM optimization

2020-12-01 Thread George Wilson
On Tue, Dec 01, 2020 at 07:55:05PM +0200, Maamoun TK wrote: > Hi George, > I'll start writing a white paper called "Optimizing Galois-Counter-Mode on > PowerPC Architecture Processors". Once I finish the first draft I'll send > it to Niels to review it. > > > > What do you need from the IBM

Re: Fwd: [PowerPC] GCM optimization

2020-12-01 Thread Maamoun TK
Hi George, I'll start writing a white paper called "Optimizing Galois-Counter-Mode on PowerPC Architecture Processors". Once I finish the first draft I'll send it to Niels to review it. > What do you need from the IBM side? I may be able to help. We'd > definitely > like to support you and

Re: Fwd: [PowerPC] GCM optimization

2020-11-30 Thread George Wilson
On Thu, Nov 12, 2020 at 07:45:14PM +0200, Maamoun TK wrote: > -- Forwarded message - > From: Maamoun TK > Date: Thu, Nov 12, 2020 at 7:42 PM > Subject: Re: [PowerPC] GCM optimization > To: Niels Möller > > > On Thu, Nov 12, 2020 at 6:40 PM Niels Möll

Re: [PowerPC] GCM optimization

2020-11-28 Thread Niels Möller
Maamoun TK writes: > On Wed, Nov 25, 2020 at 10:13 PM Maamoun TK > wrote: > >> I'll make a pull request for fat build support. The gcm code is now merged to the master branch. Thanks! Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is

Re: [PowerPC] GCM optimization

2020-11-28 Thread Niels Möller
Maamoun TK writes: > But if HAVE_NATIVE_gcm_init_key and HAVE_NATIVE_gcm_hash are not defined, > there are no definitions for _nettle_gcm_init_key() and _nettle_gcm_hash() > respectively. Maybe it doesn't yield a warning or error because it's ok for > the compiler to have a prototype declaration

Re: [PowerPC] GCM optimization

2020-11-27 Thread Maamoun TK
On Fri, Nov 27, 2020 at 8:13 PM Niels Möller wrote: > I wonder if gcm-internal.h can be cut down a bit, to > > /* Functions available only in some configurations */ > void > _nettle_gcm_init_key (union nettle_block16 *table); > > void > _nettle_gcm_hash(const struct gcm_key *key, union

Re: [PowerPC] GCM optimization

2020-11-27 Thread Niels Möller
Maamoun TK writes: > I made a pull request in the repository. Merged, thanks! I wonder if gcm-internal.h can be cut down a bit, to /* Functions available only in some configurations */ void _nettle_gcm_init_key (union nettle_block16 *table); void _nettle_gcm_hash(const struct

Re: [PowerPC] GCM optimization

2020-11-27 Thread Maamoun TK
I made a pull request in the repository. regards, Mamone On Thu, Nov 26, 2020 at 11:41 PM Niels Möller wrote: > Maamoun TK writes: > > > To suppress these warnings we need to declare a prototype for > > _nettle_gcm_init_key() and _nettle_gcm_hash() if > "HAVE_NATIVE_gcm_init_key" > > and

Re: [PowerPC] GCM optimization

2020-11-26 Thread Niels Möller
Maamoun TK writes: > To suppress these warnings we need to declare a prototype for > _nettle_gcm_init_key() and _nettle_gcm_hash() if "HAVE_NATIVE_gcm_init_key" > and "HAVE_NATIVE_gcm_hash" are defined respectively. Could be fixed in the new gcm-internal.h file. (I don't quite like that it
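A minimal sketch of the pattern being discussed: a prototype that is only declared when the matching HAVE_NATIVE macro is set. All names here (HAVE_NATIVE_example_init_key, example_native_init_key, example_set_key) are hypothetical stand-ins, not Nettle's actual identifiers or signatures. The exact warning text is cut off in the snippet, so this assumes a warning caused by calling a function with no prototype in scope.

```c
/* Stand-in for a macro that configure would define in config.h when an
   assembly implementation is available (hypothetical name). */
#define HAVE_NATIVE_example_init_key 1

#if HAVE_NATIVE_example_init_key
/* Prototype for the natively implemented function. Declaring it before
   the call site avoids an implicit-declaration warning. */
void example_native_init_key(unsigned char *table);
#endif

/* Caller: uses the native routine when present, else a portable path. */
static void
example_set_key(unsigned char *table)
{
#if HAVE_NATIVE_example_init_key
  example_native_init_key(table);
#else
  table[0] = 0; /* placeholder for the portable C path */
#endif
}

/* In a real build the native routine would live in an assembly file.
   It is defined in C here only so that the sketch links. */
void
example_native_init_key(unsigned char *table)
{
  table[0] = 1;
}
```

Putting the guarded prototype in a shared internal header, as suggested in the reply, keeps the declaration in one place for both the C caller and any fat-build dispatch code.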

Re: [PowerPC] GCM optimization

2020-11-26 Thread Maamoun TK
Great. It works on PowerPC with the configure options "./configure", "./configure --enable-power-crypto-ext", and "./configure --enable-fat", and I get the expected results. However, two warnings popped up when configured with --enable-power-crypto-ext: gcm.c: In function ‘nettle_gcm_set_key’:

Re: [PowerPC] GCM optimization

2020-11-26 Thread Niels Möller
Niels Möller writes: > Maamoun TK writes: > >>> I'll make a pull request for fat build support. >>> >> >> Done! > > I added two comments on the merge request. I reorganized the ifdefs a bit more, and pushed to the ppc-gcm branch. Tested on gcc112. Please try it out. Regards, /Niels --

Re: [PowerPC] GCM optimization

2020-11-25 Thread Niels Möller
Maamoun TK writes: >> I'll make a pull request for fat build support. >> > > Done! I added two comments on the merge request. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 10:13 PM Maamoun TK wrote: > I'll make a pull request for fat build support. > Done! ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 9:21 PM Niels Möller wrote: It remains to wire it up for fat-ppc.c. Anything else that is missing? > No, I'll make a pull request for fat build support.

Re: [PowerPC] GCM optimization

2020-11-25 Thread Niels Möller
Niels Möller writes: > Maamoun TK writes: > >> Sure. I updated the pull request. > > Thanks. Merged (first time I try the merge button on gitlab). It remains to wire it up for fat-ppc.c. Anything else that is missing? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid

Re: [PowerPC] GCM optimization

2020-11-25 Thread Niels Möller
Maamoun TK writes: > Sure. I updated the pull request. Thanks. Merged (first time I try the merge button on gitlab). > Yes, it makes sense to avoid unaligned loads in the main loop by checking > low-order bits of address, but still I can't imagine it would be more > simple in this case.
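The idea of "checking low-order bits of address" can be sketched in C as below. This is an illustration only, not code from the patch. The helper name is_aligned_16 is hypothetical, and the sketch assumes the usual flat mapping from pointers to uintptr_t.

```c
#include <stdint.h>

/* Return nonzero if `p` sits on a 16-byte boundary, i.e. the four
   low-order bits of its address are all zero. */
static int
is_aligned_16(const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}
```

A main loop could use such a test once up front to pick between an aligned fast path and a path that first handles the unaligned head of the input.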

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 10:15 AM Niels Möller wrote: > Maamoun TK writes: > > Let's leave that as is, then. Do you want to make another pull request > with only the fixes for register usage? > Sure. I updated the pull request. > I was thinking of something similar to how the unaligned input

Re: [PowerPC] GCM optimization

2020-11-25 Thread Niels Möller
Maamoun TK writes: > I'm not aware of a simple way to accomplish either approach on POWER8. I > recommend using an allocated stack buffer Let's leave that as is, then. Do you want to make another pull request with only the fixes for register usage? > to assist handling leftovers rather >

Re: [PowerPC] GCM optimization

2020-11-24 Thread Maamoun TK
I'm not aware of a simple way to accomplish either approach on POWER8. I recommend using an allocated stack buffer to assist in handling leftovers rather than making it complicated, or we can use the POWER9-specific instruction 'lxvll', which can be used to load a vector with length passed to general register

Re: [PowerPC] GCM optimization

2020-11-22 Thread Niels Möller
Maamoun TK writes: > It generates a mask compatible with the length of leftovers, for example if > the length is 1 then the mask generated is > 0xFF00 then the mask is ANDed with the vector > register of leftovers to clear the extra unneeded bytes. It's not exactly >
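The masking step described above can be sketched in portable C. This is a byte-wise illustration of the idea, not the vector code from the patch. The helper names and BLOCK_SIZE are hypothetical, and it assumes the leftover bytes occupy the start of a 16-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16 /* GCM block size in bytes */

/* Build a 16-byte mask whose first `length` bytes are 0xFF and whose
   remaining bytes are 0x00 (hypothetical helper, not part of Nettle). */
static void
make_leftover_mask(uint8_t mask[BLOCK_SIZE], size_t length)
{
  assert(length <= BLOCK_SIZE);
  memset(mask, 0x00, BLOCK_SIZE);
  memset(mask, 0xFF, length);
}

/* AND the block with the mask so that every byte past `length`
   becomes zero while the leftover bytes are kept unchanged. */
static void
clear_extra_bytes(uint8_t block[BLOCK_SIZE], size_t length)
{
  uint8_t mask[BLOCK_SIZE];
  size_t i;

  make_leftover_mask(mask, length);
  for (i = 0; i < BLOCK_SIZE; i++)
    block[i] &= mask[i];
}
```

With length 1, only the first byte of the block survives. The vector version discussed in the thread gets the same effect with a single AND on a vector register.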

Re: [PowerPC] GCM optimization

2020-11-22 Thread Maamoun TK
020 at 7:42 PM > > Subject: Re: [PowerPC] GCM optimization > > To: Niels Möller > > > > On Thu, Nov 12, 2020 at 6:40 PM Niels Möller > wrote: > > > > > I gave it a test run on gcc112 in the gcc compile farm, and speedup of > > > gcm update seems to

Re: [PowerPC] GCM optimization

2020-11-22 Thread Jeffrey Walton
On Fri, Nov 20, 2020 at 3:39 PM Maamoun TK wrote: > > -- Forwarded message - > From: Maamoun TK > Date: Thu, Nov 12, 2020 at 7:42 PM > Subject: Re: [PowerPC] GCM optimization > To: Niels Möller > > On Thu, Nov 12, 2020 at 6:40 PM Niels Möller wrote:

Re: [PowerPC] GCM optimization

2020-11-22 Thread Maamoun TK
On Sat, Nov 21, 2020 at 5:32 PM Niels Möller wrote: > Is this a loop, part of a > loop, or is there some vector load instruction that lets you pass a byte > length? > It generates a mask compatible with the length of leftovers, for example if the length is 1 then the mask generated is

Re: [PowerPC] GCM optimization

2020-11-21 Thread Niels Möller
Maamoun TK writes: > For the first approach I can think of this method: > lxvd2x VSR(C0),0,DATA > IF_LE(` > vperm C0,C0,C0,LE_MASK > ') > slwi LENGTH,LENGTH,4 (Shift left 4 bits because vsro gets > bit[121:124]) > vspltisb v10,-1 > (0x) >

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
Another patch for register defines. I apologize for that. regards, Mamone On Tue, Nov 17, 2020 at 11:10 PM Maamoun TK wrote: > I replaced the method of using the stack to handle the leftovers with the > first approach, also I changed some vector registers in the defines because > I defined

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
I replaced the method of using the stack to handle the leftovers with the first approach. I also changed some vector registers in the defines because I defined `LE_MASK' in a non-volatile register which is not always preserved. This patch is built on top of the ppc-gcm branch. regards, Mamone On

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
The results of benchmarking this implementation on POWER9: |*| | vs. lookup table based C implementation | vs. hardware acceleration using Intel optimization documents

Re: [PowerPC] GCM optimization

2020-11-20 Thread Niels Möller
Maamoun TK writes: > +Lmod: > +C --- process the modulo bytes, padding the low-order bytes with zeros > --- > + > +cmpldi LENGTH,0 > +beq Ldone > + > +C load table elements > +li r8,1*TableElemAlign > +lxvd2x VSR(H1M),0,TABLE > +

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
For the first approach I can think of this method: lxvd2x VSR(C0),0,DATA IF_LE(` vperm C0,C0,C0,LE_MASK ') slwi LENGTH,LENGTH,4 (Shift left 4 bits because vsro gets bit[121:124]) vspltisb v10,-1 (0x) mtvrwz v11,LENGTH

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
I reuploaded the patch as an attachment since it didn't apply due to email line breaks. I also fixed the gcm table alignment issue, thanks to Niels Möller. On Tue, Nov 10, 2020 at 6:25 AM Maamoun TK wrote: > This implementation takes advantage of research made by Niels Möller to > optimize GCM on

Re: [PowerPC] GCM optimization

2020-11-20 Thread Niels Möller
Maamoun TK writes: > This implementation takes advantage of research made by Niels Möller to > optimize GCM on PowerPC. This optimization yields a +27.7% performance > boost on POWER8 over the previous implementation that was based on Intel > documents. The performance comparison is made by

Fwd: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
-- Forwarded message - From: Maamoun TK Date: Thu, Nov 12, 2020 at 7:42 PM Subject: Re: [PowerPC] GCM optimization To: Niels Möller On Thu, Nov 12, 2020 at 6:40 PM Niels Möller wrote: > I gave it a test run on gcc112 in the gcc compile farm, and speedup of > gcm update

Re: [PowerPC] GCM optimization

2020-11-11 Thread George Wilson
On Wed, Nov 11, 2020 at 02:17:41AM +0200, Maamoun TK wrote: > I think I mislabeled the percentage of performance comparison, the new > method achieved 27.7% reduction in time on POWER8 that corresponds to 37.9% > increase in performance. Hi Maamoun, Many thanks to you and Niels. We plan to test

Re: [PowerPC] GCM optimization

2020-11-10 Thread Maamoun TK
I think I mislabeled the percentage of performance comparison, the new method achieved 27.7% reduction in time on POWER8 that corresponds to 37.9% increase in performance. On Tue, Nov 10, 2020 at 6:25 AM Maamoun TK wrote: > This implementation takes advantage of research made by Niels Möller to
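The relation between a reduction in run time and the matching increase in throughput can be sketched as below. The function name is hypothetical and this is not code from the thread. With the rounded 27.7% figure the formula gives roughly 38%, in the neighbourhood of the 37.9% quoted above. It is only an assumption that the small difference comes from rounding of the measured values.

```c
#include <assert.h>

/* Convert a fractional reduction in run time (0.277 for 27.7%) into the
   corresponding fractional increase in throughput: if the new time is
   (1 - r) of the old one, throughput scales by 1 / (1 - r). */
static double
time_reduction_to_speedup(double reduction)
{
  assert(reduction >= 0.0 && reduction < 1.0);
  return 1.0 / (1.0 - reduction) - 1.0;
}
```

For example, halving the run time (reduction 0.5) doubles throughput, which the function reports as an increase of 1.0, i.e. 100%.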

[PowerPC] GCM optimization

2020-11-09 Thread Maamoun TK
This implementation takes advantage of research made by Niels Möller to optimize GCM on PowerPC. This optimization yields a +27.7% performance boost on POWER8 over the previous implementation that was based on Intel documents. The performance comparison is made by processing 4 blocks per loop