[AArch64] Optimize GHASH

2020-12-14 Thread Maamoun TK
I made a merge request in the main repo that enables optimized GHASH on the AArch64 architecture. The implementation is based on Niels Möller's enhanced algorithm, which yields a larger speedup on AArch64 than the Intel algorithm. Using the Karatsuba algorithm with the Intel algorithm yielded an
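
For readers unfamiliar with the Karatsuba trade-off mentioned above, here is a minimal Python sketch. It is not the nettle assembly and not Niels Möller's enhanced algorithm. It only shows, under simplifying assumptions, how a 128x128-bit carry-less product can be built from three 64x64-bit carry-less products (the operation a single pmull performs) instead of four. The function names are illustrative and do not appear in the merge request.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less (polynomial over GF(2)) product of two 64-bit values."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r


def clmul128_schoolbook(a, b):
    """128x128 carry-less product using four 64x64 products."""
    a0, a1 = a & MASK64, a >> 64
    b0, b1 = b & MASK64, b >> 64
    lo = clmul64(a0, b0)
    hi = clmul64(a1, b1)
    mid = clmul64(a0, b1) ^ clmul64(a1, b0)
    return (hi << 128) ^ (mid << 64) ^ lo


def clmul128_karatsuba(a, b):
    """Same product using three 64x64 products plus extra XORs."""
    a0, a1 = a & MASK64, a >> 64
    b0, b1 = b & MASK64, b >> 64
    lo = clmul64(a0, b0)
    hi = clmul64(a1, b1)
    # (a0 ^ a1) * (b0 ^ b1) = lo ^ hi ^ (cross terms), so XOR lo and hi out.
    mid = clmul64(a0 ^ a1, b0 ^ b1) ^ lo ^ hi
    return (hi << 128) ^ (mid << 64) ^ lo
```

Karatsuba saves one multiplication at the cost of several XORs. Whether that pays off depends on the relative cost of pmull and XOR on the target core, so this sketch makes no performance claim.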

Re: [AArch64] Optimize GHASH

2020-12-14 Thread Maamoun TK
I forgot to mention that I made the benchmark test on gcc17 in GCC Farm. regards, Mamone On Tue, Dec 15, 2020 at 12:12 AM Maamoun TK wrote: > I made a merge request in the main repo that enables optimized GHASH on > AArch64 architecture. The implementation is based on Niels Möller's > enhanced

Re: [AArch64] Optimize GHASH

2020-12-17 Thread Niels Möller
Maamoun TK writes: > I made a merge request in the main repo that enables optimized GHASH on > AArch64 architecture. Nice! I've had a quick first look. For the organization, I think aarch64 assembly should go in its own directory, arm64/, like it's done for x86 and sparc. I wonder which assemb

Re: [AArch64] Optimize GHASH

2020-12-17 Thread Maamoun TK
> > I wonder which assembly files we should use if target host is aarch64, > but ABI=32? I guess the arm/v6/ code can be used unconditionally. Can > we also use arm/neon/ code unconditionally? > It seems gcc for aarch64 doesn't support building 32-bit binaries, maybe we should remove the check of

Re: [AArch64] Optimize GHASH

2020-12-18 Thread Maamoun TK
I created a couple of merge requests in the repo, with those MRs merged I think the powerpc code is stable to be included in the upcoming version of nettle. regards, Mamone On Thu, Dec 17, 2020 at 12:28 PM Maamoun TK wrote: > I wonder which assembly files we should use if target host is aarch64

Re: [AArch64] Optimize GHASH

2020-12-18 Thread Niels Möller
Maamoun TK writes: > It seems gcc for aarch64 doesn't support building 32-bit binaries, maybe we > should remove the check of ABI since 64-bit is the only option. Ok, that's a bit confusing. There's a command line flag for it, not -m32 but -mabi=ilp32, but that doesn't work out of the box with m

Re: [AArch64] Optimize GHASH

2020-12-19 Thread Niels Möller
Maamoun TK writes: > I created a couple of merge requests in the repo, with those MRs merged I > think the powerpc code is stable to be included in the upcoming version of > nettle. Thanks. I've merged the "Use 32-bit offset to load data". For the other one, https://git.lysator.liu.se/nettle/n

Re: [AArch64] Optimize GHASH

2020-12-19 Thread Jeffrey Walton
On Fri, Dec 18, 2020 at 11:31 AM Niels Möller wrote: > > Maamoun TK writes: > > > It seems gcc for aarch64 doesn't support building 32-bit binaries, maybe we > > should remove the check of ABI since 64-bit is the only option. > > Ok, that's a bit confusing. There's a command line flag for it, not

Re: [AArch64] Optimize GHASH

2020-12-19 Thread Maamoun TK
On Sat, Dec 19, 2020 at 11:27 AM Niels Möller wrote: > For the other one, > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/15 "Use signal > to detect CPU features when getauxval() isn't available", can you > explain for which systems is that needed? In the current code, you > handle gn

Re: [AArch64] Optimize GHASH

2020-12-19 Thread Niels Möller
Maamoun TK writes: > fat-ppc.c uses the getauxval() function to detect CPU features on Linux > systems. The problem is that getauxval was introduced in glibc v2.16, which > was released in 2012, so if the fat option is enabled, the build will fail for > older glibc versions. I agree it's not so nice that t

Re: [AArch64] Optimize GHASH

2020-12-20 Thread Maamoun TK
On Sat, Dec 19, 2020 at 9:05 PM Niels Möller wrote: > Do you have any idea how common such old systems might be? > I don't have a specific number, but I think using such old versions of glibc is uncommon, especially for POWER8 and above processors, considering those versions are more than 8 years ol

Re: [AArch64] Optimize GHASH

2020-12-20 Thread David Edelsohn
On Sun, Dec 20, 2020 at 12:14 PM Maamoun TK wrote: > > On Sat, Dec 19, 2020 at 9:05 PM Niels Möller wrote: > > > Do you have any idea how common such old systems might be? > > > > I don't have a specific number but I think using that old versions of glibc > is uncommon specially for POWER8 and ab

Re: [AArch64] Optimize GHASH

2020-12-20 Thread Niels Möller
Maamoun TK writes: >> Some preprocessor check of glibc version in fat-ppc.c could work too, if >> that's simpler. >> > > That's what I ended up with, I made a new merge request for these changes > and closed the old one. Thanks, looks pretty good. I added a few minor comments on the mr (https://

Re: [AArch64] Optimize GHASH

2020-12-21 Thread Maamoun TK
On Mon, Dec 21, 2020 at 9:29 AM Niels Möller wrote: > Thanks, looks pretty good. I added a few minor comments on the mr > (https://git.lysator.liu.se/nettle/nettle/-/merge_requests/16 for > reference). > Thank you, I made a commit with the changes. regards, Mamone

Re: [AArch64] Optimize GHASH

2020-12-21 Thread Niels Möller
Maamoun TK writes: > Thank you, I made a commit with the changes. Thanks! Merged now. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Maamoun TK writes: > >> I made a merge request in the main repo that enables optimized GHASH on >> AArch64 architecture. > > Nice! I've had a quick first look. For the organization, I think aarch64 > assembly should go in its own directory, arm64/, l

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Jeffrey Walton
On Tue, Jan 5, 2021 at 8:23 AM Niels Möller wrote: > > ni...@lysator.liu.se (Niels Möller) writes: > > > ... > The reference manual says > > Armv8 can support the following levels of support for Advanced SIMD and > floating-point instructions: > > *Full SIMD and floating-point support without

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Maamoun TK
On Tue, Jan 5, 2021 at 3:23 PM Niels Möller wrote: > I've made a new branch "arm64" with the configure changes. If you think > that looks ok, can you add your new ghash code on top of that? > Great. I'll add the ghash code to the branch once I finish the big-endian support. > (It would be good

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Michael Weiser
Hello Maamoun, On Tue, Jan 05, 2021 at 05:52:35PM +0200, Maamoun TK wrote: > > I've made a new branch "arm64" with the configure changes. If you think > > that looks ok, can you add your new ghash code on top of that? > Great. I'll add the ghash code to the branch once I finish the big-endian > s

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Maamoun TK
Thank you, I will keep you updated about progress of big-endian support for GHASH on arm64 arch so we can test the patch on real device before sending it to Niels. regards, Mamone On Tue, Jan 5, 2021 at 8:00 PM Michael Weiser wrote: > Hello Maamoun, > > On Tue, Jan 05, 2021 at 05:52:35PM +0200,

Re: [AArch64] Optimize GHASH

2021-01-10 Thread Michael Weiser
Hello Maamoun, On Tue, Jan 05, 2021 at 09:04:59PM +0200, Maamoun TK wrote: > Thank you, I will keep you updated about progress of big-endian support for > GHASH on arm64 arch so we can test the patch on real device before sending > it to Niels. I've added aarch64_be buildroot toolchain container

Re: [AArch64] Optimize GHASH

2021-01-11 Thread Maamoun TK
I have tuned the ghash patch to support big-endian mode but I'm really having difficulties testing it out through emulating, I'll attach the patch here so you can test it but I'm not sure how I can fix the bugs on big-endian system if any, you can feel free to send debugging info or setup a remote

Re: [AArch64] Optimize GHASH

2021-01-13 Thread Michael Weiser
Hello Mamone, On Mon, Jan 11, 2021 at 11:39:43PM +0200, Maamoun TK wrote: > I have tuned the ghash patch to support big-endian mode but I'm really > having difficulties testing it out through emulating, I'll attach the patch > here so you can test it but I'm not sure how I can fix the bugs on > b

Re: [AArch64] Optimize GHASH

2021-01-18 Thread Maamoun TK
Hi Michael, On Wed, Jan 13, 2021 at 8:00 PM Michael Weiser wrote: > Out of curiosity as I can't seem to find the beginning of the > discussion: Is there anyone but me with an actual use-case for > big-endian arm64 here? If not, I'd hate to cause a lot of effort for you > and would certainly put

Re: [AArch64] Optimize GHASH

2021-01-19 Thread Michael Weiser
Hello Mamone, On Mon, Jan 18, 2021 at 06:27:40PM +0200, Maamoun TK wrote: > It would be nice to get the implementation of the enhanced algorithm > working for both endian modes as it yields a good performance boost. Also, > there is not much effort here, the only thing I'm struggling with is to ge

Re: [AArch64] Optimize GHASH

2021-01-20 Thread Maamoun TK
Hello Michael, On Tue, Jan 19, 2021 at 11:45 PM Michael Weiser wrote: > Yes, there are no packages for aarch64_be in any mainstream distribution > I'm aware of. Buildroot and Gentoo are the ones I know that can target > it, Yocto likely as well. All are compile-yourself-distributions and not > f

Re: [AArch64] Optimize GHASH

2021-01-21 Thread Maamoun TK
On Tue, Jan 5, 2021 at 5:52 PM Maamoun TK wrote: > On Tue, Jan 5, 2021 at 3:23 PM Niels Möller wrote: > >> > I wonder which assembly files we should use if target host is aarch64, >> > but ABI=32? I guess the arm/v6/ code can be used unconditionally. Can >> > we also use arm/neon/ code unconditi

Re: [AArch64] Optimize GHASH

2021-01-21 Thread Michael Weiser
Hello Mamone, On Wed, Jan 20, 2021 at 10:25:19PM +0200, Maamoun TK wrote: > I'm trying to install Gentoo on VMware by walking through this recipe > https://medium.com/@steensply/vmware-installation-of-gentoo-linux-from-scratch-on-an-encrypted-partition-9e4665f638e2 > I'm in the middle of recipe n

Re: [AArch64] Optimize GHASH

2021-01-22 Thread Maamoun TK
On Fri, Jan 22, 2021 at 1:45 AM Michael Weiser wrote: > Longer story: ldr does a 128bit load. This loads bytes in exactly > reverse order into the register on LE and BE. As you describe above, the > macros for LE expect a state which is neither of those: The bytes > transposed as if BE but the do
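
The byte-order point above can be made concrete with a small Python model. This is a simplified sketch of how the loads place memory bytes into a 128-bit register, based on the general AArch64 rule that endianness is applied per loaded element. It is not taken from the patch, and the function names are illustrative only. A register is modelled as a list of 16 byte lanes, with lane 0 the least significant byte.

```python
def ldr_q(mem, big_endian):
    """Model of `ldr q0, [x]`: one 128-bit element, byte-swapped on BE."""
    return list(reversed(mem)) if big_endian else list(mem)


def ld1_16b(mem, big_endian):
    """Model of `ld1 {v0.16b}, [x]`: sixteen 1-byte elements.

    Single-byte elements have no byte order, so big_endian is ignored.
    """
    return list(mem)


def ld1_2d(mem, big_endian):
    """Model of `ld1 {v0.2d}, [x]`: two 64-bit elements, each swapped on BE."""
    lo, hi = list(mem[:8]), list(mem[8:])
    if big_endian:
        lo.reverse()
        hi.reverse()
    return lo + hi


def rev64_16b(lanes):
    """Model of `rev64 v0.16b, v0.16b`: reverse bytes within each doubleword."""
    return lanes[7::-1] + lanes[15:7:-1]


mem = list(range(16))
le_ldr = ldr_q(mem, big_endian=False)
be_ldr = ldr_q(mem, big_endian=True)
be_2d = ld1_2d(mem, big_endian=True)
le_16b_rev = rev64_16b(ld1_16b(mem, big_endian=False))
```

In this model `ldr` gives exactly reversed lane contents on LE and BE. `ld1 .16b` followed by `rev64` on LE matches `ld1 .2d` on BE, and a BE `ldr` differs from that state only by a swap of the two doublewords.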

Re: [AArch64] Optimize GHASH

2021-01-22 Thread Maamoun TK
On Fri, Jan 22, 2021 at 1:45 AM Michael Weiser wrote: > Do you think it makes sense to try and adjust the code to work with the > BE layout natively and have a full 128bit reverse after ldr-like loads > on LE instead (considering that 99,999% of aarch64 users will run LE)? > If you don't have a

Re: [AArch64] Optimize GHASH

2021-01-22 Thread Jeffrey Walton
On Fri, Jan 22, 2021 at 5:48 PM Maamoun TK wrote: > > On Fri, Jan 22, 2021 at 1:45 AM Michael Weiser > wrote: > > > Do you think it makes sense to try and adjust the code to work with the > > BE layout natively and have a full 128bit reverse after ldr-like loads > > on LE instead (considering tha

Re: [AArch64] Optimize GHASH

2021-01-22 Thread Michael Weiser
Hello Mamone, On Fri, Jan 22, 2021 at 10:14:36PM +0200, Maamoun TK wrote: > > The difference in index in dup EMSB nicely shows the doubleword > > transposition compared to LE. If on LE the dup was done after the rev64, > > it'd be H.b[7] vs. H.b[15]. > I see what you did here, but I'm confused ab

Re: [AArch64] Optimize GHASH

2021-01-23 Thread Michael Weiser
Hi Mamone, Jeff, sorry for the duplication, used the wrong sender address for the list again. On Fri, Jan 22, 2021 at 07:07:46PM -0500, Jeffrey Walton wrote: > > > Do you think it makes sense to try and adjust the code to work with the > > > BE layout natively and have a full 128bit reverse afte

Re: [AArch64] Optimize GHASH

2021-01-23 Thread Maamoun TK
Hello Michael, On Sat, Jan 23, 2021 at 2:45 AM Michael Weiser wrote: > I've just retested and reread some ARM documents. Here's a patch that > uses ld1.16b and thus eliminates almost all special BE treatment but > subsequently has to leave in all the rev64s as well. This has the > testsuite pass

Re: [AArch64] Optimize GHASH

2021-01-24 Thread Michael Weiser
Hello Mamone, On Sat, Jan 23, 2021 at 08:52:30PM +0200, Maamoun TK wrote: > > @@ -280,9 +266,9 @@ L1x: > > tstLENGTH,#-16 > > b.eq Lmod > > > > -ld1{H1M.16b,H1L.16b},[TABLE] > > +ld1{H1M.2d,H1L.2d},[TABLE] > > > > -ld1

Re: [AArch64] Optimize GHASH

2021-01-24 Thread Maamoun TK
Hello Michael, On Sun, Jan 24, 2021 at 3:15 PM Michael Weiser wrote: > I think there might be a misunderstanding here (possibly caused by > my attempts at explaining what ldr does, sorry): > > On arm(32) and aarch64, endianness is also exclusively handled on > load and store operations. Register

Re: [AArch64] Optimize GHASH

2021-01-24 Thread Niels Möller
Maamoun TK writes: > subkey 'H' value is calculated by enciphering (usually using AES) a > sequence of ZERO data, then gcm_set_key() assign the calculated value > (subkey 'H') at the middle of TABLE array, that is TABLE[80], And the reason for it being stored in the *middle* is the "unnatural" g

Re: [AArch64] Optimize GHASH

2021-01-25 Thread Michael Weiser
Hello Mamone, On Sun, Jan 24, 2021 at 06:44:33PM +0200, Maamoun TK wrote: > > representation. As for arm and aarch64, little-endian is the default, do > > you think, the routine could be changed to move the special endianness > > treatment using rev64 to BE mode, i.e. avoid them in the standard L

Re: [AArch64] Optimize GHASH

2021-01-26 Thread Maamoun TK
Hello Michael, On Mon, Jan 25, 2021 at 8:45 PM Michael Weiser wrote: > Attached are the current > patches, the first being your original. What do you think? > I like how the patch has ended up so far, just give me one or two days to give the patch an additional review before passing it on to Niels.

Re: [AArch64] Optimize GHASH

2021-01-26 Thread Michael Weiser
Hello Mamone, On Tue, Jan 26, 2021 at 07:15:22PM +0200, Maamoun TK wrote: > > Attached are the current > > patches, the first being your original. What do you think? > I liked how the patch ended up so far, just give me one or two days to give > the patch additional review before letting it up to

Re: [AArch64] Optimize GHASH

2021-01-26 Thread Niels Möller
Maamoun TK writes: > Are you looking for removing rev64s on LE? If so, I don't think we can > figure a variant that allows us continue working on an unsorted register > value on LE as pmull requires the input to be sorted properly, that is > transposed doublewords. I haven't been following along

Re: [AArch64] Optimize GHASH

2021-01-30 Thread Maamoun TK
On Mon, Jan 25, 2021 at 8:45 PM Michael Weiser wrote: > Attached are the current > patches. > Everything looks fine to me, I made an additional review and the code seems good for both endianness modes. The patches pass the testsuite on little-endian and big-endian (Thanks to Michael Weiser for p

Re: [AArch64] Optimize GHASH

2021-01-30 Thread Maamoun TK
On Wed, Jan 27, 2021 at 12:45 AM Michael Weiser wrote: > I've caused enough effort with my little > hobby of running an ARM BE system for now. :) > Thank you for the great work, we're now able to run the optimized gcm core on big-endian arm64 systems. I enjoyed working with you in order to get t

Re: [AArch64] Optimize GHASH

2021-01-30 Thread Niels Möller
Maamoun TK writes: > Everything looks fine to me, I made an additional review and the code seems > good for both endianness modes. > The patches pass the testsuite on little-endian and big-endian (Thanks > to Michael Weiser for providing a ready to go environment to test the patch > on big-endian

Re: [AArch64] Optimize GHASH

2021-01-30 Thread Maamoun TK
On Sat, Jan 30, 2021 at 6:07 PM Niels Möller wrote: > Is 0001-Mamone-s-unmodified-patch.patch the same as > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/13? Do you > want to update the merge request with recent changes (on top of the > current arm64 branch), or should I merge mr13 as

Re: [AArch64] Optimize GHASH

2021-01-30 Thread Maamoun TK
This is a new patch to fix the clang build if "armv8-a-crypto" is enabled and should be applied on top of the previous patches. regards, Mamone On Sun, Jan 31, 2021 at 1:17 AM Maamoun TK wrote: > On Sat, Jan 30, 2021 at 6:07 PM Niels Möller wrote: > >> Is 0001-Mamone-s-unmodified-patch.patch t

Re: [AArch64] Optimize GHASH

2021-01-31 Thread Niels Möller
Maamoun TK writes: > This is a new patch to fix the clang build if "armv8-a-crypto" is enabled > and should be applied on top of the previous patches. Thanks, merged all the changes to the arm64 branch. Let me know if there's anything I missed. I have a few comments on the main patch, I'll write

Re: [AArch64] Optimize GHASH

2021-01-31 Thread Niels Möller
Michael Weiser writes: > Subject: [PATCH 1/4] Mamone's unmodified patch Hi, I've merged this, but I have a couple of comments and questions. > --- a/Makefile.in > +++ b/Makefile.in > @@ -616,6 +616,7 @@ distdir: $(DISTFILES) > set -e; for d in sparc32 sparc64 x86 \ > x86_64

Re: [AArch64] Optimize GHASH

2021-01-31 Thread Michael Weiser
Hello Niels, > I think this would be more user-friendly without the "a", > --enable-armv8-crypto, or --enable-arm64-crypto. Or do you foresee any > collision with an incompatible ARMv8-M crypto extension or the like? FWIW, I like --enable-arm64-crypto because it would nicely match with a director

Re: [AArch64] Optimize GHASH

2021-02-01 Thread Maamoun TK
On Sun, Jan 31, 2021 at 10:00 PM Michael Weiser wrote: > It might as well be that llvm-as just knows the > pmull instruction and assembles it fine but can't check if the target > CPU will be able to run it. > llvm-as wouldn't recognize pmull instruction without adding -march=armv8-a+crypto flag

Re: [AArch64] Optimize GHASH

2021-02-01 Thread Niels Möller
Maamoun TK writes: > llvm-as wouldn't recognize pmull instruction without > adding -march=armv8-a+crypto flag at least with the version I use "3.8.1" > I tried both .arch armv8-a+crypto and .arch_extension crypto and they > worked only for gas while llvm-as produces errors for pmull using. Is th

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Michael Weiser
Hello Niels, On Tue, Feb 02, 2021 at 07:40:44AM +0100, Niels Möller wrote: > > llvm-as wouldn't recognize pmull instruction without > > adding -march=armv8-a+crypto flag at least with the version I use "3.8.1" 3.8.1 was released in 2017. It might not support recent aarch64 additions regarding .a

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Jeffrey Walton
On Tue, Feb 2, 2021 at 8:00 AM Michael Weiser wrote: > > > > llvm-as wouldn't recognize pmull instruction without > > > adding -march=armv8-a+crypto flag at least with the version I use "3.8.1" > > 3.8.1 was released in 2017. It might not support recent > aarch64 additions regarding .arch directiv

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Jeffrey Walton
On Tue, Feb 2, 2021 at 8:19 AM Jeffrey Walton wrote: > > On Tue, Feb 2, 2021 at 8:00 AM Michael Weiser wrote: > > > > > > llvm-as wouldn't recognize pmull instruction without > > > > adding -march=armv8-a+crypto flag at least with the version I use > > > > "3.8.1" > > > > 3.8.1 was released in 2

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Michael Weiser
Hi all, On Tue, Feb 02, 2021 at 08:23:39AM -0500, Jeffrey Walton wrote: > > > I think my mentioning of llvm-as was a red herring. Looking at the > > > output of clang -v, llvm-as isn't involved at all. This is supported by > > > the man page stating that llvm-as accepts LLVM assembly and emits LL

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Niels Möller
Michael Weiser writes: > I've downloaded binary builds of clang for aarch64 from > https://releases.llvm.org/download.html. 3.9.1 was the oldest prebuilt > toolchain I could find there and 11.0.0 the most recent. [...] > They also all support the .arch directive: > > $ cat t.s > .arch armv8-a+c

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Niels Möller
Michael Weiser writes: > FWIW, I like --enable-arm64-crypto because it would nicely match with a > directory arm64/crypto for the source and the idea of enabling the > crypto extension for the arm64 target of nettle and be in line with > --enable-arm-neon and arm/neon as well. I'll rename both t

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Maamoun TK
On Tue, Feb 2, 2021 at 7:22 PM Niels Möller wrote: > Michael Weiser writes: > > > FWIW, I like --enable-arm64-crypto because it would nicely match with a > > directory arm64/crypto for the source and the idea of enabling the > > crypto extension for the arm64 target of nettle and be in line with

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Maamoun TK
On Sun, Jan 31, 2021 at 10:35 AM Niels Möller wrote: > > --- /dev/null > > +++ b/arm64/v8/gcm-hash.asm > > @@ -0,0 +1,343 @@ > > > +C common macros: > > +.macro PMUL in, param1, param2 > > +pmull F.1q,\param2\().1d,\in\().1d > > +pmull2 F1.1q,\param2\().2d,\in\().2d > > +

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Niels Möller
Maamoun TK writes: > On Sun, Jan 31, 2021 at 10:35 AM Niels Möller wrote: > >> For consistency, I'd prefer defining all needed macros using m4. > > The macros in gcm-hash.asm file are dependent on defines in the same file > (shared for macros and function implementation) as they are relevant wit

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Martin Storsjö
On Tue, 2 Feb 2021, Michael Weiser wrote: clang does not, however, support the .arch_extension directive. 3.9.1 complains about the directive, 11.0.0 seems to silently ignore it: $ cat t.s .arch_extension crypto pmull v2.1q, v2.1d, v1.1d $ aarch64-unknown-linux-gnu-as -o t.o t.s $ clang+llvm-3.

Re: [AArch64] Optimize GHASH

2021-02-06 Thread Michael Weiser
Hello Niels, On Tue, Feb 02, 2021 at 06:09:42PM +0100, Niels Möller wrote: > > I've downloaded binary builds of clang for aarch64 from > > https://releases.llvm.org/download.html. 3.9.1 was the oldest prebuilt > > toolchain I could find there and 11.0.0 the most recent. > [...] > > They also all

Re: [AArch64] Optimize GHASH

2021-02-06 Thread Niels Möller
Michael Weiser writes: > The arm64 branch builds and passes the testsuite on aarch64 and > aarch64_be with gcc 10.2 and clang 11.0.1 with and without the optimized > assembly routines on my pine64 boards. This is with the .arch directive > instead of modifying CFLAGS and the new configure option