This is a *huge* effort. Thank you.

On Sun, Jun 28, 2020 at 03:27:56AM +0000, Taylor R Campbell wrote:
> > Date: Mon, 22 Jun 2020 23:43:20 +0000
> > From: Taylor R Campbell <riastr...@netbsd.org>
> >
> > There is some more room for improvement -- SSSE3 provides PSHUFB which
> > can substantially speed up parts of AES, and is supported by a good
> > number of amd64 CPUs starting around 14 years ago that lack AES-NI --
> > but there are diminishing returns for increasing implementation and
> > maintenance effort, so I'd like to focus on making an impact on
> > systems that matter. (That includes non-x86 CPUs -- e.g., we could
> > probably easily adapt the Intel SSE2 logic to ARM NEON -- but I would
> > like to focus on systems where there is demand.)
>
> I drafted derivatives of Mike Hamburg's vpaes code using Intel SSSE3
> and using ARM NEON / aarch64 SIMD. In principle the ARM NEON code
> should work on armv7, but I have only compile-tested it there, and
> there are a few kinks to be worked out before it can be used in the
> kernel on armv7.
>
> I pushed it to the riastradh-kernelcrypto topic on hg src-draft, and I
> updated the userland aestest utility if you want to get a rough idea
> of the performance without updating your kernel (see previous message
> for usage instructions):
>
> https://www.NetBSD.org/~riastradh/tmp/20200627/aestest.tgz
>
> The summary of the patch set now is (kernel only -- no userland
> changes):
>
> - every architecture gets constant-time AES, with BearSSL's aes_ct
>   32-bit bitsliced implementation -- there is no more vulnerable AES
>   code in the NetBSD kernel, although there is a substantial
>   performance hit on many platforms
>
> - every architecture gets new cgd(4) support for Adiantum, which is
>   generally as fast as or faster than AES-CBC and AES-XTS were before
>   and provides better security (and has lots of room to be sped up;
>   any speedups would also be applicable to other purposes too, like
>   Wireguard)
>
> - most high-end x86 of the past decade gets much much faster AES with
>   AES-NI CPU support (no 32-bit yet)
>
> - almost all x86 of the past decade gets faster or much faster AES
>   with a vpaes-style SSSE3-based implementation (32-bit included)
>
> - most x86 of the past two decades, including all amd64, mitigates the
>   performance hit with a bitsliced SSE2-based implementation (32-bit
>   included)
>
> - VIA gets much faster AES with VIA ACE (for all users in the kernel,
>   including cgd, not just those that use opencrypto as we had before
>   with the via_padlock.c driver)
>
> - almost all aarch64 (except rpi) gets much much faster AES with
>   ARMv8.0-AES CPU support
>
> - 64-bit rpi (and, with a little more work, armv7 with NEON) mitigates
>   the performance hit -- and may get faster -- with a vpaes-style
>   NEON-based implementation
>
> Some other CPUs like modern POWER have AES CPU instructions these days
> too.
> The vpaes approach could probably be adapted to PowerPC Altivec,
> and maybe some other vector units I'm not as familiar with (MIPS SIMD
> Architecture, MSA?). BearSSL's aes_ct64 64-bit bitsliced
> implementation might be worth adopting for 64-bit CPUs without a
> vector unit, if anyone cares -- maybe alpha or mips64. But I think
> I'm at the limit of what I'm willing to do for fun with the hardware I
> have easy access to.

--
Thor Lancelot Simon                                   t...@panix.com
  "Whether or not there's hope for change is not the question. If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right." --Andrei Sakharov