This is a *huge* effort.  Thank you.

On Sun, Jun 28, 2020 at 03:27:56AM +0000, Taylor R Campbell wrote:
> > Date: Mon, 22 Jun 2020 23:43:20 +0000
> > From: Taylor R Campbell <riastr...@netbsd.org>
> > 
> > There is some more room for improvement -- SSSE3 provides PSHUFB which
> > can sequentially speed up parts of AES, and is supported by a good
> > number of amd64 CPUs starting around 14 years ago that lack AES-NI --
> > but there are diminishing returns for increasing implementation and
> > maintenance effort, so I'd like to focus on making an impact on
> > systems that matter.  (That includes non-x86 CPUs -- e.g., we could
> > probably easily adapt the Intel SSE2 logic to ARM NEON -- but I would
> > like to focus on systems where there is demand.)
> 
> I drafted derivatives of Mike Hamburg's vpaes code using Intel SSSE3
> and using ARM NEON / aarch64 SIMD.  In principle the ARM NEON code
> should work on armv7, but I have only compile-tested it there, and
> there are a few kinks to be worked out before it can be used in the
> kernel on armv7.
> 
> I pushed it to the riastradh-kernelcrypto topic on hg src-draft, and I
> updated the userland aestest utility if you want to get a rough idea
> of the performance without updating your kernel (see previous message
> for usage instructions):
> 
> https://www.NetBSD.org/~riastradh/tmp/20200627/aestest.tgz
> 
> The summary of the patch set now is (kernel only -- no userland
> changes):
> 
> - every architecture gets constant-time AES, with BearSSL's aes_ct
>   32-bit bitsliced implementation -- there is no more vulnerable AES
>   code in the NetBSD kernel, although there is a substantial
>   performance hit on many platforms
> 
> - every architecture gets new cgd(4) support for Adiantum, which is
>   generally as fast as or faster than AES-CBC and AES-XTS were before
>   and provides better security (and has lots of room to be sped up;
>   any speedups would also apply to other uses, like WireGuard)
> 
> - most high-end x86 of the past decade gets much much faster AES with
>   AES-NI CPU support (no 32-bit yet)
> 
> - almost all x86 of the past decade gets faster or much faster AES
>   with a vpaes-style SSSE3-based implementation (32-bit included)
> 
> - most x86 of the past two decades, including all amd64, mitigates the
>   performance hit with a bitsliced SSE2-based implementation (32-bit
>   included)
> 
> - VIA gets much faster AES with VIA ACE (for all users in the kernel,
>   including cgd, not just those that use opencrypto as we had before
>   with the via_padlock.c driver)
> 
> - almost all aarch64 (except rpi) gets much much faster AES with
>   ARMv8.0-AES CPU support
> 
> - 64-bit rpi (and, with a little more work, armv7 with NEON) mitigates
>   the performance hit -- and may get faster -- with a vpaes-style
>   NEON-based implementation
> 
> Some other CPUs like modern POWER have AES CPU instructions these days
> too.  The vpaes approach could probably be adapted to PowerPC Altivec,
> and maybe some other vector units I'm not as familiar with (MIPS SIMD
> Architecture, MSA?).  BearSSL's aes_ct64 64-bit bitsliced
> implementation might be worth adopting for 64-bit CPUs without a
> vector unit, if anyone cares -- maybe alpha or mips64.  But I think
> I'm at the limit of what I'm willing to do for fun with the hardware I
> have easy access to.
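
For readers skimming the summary, the x86 part of the list above amounts to a preference order: AES-NI where available (amd64 only for now), then the vpaes-style SSSE3 code, then bitsliced SSE2, then the portable BearSSL aes_ct fallback. A minimal sketch of that ordering in C follows. The struct, the flag names, the function name, and the returned strings are all hypothetical, chosen for illustration; this is not the code in the riastradh-kernelcrypto branch, and a real kernel would fill in the flags from CPUID rather than take them as input.

```c
#include <stdbool.h>

/*
 * Hypothetical CPU feature flags (illustrative names only).
 * A real implementation would populate these from CPUID.
 */
struct cpu_features {
	bool is_64bit;   /* running as amd64 rather than i386 */
	bool has_aesni;  /* AES-NI instructions present */
	bool has_ssse3;  /* SSSE3 (PSHUFB) present */
	bool has_sse2;   /* SSE2 present */
};

/*
 * Return a label for the most preferred AES implementation the CPU
 * can run, following the order given in the summary:
 * AES-NI (64-bit only), SSSE3 vpaes, SSE2 bitsliced, then the
 * portable constant-time aes_ct fallback.
 */
static const char *
pick_aes_impl(const struct cpu_features *f)
{
	if (f->has_aesni && f->is_64bit)
		return "aesni";
	if (f->has_ssse3)
		return "ssse3-vpaes";
	if (f->has_sse2)
		return "sse2-bitsliced";
	return "bearssl-aes_ct";
}
```

Every branch, including the fallback, is meant to be constant-time, so the choice affects only speed. VIA ACE and the ARM variants are left out of the sketch to keep it short.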

-- 
 Thor Lancelot Simon                                         t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."      --Andrei Sakharov
