ok, let's give it a shot then.

And watch for behaviour changes...


Mark Kettenis <mark.kette...@xs4all.nl> wrote:

> > From: "Theo de Raadt" <dera...@openbsd.org>
> > Date: Sat, 01 Oct 2022 09:37:01 -0600
> > 
> > Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> > 
> > > Armv8.4 introduced a feature that provides data independent timing for
> > > data processing instructions.  This feature is obviously introduced to
> > > mitigate timing side-channel attacks.  Presumably enabling the feature
> > > has some impact on performance as it would disable certain
> > > optimizations to guarantee constant-time execution of instructions.
> > 
> > But what impact does it have on all regular code?  I cannot find that
> > answer in the quoted email thread.  And I don't even see the question
> > being asked, because those people aren't talking about changing the
> > cpu to this mode permanently & completely.
> > 
> > > The only hardware that implements this feature that I've seen so far
> > > is Apple's Icestorm/Firestorm (M1) and Avalanche/Blizzard (M2) cores.
> > > I ran some benchmarks on an M2 Macbook Air.  In particular, I ran
> > > "eopenssl30 speed" bound to a "performance" core.
> > 
> > That is testing the performance of a program which uses a very narrow
> > set of cpu behaviours.  For example, it has almost no crossings in & out
> > of the kernel: system calls and page faults.  The class of operations
> > being tested are mostly pure compute against the register set rather
> > than memory, and when it does perform memory loads, it does so in quite
> > linear fashion.  It also does relatively few memory writes.
> 
> I also tested kernel builds.  I don't see any evidence of any
> significant impact on those.  
> 
> > It is a program that would be slower if they implemented the feature
> > poorly, but using such a small subset of system behaviours, I don't
> > think it can identify things which might be slower in part, and thus
> > have an effect on whole system performance.
> 
> The ARM ARM is rather explicit on the instructions that might be
> affected by those flags.  That list makes me believe any performance
> impact would show up most prominently in code that uses the ARMv8
> crypto instructions.
> 
> > > I could not detect a significant slowdown with this feature enabled.
> > 
> > Then why did they make it a chicken bit?  Why did the cpu makers not
> > simply make the cpus always act this way?  There must be a reason,
> > probably undisclosed.  They have been conservative for some reasons.
> 
> I wouldn't call it a chicken bit.  But obviously those in charge of
> the architecture spec anticipated some significant speedup from having
> instructions that have data-dependent timings.  It appears that
> Apple's implementation doesn't, though.
> 
> What might be going on here is that Apple is just ticking boxes to
> make their implementation spec compliant.  The feature is required for
> ARMv8.4 and above.  But if none of their instructions have
> data-dependent timing, they could implement the "chicken bit" without
> it actually having an effect.
> 
> > Is there an impact on the performance of building a full snapshot?  That
> > at least has a richer use of all code flows, with lots of kernel crossings,
> > as opposed to the openssl speed command.
> 
> I didn't test full snapshot builds.  I think the kernel builds I did
> should be representative enough.  But I certainly can do more
> benchmarking if you think that would be desirable.
> 
> > > Therefore I think we should enable this feature by default on OpenBSD.
> > 
> > Maybe.  But I am a bit unconvinced.
> > 
> > > The use of this feature is still being discussed in the wider
> > > community.  See for example the following thread:
> > > 
> > >   
> > > https://lore.kernel.org/linux-arm-kernel/ywgcrqutxmx0w...@gmail.com/T/#mfcba14511c69461bd8921fef758baae120d090dc
> > > 
> > 
> > That discussion is talking about providing the ability for programs to
> > request that mode of operation.  They are not talking about switching
> > into that mode for the kernel and all processes.
> > 
> > That seems like a big difference.
> > 
> > From a security perspective it might make some sense, but this isn't
> > like some x86 catastrophe level speculation. I'm not sure there is
> > enough evidence yet that this is required for all modes of operation.
> 
> In principle this should be only necessary for code that is sensitive
> to timing side-channel attacks.  But the way I see it, the ecosystem
> will need (a lot of) time to figure out how to enable this mode around
> the bits of code where it *does* matter.
> 
> > > On arm64, the feature can be controlled from userland, so even if we
> > > turn it on by default, userland code can still make its own decisions
> > > on whether it wants the feature enabled or disabled.  We may have to
> > > expose the ID_AA64PFR0_EL1 register to userland when code shows up
> > > that attempts to do that.
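For illustration only, a minimal userland sketch of that check.  It assumes
the kernel traps and emulates EL0 reads of ID_AA64PFR0_EL1 (exactly the
exposure mentioned above; without it the mrs faults), and it uses the raw
s3_3_c4_c2_5 encoding of the DIT register for assemblers that don't know
the name.

/*
 * Illustrative sketch only: check for FEAT_DIT and read the current
 * PSTATE.DIT value from userland.  The ID_AA64PFR0_EL1 read traps to
 * the kernel from EL0, so it only works if the kernel emulates it.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t pfr0, dit;

	__asm volatile("mrs %0, id_aa64pfr0_el1" : "=r" (pfr0));
	if (((pfr0 >> 48) & 0xf) == 0) {	/* DIT is bits [51:48] */
		printf("FEAT_DIT not implemented\n");
		return 1;
	}

	/* s3_3_c4_c2_5 is the DIT register; PSTATE.DIT is bit 24. */
	__asm volatile("mrs %0, s3_3_c4_c2_5" : "=r" (dit));
	printf("PSTATE.DIT is %s\n", (dit & (1ULL << 24)) ? "set" : "clear");
	return 0;
}
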
> > 
> > I suspect that is how this will go.  Programs with specific libraries would
> > then request the feature on behalf of their entire runtime.  Something
> > like a constructor (or startup function) in libcrypto would enable it.
> > Meaning this program "might" care about timingsafe behaviour, so let's
> > enable it, for the remainder of the life of that program.
> 
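
A sketch of that idea, purely illustrative (the function name is made up;
nothing like it exists in libcrypto today): a constructor that sets
PSTATE.DIT for the rest of the program's life.  It assumes FEAT_DIT has
already been confirmed present and that the kernel preserves the bit
across context switches and signal delivery.

/*
 * Illustrative sketch only: enable PSTATE.DIT for the lifetime of the
 * process from a library constructor.  A real version would first
 * check ID_AA64PFR0_EL1.DIT (or whatever interface ends up exposing
 * it) and relies on the kernel preserving the bit across context
 * switches and signal delivery.
 */
static void __attribute__((constructor))
enable_dit(void)
{
	/*
	 * MSR DIT, Xt takes the new PSTATE.DIT value from bit 24 of
	 * the source register; s3_3_c4_c2_5 is the raw encoding of
	 * the DIT register for assemblers that don't know the name.
	 */
	__asm volatile("msr s3_3_c4_c2_5, %0" :: "r" (1ULL << 24));
}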
