> From: "Theo de Raadt" <dera...@openbsd.org>
> Date: Sat, 01 Oct 2022 09:37:01 -0600
> 
> Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> 
> > Armv8.4 introduced a feature that provides data independent timing
> > for data processing instructions.  This feature is obviously
> > introduced to mitigate timing side-channel attacks.  Presumably
> > enabling the feature has some impact on performance as it would
> > disable certain optimizations to guarantee constant-time execution
> > of instructions.
> 
> But what impact does it have on all regular code?  I cannot find that
> answer in the quoted email thread.  And I don't even see the question
> being asked, because those people aren't talking about changing the
> cpu to this mode permanently & completely.
> 
> > The only hardware that implements this feature that I've seen so
> > far is Apple's Icestorm/Firestorm (M1) and Avalanche/Blizzard (M2)
> > cores.  I ran some benchmarks on an M2 Macbook Air.  In particular,
> > I ran "eopenssl30 speed" bound to a "performance" core.
> 
> That is testing the performance of a program which uses a very narrow
> set of cpu behaviours.  For example, it has almost no crossings in &
> out of the kernel: system calls and page faults.  The class of
> operations being tested are mostly pure compute against the register
> set rather than memory, and when it does perform memory loads, it
> does so in quite linear fashion.  It also does relatively few memory
> writes.
I also tested kernel builds.  I don't see any evidence of a
significant impact on those.

> It is a program that would be slower if they implemented the feature
> poorly, but using such a small subset of system behaviours, I don't
> think it can identify things which might be slower in part, and thus
> have an effect on whole system performance.

The ARM ARM is rather explicit about the instructions that might be
affected by those flags.  That list makes me believe any performance
impact would show up most prominently in code that uses the ARMv8
crypto instructions.

> > I could not detect a significant slowdown with this feature
> > enabled.
> 
> Then why did they make it a chicken bit?  Why did the cpu makers not
> simply make the cpus always act this way?  There must be a reason,
> probably undisclosed.  They have been conservative for some reason.

I wouldn't call it a chicken bit.  But obviously those in charge of
the architecture spec anticipated some significant speedup from having
instructions with data-dependent timing.  It appears that Apple's
implementation doesn't benefit from that, though.

What might be going on here is that Apple is just ticking boxes to
make their implementation spec compliant.  The feature is required for
ARMv8.4 and above.  But if none of their instructions have
data-dependent timing, they could implement the "chicken bit" without
it actually having an effect.

> Is there an impact on the performance of building a full snapshot?
> That at least has a richer use of all code flows, with lots of
> kernel crossings, as opposed to the openssl speed command.

I didn't test full snapshot builds.  I think the kernel builds I did
should be representative enough.  But I can certainly do more
benchmarking if you think that would be desirable.

> > Therefore I think we should enable this feature by default on
> > OpenBSD.
> 
> Maybe.  But I am a bit unconvinced.
> 
> > The use of this feature is still being discussed in the wider
> > community.
> > See for example the following thread:
> > 
> > https://lore.kernel.org/linux-arm-kernel/ywgcrqutxmx0w...@gmail.com/T/#mfcba14511c69461bd8921fef758baae120d090dc
> 
> That discussion is talking about providing the ability for programs
> to request that mode of operation.  They are not talking about
> switching into that mode for the kernel and all processes.
> 
> That seems like a big difference.
> 
> From a security perspective it might make some sense, but this isn't
> like some x86 catastrophe level speculation.  I'm not sure there is
> enough evidence yet that this is required for all modes of operation.

In principle this should only be necessary for code that is sensitive
to timing side-channel attacks.  But the way I see it, the ecosystem
will need (a lot of) time to figure out how to enable this mode around
the bits of code where it *does* matter.

> > On arm64, the feature can be controlled from userland, so even if
> > we turn it on by default, userland code can still make its own
> > decisions on whether it wants the feature enabled or disabled.  We
> > may have to expose the ID_AA64PFR0_EL1 register to userland when
> > code shows up that attempts to do that.
> 
> I suspect that is how this will go.  Programs with specific libraries
> would then request the feature on behalf of their entire runtime.
> Something like a constructor (or startup function) in libcrypto would
> enable it.  Meaning this program "might" care about timingsafe
> behaviour, so let's enable it, for the remainder of the life of that
> program.