ok, let's give it a shot then. And watch for behaviour changes...
Mark Kettenis <mark.kette...@xs4all.nl> wrote:

> > From: "Theo de Raadt" <dera...@openbsd.org>
> > Date: Sat, 01 Oct 2022 09:37:01 -0600
> > 
> > Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> > 
> > > Armv8.4 introduced a feature that provides data independent timing
> > > for data processing instructions.  This feature is obviously
> > > introduced to mitigate timing side-channel attacks.  Presumably
> > > enabling the feature has some impact on performance as it would
> > > disable certain optimizations to guarantee constant-time execution
> > > of instructions.
> > 
> > But what impact does it have on all regular code?  I cannot find that
> > answer in the quoted email thread.  And I don't even see the question
> > being asked, because those people aren't talking about changing the
> > cpu to this mode permanently & completely.
> > 
> > > The only hardware that implements this feature that I've seen so far
> > > is Apple's Icestorm/Firestorm (M1) and Avalanche/Blizzard (M2) cores.
> > > I ran some benchmarks on an M2 MacBook Air.  In particular, I ran
> > > "eopenssl30 speed" bound to a "performance" core.
> > 
> > That is testing the performance of a program which uses a very narrow
> > set of cpu behaviours.  For example, it has almost no crossings in &
> > out of the kernel: system calls and page faults.  The class of
> > operations being tested is mostly pure compute against the register
> > set rather than memory, and when it does perform memory loads, it
> > does so in quite linear fashion.  It also does relatively few memory
> > writes.
> 
> I also tested kernel builds.  I don't see any evidence of any
> significant impact on those.
> 
> > It is a program that would be slower if they implemented the feature
> > poorly, but using such a small subset of system behaviours, I don't
> > think it can identify things which might be slower in part, and thus
> > have an effect on whole system performance.
> 
> The ARM ARM is rather explicit on the instructions that might be
> affected by those flags.  That list makes me believe any performance
> impact would show up most prominently in code that uses the ARMv8
> crypto instructions.
> 
> > > I could not detect a significant slowdown with this feature enabled.
> > 
> > Then why did they make it a chicken bit?  Why did the cpu makers not
> > simply make the cpus always act this way?  There must be a reason,
> > probably undisclosed.  They have been conservative for some reason.
> 
> I wouldn't call it a chicken bit.  But obviously those in charge of
> the architecture spec anticipated some significant speedup from having
> instructions that have data-dependent timings.  It appears that
> Apple's implementation doesn't though.
> 
> What might be going on here is that Apple is just ticking boxes to
> make their implementation spec compliant.  The feature is required for
> ARMv8.4 and above.  But if none of their instructions have
> data-dependent timing, they could implement the "chicken bit" without
> it actually having an effect.
> 
> > Is there an impact on the performance of building a full snapshot?
> > That at least has a richer use of all code flows, with lots of kernel
> > crossings, as opposed to the openssl speed command.
> 
> I didn't test full snapshot builds.  I think the kernel builds I did
> should be representative enough.  But I certainly can do more
> benchmarking if you think that would be desirable.
> 
> > > Therefore I think we should enable this feature by default on OpenBSD.
> > 
> > Maybe.  But I am a bit unconvinced.
> > 
> > > The use of this feature is still being discussed in the wider
> > > community.  See for example the following thread:
> > > 
> > > https://lore.kernel.org/linux-arm-kernel/ywgcrqutxmx0w...@gmail.com/T/#mfcba14511c69461bd8921fef758baae120d090dc
> > 
> > That discussion is talking about providing the ability for programs to
> > request that mode of operation.
> > They are not talking about switching into that mode for the kernel
> > and all processes.
> > 
> > That seems like a big difference.
> > 
> > From a security perspective it might make some sense, but this isn't
> > like some x86 catastrophe level speculation.  I'm not sure there is
> > enough evidence yet that this is required for all modes of operation.
> 
> In principle this should be only necessary for code that is sensitive
> to timing side-channel attacks.  But the way I see it, the ecosystem
> will need (a lot of) time to figure out how to enable this mode around
> the bits of code where it *does* matter.
> 
> > > On arm64, the feature can be controlled from userland, so even if we
> > > turn it on by default, userland code can still make its own decisions
> > > on whether it wants the feature enabled or disabled.  We may have to
> > > expose the ID_AA64PFR0_EL1 register to userland when code shows up
> > > that attempts to do that.
> > 
> > I suspect that is how this will go.  Programs with specific libraries
> > would then request the feature on behalf of their entire runtime.
> > Something like a constructor (or startup function) in libcrypto would
> > enable it.  Meaning this program "might" care about timingsafe
> > behaviour, so let's enable it, for the remainder of the life of that
> > program.