Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> Armv8.4 introduced a feature that provides data independent timing for
> data processing instructions. This feature is obviously introduced to
> mitigate timing side-channel attacks. Presumably enabling the feature
> has some impact on performance as it would disable certain
> optimizations to guarantee constant-time execution of instructions.

But what impact does it have on all regular code? I cannot find that answer in the quoted email thread. And I don't even see the question being asked, because those people aren't talking about changing the cpu to this mode permanently & completely.

> The only hardware that implements this feature that I've seen so far
> is Apple's Icestorm/Firestorm (M1) and Avalanche/Blizzard (M2) cores.
> I ran some benchmarks on an M2 Macbook Air. In particular, I ran
> "eopenssl30 speed" bound to a "performance" core.

That is testing the performance of a program which uses a very narrow set of cpu behaviours. For example, it has almost no crossings in & out of the kernel: system calls and page faults. The class of operations being tested is mostly pure compute against the register set rather than memory, and when it does perform memory loads, it does so in a quite linear fashion. It also does relatively few memory writes. It is a program that would be slower if they implemented the feature poorly, but since it uses such a small subset of system behaviours, I don't think it can identify things which might be slower in part, and thus have an effect on whole system performance.

> I could not detect a significant slowdown with this feature enabled.

Then why did they make it a chicken bit? Why did the cpu makers not simply make the cpus always act this way? There must be a reason, probably undisclosed. They have been conservative for some reason.

Is there an impact on the performance of building a full snapshot? That at least has a richer use of all code flows, with lots of kernel crossings, as opposed to the openssl speed command.

> Therefore I think we should enable this feature by default on OpenBSD.

Maybe. But I am a bit unconvinced.

> The use of this feature is still being discussed in the wider
> community.
> See for example the following thread:
>
> https://lore.kernel.org/linux-arm-kernel/ywgcrqutxmx0w...@gmail.com/T/#mfcba14511c69461bd8921fef758baae120d090dc

That discussion is talking about providing the ability for programs to request that mode of operation. They are not talking about switching into that mode for the kernel and all processes. That seems like a big difference.

From a security perspective it might make some sense, but this isn't like some x86 catastrophe-level speculation. I'm not sure there is enough evidence yet that this is required for all modes of operation.

> On arm64, the feature can be controlled from userland, so even if we
> turn it on by default, userland code can still make its own decisions
> on whether it wants the feature enabled or disabled. We may have to
> expose the ID_AA64PFR0_EL1 register to userland when code shows up
> that attempts to do that.

I suspect that is how this will go. Programs with specific libraries would then request the feature on behalf of their entire runtime. Something like a constructor (or startup function) in libcrypto would enable it. Meaning this program "might" care about timingsafe behaviour, so let's enable it, for the remainder of the life of that program.
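The userland opt-in path discussed above might look roughly like the following C sketch. It assumes the process already holds the value of ID_AA64PFR0_EL1 (how it obtains it is left open, since the kernel may first have to expose the register), and that the DIT field sits in bits [51:48] of that register as described in the Arm architecture manual. The names `dit_implemented()` and `dit_enable()` are hypothetical, not an existing libcrypto or OpenBSD interface.

```c
#include <stdint.h>

/*
 * DIT field of ID_AA64PFR0_EL1, bits [51:48].  A non-zero value means
 * the core implements FEAT_DIT, i.e. the PSTATE.DIT control bit.
 */
#define ID_AA64PFR0_DIT_SHIFT	48
#define ID_AA64PFR0_DIT_MASK	0xfULL

/* Return 1 if the given ID_AA64PFR0_EL1 value advertises DIT, else 0. */
static int
dit_implemented(uint64_t id_aa64pfr0)
{
	return ((id_aa64pfr0 >> ID_AA64PFR0_DIT_SHIFT) &
	    ID_AA64PFR0_DIT_MASK) != 0;
}

/*
 * Hypothetical helper: set PSTATE.DIT for the calling thread if the
 * register value says the core supports it.  Returns 1 if DIT was set,
 * 0 otherwise.  Where id_aa64pfr0 comes from is an assumption; the
 * kernel may first have to expose the register to userland.
 */
static int
dit_enable(uint64_t id_aa64pfr0)
{
	if (!dit_implemented(id_aa64pfr0))
		return 0;	/* msr dit may fault on cores without DIT */
#if defined(__aarch64__)
	/* "msr dit, #1", emitted as a raw word for pre-Armv8.4 assemblers. */
	__asm__ volatile(".inst 0xd503415f" ::: "memory");
	return 1;
#else
	return 0;		/* not an arm64 build: nothing to set */
#endif
}
```

A library could call `dit_enable()` from a function marked `__attribute__((constructor))`, so the program runs with PSTATE.DIT set from startup onward, which matches the per-program opt-in suggested above rather than a system-wide default. The ID register check comes first because executing `msr dit` on a core without the feature may fault as an undefined instruction.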