Mark Kettenis <mark.kette...@xs4all.nl> wrote:

> Armv8.4 introduced a feature that provides data independent timing for
> data processing instructions.  This feature is obviously introduced to
> mitigate timing side-channel attacks.  Presumably enabling the feature
> has some impact on performance as it would disable certain
> optimizations to guarantee constant-time execution of instructions.

But what impact does it have on all regular code?  I cannot find that
answer in the quoted email thread.  And I don't even see the question
being asked, because those people aren't talking about changing the
cpu to this mode permanently & completely.

> The only hardware that implements this feature that I've seen so far
> is Apple's Icestorm/Firestorm (M1) and Avalanche/Blizzard (M2) cores.
> I ran some benchmarks on an M2 Macbook Air.  In particular, I ran
> "eopenssl30 speed" bound to a "performance" core.

That is testing the performance of a program which uses a very narrow
set of cpu behaviours.  For example, it has almost no crossings in & out
of the kernel: system calls and page faults.  The class of operations
being tested are mostly pure compute against the register set rather
than memory, and when it does perform memory loads, it does so in quite
linear fashion.  It also does relatively few memory writes.

It is a program that would be slower if they implemented the feature
poorly, but using such a small subset of system behaviours, I don't
think it can identify things which might be slower in part, and thus
have an effect on whole system performance.

> I could not detect a significant slowdown with this feature enabled.

Then why did they make it a chicken bit?  Why did the cpu makers not
simply make the cpus always act this way?  There must be a reason,
probably undisclosed.  They have been conservative for some reason.

Is there an impact on the performance of building a full snapshot?  That
at least has a richer use of all code flows, with lots of kernel crossings,
as opposed to the openssl speed command.

> Therefore I think we should enable this feature by default on OpenBSD.

Maybe.  But I am a bit unconvinced.

> The use of this feature is still being discussed in the wider
> community.  See for example the following thread:
> 
>   https://lore.kernel.org/linux-arm-kernel/ywgcrqutxmx0w...@gmail.com/T/#mfcba14511c69461bd8921fef758baae120d090dc
> 

That discussion is talking about providing the ability for programs to
request that mode of operation.  They are not talking about switching
into that mode for the kernel and all processes.

That seems like a big difference.

From a security perspective it might make some sense, but this isn't
like some x86 catastrophe-level speculation. I'm not sure there is
enough evidence yet that this is required for all modes of operation.

> On arm64, the feature can be controlled from userland, so even if we
> turn it on by default, userland code can still make its own decisions
> on whether it wants the feature enabled or disabled.  We may have to
> expose the ID_AA64PFR0_EL1 register to userland when code shows up
> that attempts to do that.

I suspect that is how this will go.  Programs with specific libraries would
then request the feature on behalf of their entire runtime.  Something
like a constructor (or startup function) in libcrypto would enable it.
Meaning this program "might" care about timingsafe behaviour, so let's
enable it, for the remainder of the life of that program.
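
To make the constructor idea concrete, here is a rough sketch of what such
a startup hook might look like.  It is not taken from libcrypto or any
existing code.  The names hypothetical_read_pfr0(),
hypothetical_crypto_dit_init() and hypothetical_dit_active() are made up
for illustration, and how userland actually obtains the ID_AA64PFR0_EL1
value is platform specific and left as a placeholder here.  The DIT field
position (bits 51:48), the PSTATE.DIT bit (bit 24) and the generic
S3_3_C4_C2_5 register encoding come from the Arm architecture.

```c
#include <stdint.h>

/* ID_AA64PFR0_EL1.DIT is bits [51:48]; nonzero means DIT is implemented. */
#define ID_AA64PFR0_DIT_SHIFT	48
#define ID_AA64PFR0_DIT_MASK	0xfULL

/* PSTATE.DIT is bit 24 when accessed through the DIT system register. */
#define PSTATE_DIT_BIT		(1ULL << 24)

static int dit_enabled;

/* Decode the DIT field from a raw ID_AA64PFR0_EL1 value. */
static int
pfr0_has_dit(uint64_t pfr0)
{
	return ((pfr0 >> ID_AA64PFR0_DIT_SHIFT) & ID_AA64PFR0_DIT_MASK) != 0;
}

/*
 * HYPOTHETICAL placeholder: a real version would obtain the register
 * value through whatever interface the kernel exposes.  Returning 0
 * ("no DIT") keeps this sketch safe to run on any machine.
 */
static uint64_t
hypothetical_read_pfr0(void)
{
	return 0;
}

/* Runs before main(); turns DIT on for the rest of the process lifetime. */
__attribute__((constructor))
static void
hypothetical_crypto_dit_init(void)
{
	if (!pfr0_has_dit(hypothetical_read_pfr0()))
		return;
#if defined(__aarch64__)
	/* S3_3_C4_C2_5 is the generic encoding of the DIT register. */
	__asm__ volatile("msr s3_3_c4_c2_5, %0" : : "r"(PSTATE_DIT_BIT));
	dit_enabled = 1;
#endif
}

/* Lets callers check whether the hook actually enabled DIT. */
int
hypothetical_dit_active(void)
{
	return dit_enabled;
}
```

Note that nothing here ever turns the bit back off, which is exactly the
"might care, so enable it for the whole runtime" behaviour described above.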

