On 16/02/2022 03:46, Stefano Stabellini wrote:
> On Mon, 14 Feb 2022, Julien Grall wrote:
>> On 14/02/2022 12:50, Andrew Cooper wrote:
>>> There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a
>>> tight
>>> loop like this are problematic for performance, especially with Spectre v2
>>> protections, which is why extern inline is used commonly by libraries.
>>>
>>> Both ARM callers pass in NULL for the swap function, and while this might
>>> seem
>>> like an attractive option at first, it causes generic_swap() to be used,
>>> which
>>> forced a byte-wise copy.  Provide real swap functions so the compiler can
>>> optimise properly, which is very important for ARM downstreams where
>>> milliseconds until the system is up matters.
>> Did you actually benchmark it? Both those lists will have < 128 elements in
>> them. So I would be extremely surprised if you save more than a few hundreds
>> microseconds with this approach.
>>
>> So, my opinion on this approach hasn't changed. On v1, we discussed an
>> approach that would suit both Stefano and I. Jan seemed to confirm that would
>> also suit x86.
> This patch series has become 70 patches and for the sake of helping
> Andrew move forward in the quickest and most painless way possible, I
> append the following using generic_swap as static inline.
>
> Julien, Bertrand, is that acceptable to you?
>
> Andrew, I know this is not your favorite approach but you have quite a
> lot of changes to handle -- probably not worth focussing on one detail
> which is pretty minor?

It's not pretty minor.  My version really is the best thing for ARM.

The perf improvement alone, marginal as it may be in practice, is
justification alone for the patch, and Bertrand's R-by is testament to this.

But the reasons why getting rid the swap functions is important for
CET-IBT on x86 are exactly the same as why getting rid of them on ARM
will be important for BTI support.  A tagged function doing an arbitrary
bytewise swap from two parameters controlled by the third is far more
valuable to an attackers gadget library than a typical function.

i.e. this proposed intermediary, if it compiles, is just busywork which
someone else is going to have to revert in the future, along with having
this argument again.

~Andrew

Reply via email to