sort: Switch to an extern inline implementation

Bertrand Marquis Wed, 16 Feb 2022 03:55:48 -0800

Hi,

> On 16 Feb 2022, at 11:46, Julien Grall <jul...@xen.org> wrote:
> 
> Hi,
> 
> On 16/02/2022 10:44, Andrew Cooper wrote:
>> On 16/02/2022 03:46, Stefano Stabellini wrote:
>>> On Mon, 14 Feb 2022, Julien Grall wrote:
>>>> On 14/02/2022 12:50, Andrew Cooper wrote:
>>>>> There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a
>>>>> tight
>>>>> loop like this are problematic for performance, especially with Spectre v2
>>>>> protections, which is why extern inline is used commonly by libraries.
>>>>> 
>>>>> Both ARM callers pass in NULL for the swap function, and while this might
>>>>> seem
>>>>> like an attractive option at first, it causes generic_swap() to be used,
>>>>> which
>>>>> forced a byte-wise copy.  Provide real swap functions so the compiler can
>>>>> optimise properly, which is very important for ARM downstreams where
>>>>> milliseconds until the system is up matters.
>>>> Did you actually benchmark it? Both those lists will have < 128 elements in
>>>> them. So I would be extremely surprised if you save more than a few 
>>>> hundreds
>>>> microseconds with this approach.
>>>> 
>>>> So, my opinion on this approach hasn't changed. On v1, we discussed an
>>>> approach that would suit both Stefano and I. Jan seemed to confirm that 
>>>> would
>>>> also suit x86.
>>> This patch series has become 70 patches and for the sake of helping
>>> Andrew move forward in the quickest and most painless way possible, I
>>> append the following using generic_swap as static inline.
>>> 
>>> Julien, Bertrand, is that acceptable to you?
>>> 
>>> Andrew, I know this is not your favorite approach but you have quite a
>>> lot of changes to handle -- probably not worth focussing on one detail
>>> which is pretty minor?
>> It's not pretty minor.  My version really is the best thing for ARM.
> >
>> The perf improvement alone, marginal as it may be in practice, is
> 
> Our take on Arm has always been to avoid micro-optimization when the 
> resulting code is more difficult to maintain.
> 
>> justification alone for the patch, and Bertrand's R-by is testament to this.
> 
> I am not sure why calling out that Bertrand agreed means that everyone else 
> should accept your approach.
> 
> This reminds me other series that have been blocked for a long time by you. 
> Yet you made no effort to compromise... How ironic.
> 
>> But the reasons why getting rid the swap functions is important for
>> CET-IBT on x86 are exactly the same as why getting rid of them on ARM
>> will be important for BTI support.  A tagged function doing an arbitrary
>> bytewise swap from two parameters controlled by the third is far more
>> valuable to an attackers gadget library than a typical function.
> 
> This is a more compelling reason. However, I am a bit puzzled why it took you 
> so long to come up with this reason.
> 
>> i.e. this proposed intermediary, if it compiles, is just busywork which
>> someone else is going to have to revert in the future, along with having
>> this argument again.
> 
> Well, this argument would have never happened if your commit message 
> contained information about BTI.


I agree that this would be nice to have in the commit message as a 
justification for the change.

Cheers
Bertrand

> 
> Instead you decided to just mention the performance part despite me objecting 
> it and requesting for a different reason in v1 (see [1]).
> 
> Anyway, I will wait for a reword of the commit message before lifting my 
> Nacked-by.
> 
> Cheers,
> 
> [1] 
> https://lore.kernel.org/xen-devel/f7bb7a08-4721-f2a8-8602-0cd1baf1f...@xen.org/
> 
> -- 
> Julien Grall

Re: [PATCH v2 02/70] xen/sort: Switch to an extern inline implementation

Reply via email to