Revert of the 4.17 hypercall handler changes Re: [PATCH-for-4.17] xen: fix generated code for calling hypercall handlers

Andrew Cooper Thu, 03 Nov 2022 22:02:14 -0700

On 03/11/2022 16:36, Juergen Gross wrote:
> The code generated for the call_handlers_*() macros needs to avoid
> undefined behavior when multiple handlers share the same priority.
> The issue is the hypercall number being unverified fed into the macros
> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>
> Avoid a shift amount of more than 63 by setting mask to zero in case
> the hypercall number is too large.
>
> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
> Signed-off-by: Juergen Gross <jgr...@suse.com>
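
For reference, the guard described in the quoted patch can be sketched as
below.  This is only an illustration of the shift hazard, not the code
generated by the call_handlers_*() macros; the helper name hypercall_mask()
is hypothetical and does not exist in Xen.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of Xen): build a one-bit mask for a
 * hypercall number, yielding 0 when the number is out of range.
 */
static inline uint64_t hypercall_mask(unsigned long num)
{
    /*
     * Shifting a 64-bit value by 64 or more is undefined behavior in C,
     * so the range check must come before the shift is evaluated.
     */
    return num < 64 ? UINT64_C(1) << num : 0;
}
```

With a zero mask, no handler bit matches, so an out-of-range number would
presumably fall through to whatever default path the caller provides.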


This is not a suitable fix.  There being a security issue is just the
tip of the iceberg. 

The changes broke the kexec_op() ABI and this is a blocking regression
vs 4.16.

In lieu of having time to do
https://gitlab.com/xen-project/xen/-/issues/93, here's the abridged list
of errors.

The series claims "This is beneficial to performance and avoids
speculation issues.", c/s 8523851dbc4.

That half sentence is literally the sum total of justification given for
this being related to speculation.

The other half of the sentence claims performance.  But no performance
testing was done; the cover letter talks about one test with specifics,
but it describes a scenario where the delta was a handful of cycles
difference, as one part in multi-millions, probably billions.  There is
no plausible way that whatever raw data led to the "<1% improvement"
claim was statistically significant.

The reason a performance improvement cannot be measured is that a big
out-of-order core can easily absorb the hit in the shadow of other
operations.  Smaller cores cannot, and I'm confident that adequate
performance testing would have demonstrated this.

Unaddressed is the code bloat from the change; relevant because it is
the negative half of the tradeoff on what is allegedly a net improvement
on a fastpath.  Actually trying to reason about the code bloat would
have highlighted why it's rather important that the logic be implemented
as a real function rather than a macro.

Also unaddressed is whether the multi-nesting even has any utility, and
if it does, what it does to the other kinds of workloads.

Unaddressed too is the impact from XSAs 398 and 407 which, as members of
the security team, you had substantially more exposure to than most.


Taking a step back from the low-level issues.

This series introduces a NIH domain-specific language for describing
hypercalls, but without any documentation.  As an exercise to others,
time how long it takes you to compile a hypervisor with a new
hypercall that takes e.g. one integer and one pointer parameter.  There
should be a whole lot more acks on that patch for it to be considered to
have an adequate review.

Somewhere (I can't recall where, but it's 4 in the morning so I'm not
looking for it now), a statement was made that if issues were found they
could be addressed going forwards.  But the series was committed without
any possibility for anyone to perform the testing requested of the
original submission.

There was one redeeming property of the series, and yet there was no
discussion anywhere about function pointer casts.  But given that the
premise was disputed to begin with, and the performance testing that
stood an outside chance of countering the dispute was ignored, and
/then/ that my objections were disregarded and the series committed
without calling a vote, I have to say that I'm very displeased with how
this went.

~Andrew
