Further issues Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist

Andrew Cooper Wed, 02 Nov 2022 10:26:45 -0700

Hi,

To be clear, what is presented here are clear improvements in the status
quo, and qualify for inclusion on their own merits.  And definitely
should be considered.



That said, this is a swamp with future problems, and one rather
fundamental one in Xen which I is not going to be easy to fix.

1) (simple), there are a bunch of stubs, including
stub_xc_domain_getinfo() which don't use
caml_{enter,leave}_blocking_section() when they should.

2) stub_xc_domain_getinfo() reuses xc_domain_getinfolist() meaning that
it uses XEN_SYSCTL_getdomaininfolist rather than
XEN_DOMCTL_getdomaininfo, which is a problem because...

3) While both of these hypercalls have crazy APIs leading to loads of
broken callers, at least the DOMCTL has a fastpath for when you specify
a valid domid.  The SYSCTL (and DOMCTL slowpath) is an O(N) search of
all domains starting from d0 to find the first domain with an id >= the
input request.

The DOMCTL slowpath is useless.  Not a single caller (I've ever
encountered) wants that behaviour, whereas I've needed to bugfix caller
which didn't have an "&& info.domid == domid" to have one, to get the
semantics they wanted.  Cleaning this up will be a good improvement.

4) The (adjusted) algorithm in patch 1 still loops until it gets a
result with no entries.  Meaning that it's still going to make one
hypercall too many, searching the entire domlist, to return no data.  In
principle you could optimise to stop at any hypercall which returns
fewer than the requested number of domains, except...

5) ... if you ever use more than a single hypercall, Xen has dropped and
re-acquired the domlist read lock, meaning that you can't use the result
anyway.  Causality couldn't be broken when domains were allocated
sequentially, but we have a random allocation mode now so you can
observe things out of order.

The only safe action is to ask for all 32k domains in a single go, and
use a single hypercall.

~Andrew

Further issues Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist

Reply via email to