It has been noticed in 2013 that shadow allocation sometimes returns the
wrong value, which got worked around by adding a limit to the shadow
multiplier of 1000 and ignoring the value from Xen in that case
to avoid a shadow multiplier causing a VM to request 6PB of memory for
example:
https://github.com/xapi-project/xen-api/pull/1215/commits/be55a8c30b41d1cd7596fc100ab1cfd3539f74eb

However that is just a workaround, and I've just reproduced this by
killing a VM mid migration, which resulted in a shadow multiplier of
629.42, rendering the VM unbootable even after a host reboot.

The real bug is in Xen: when a VM is dying it will return '0' for paging
op domctls and log a message at info level
'Ignoring paging op on dying domain', which leaves the 'mb' parameter
uninitialized upon return from the domctl.

The binding also doesn't initialize the 'c->mb' parameter (it is meant
to be used only when setting, not when querying the allocation),
which results in the VM getting a shadow allocation (and thus multiplier)
set based on what value happened to be currently on the stack.

Explicitly initialize the value passed to the domctl, and detect the 
uninitialized
case (shadow allocation of 0), and raise an exception in that case.
The exception will cause xenopsd to skip setting the shadow multiplier.

Note that the behaviour of Xen here is inconsistent between x86 and ARM:
ARM would return EINVAL when it gets a paging op on a dying domain,
and X86-64 would return 0 with possibly uninitialized data.

It might be desirable to change the x86 path in the hypervisor to return
EINVAL, although that would require more testing in case it breaks
somethig.
But the bindings should be defensive anyway against bugs like this.

Signed-off-by: Edwin Török <edvin.to...@citrix.com>
---
Reason for inclusion in 4.17:
- fixes a long-standing (>9y old) bug that is still happening today

Changes since v2:
- new in v3
---
 tools/ocaml/libs/xc/xenctrl_stubs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index e2d897581f..9681a74e40 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1019,7 +1019,7 @@ CAMLprim value stub_shadow_allocation_get(value xch, 
value domid)
 {
     CAMLparam2(xch, domid);
     CAMLlocal1(mb);
-    unsigned int c_mb;
+    unsigned int c_mb = 0;
     int ret;
 
     caml_enter_blocking_section();
@@ -1029,6 +1029,9 @@ CAMLprim value stub_shadow_allocation_get(value xch, 
value domid)
     caml_leave_blocking_section();
     if (ret != 0)
         failwith_xc(_H(xch));
+    if ( !c_mb )
+        caml_failwith("domctl returned uninitialized data for shadow "
+                      "allocation, dying domain?");
 
     mb = Val_int(c_mb);
     CAMLreturn(mb);
-- 
2.34.1


Reply via email to