Dan Kenigsberg has submitted this change and it was merged.

Change subject: vm: add known case for graceful destroy to fail
......................................................................

vm: add known case for graceful destroy to fail

In commit 09c2f40ebc318bbd188696fd7d02d2dc73d03256 we changed the
releaseVm flow to be stricter and to no longer swallow libvirt
exceptions when domain destruction fails. In these cases the destroy()
API call will now fail.

Unfortunately, one particular case escaped verification, and broke.
When a migration successfully ends, libvirt emits a 'Stopped' event
with detail VIR_DOMAIN_EVENT_STOPPED_MIGRATED. In response to this,
VDSM correctly marks the VM as Down, but does not trigger the internal
onQemuDeath callback, which is what invalidates the domain handle
(Vm._dom field).

The sequence of events is further demonstrated by this excerpt
(logs trimmed for brevity):

  vm.Vm::(_onLibvirtLifecycleEvent) vmId=`56d1c657-dd76-4609-a207-c050699be5be`::event Stopped detail 3 opaque None
  vm.Vm::(cancel) vmId=`56d1c657-dd76-4609-a207-c050699be5be`::canceling migration downtime thread
  vm.Vm::(stop) vmId=`56d1c657-dd76-4609-a207-c050699be5be`::stopping migration monitor thread
  vm.Vm::(run) vmId=`56d1c657-dd76-4609-a207-c050699be5be`::migration downtime thread exiting
  root::(wrapper) Unknown libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no domain with matching uuid '56d1c657-dd76-4609-a207-c050699be5be'
  root::(wrapper) Unknown libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no domain with matching uuid '56d1c657-dd76-4609-a207-c050699be5be'
  vm.Vm::(setDownStatus) vmId=`56d1c657-dd76-4609-a207-c050699be5be`::Changed state to Down: Migration succeeded (code=4)

Later, when Engine sends a Destroy command as part of the normal flow,
releaseVm finds a supposedly valid _dom handle (it is not None) and
triggers the domain destruction. Because the handle is actually stale,
the call fails with a missing-domain error, and in turn the whole
destroy() API call fails.

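The failure mode described above, and one way to tolerate it, can be
sketched as follows. This is a minimal illustration and not the code
from vdsm/virt/vm.py: FakeLibvirtError, StaleDomain, BrokenDomain and
destroy_gracefully are hypothetical names, and the error class is a
stand-in for libvirt.libvirtError so that the sketch runs without the
libvirt bindings. The only values taken from the message are error code
42 (shown as "ecode: 42" in the log excerpt, libvirt's
VIR_ERR_NO_DOMAIN) and the idea that a missing domain is a benign
outcome of a destroy request.

```python
# Illustrative sketch only; stand-ins replace the real libvirt bindings.
VIR_ERR_NO_DOMAIN = 42  # same value as "ecode: 42" in the log excerpt


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code

    def get_error_code(self):
        return self._code


class StaleDomain:
    """Hypothetical handle to a domain libvirt no longer knows about."""

    def destroy(self):
        raise FakeLibvirtError(VIR_ERR_NO_DOMAIN, "Domain not found")


class BrokenDomain:
    """Hypothetical handle whose destruction fails for another reason."""

    def destroy(self):
        raise FakeLibvirtError(1, "internal error")


def destroy_gracefully(dom):
    """Return True if the domain is gone, False if destruction failed."""
    if dom is None:
        # Handle already invalidated: nothing left to destroy.
        return True
    try:
        dom.destroy()
    except FakeLibvirtError as e:
        if e.get_error_code() == VIR_ERR_NO_DOMAIN:
            # Known benign case: the handle was stale (for example after
            # a successful outgoing migration), so the domain is gone.
            return True
        # Any other libvirt error is still reported as a failure.
        return False
    return True
```

With these stand-ins, a None handle and a stale handle both count as
success, while any other libvirt error still makes the call fail, which
preserves the stricter behaviour introduced by the earlier commit.
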
It is worth noting that 09c2f40ebc318bbd188696fd7d02d2dc73d03256 only
made evident that in this case VDSM was out of sync with libvirt.

This patch addresses this problem by adding a known benign case in
which graceful destruction can fail.

Change-Id: I42bf1fd78e439988e0cc60258a51fa2bc447e0f1
Signed-off-by: Francesco Romani <[email protected]>
Reviewed-on: http://gerrit.ovirt.org/28952
Reviewed-by: Dan Kenigsberg <[email protected]>
---
M vdsm/virt/vm.py
1 file changed, 11 insertions(+), 5 deletions(-)

Approvals:
  Dan Kenigsberg: Looks good to me, approved
  Francesco Romani: Verified

--
To view, visit http://gerrit.ovirt.org/28952
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I42bf1fd78e439988e0cc60258a51fa2bc447e0f1
Gerrit-PatchSet: 2
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
Gerrit-Reviewer: Francesco Romani <[email protected]>
Gerrit-Reviewer: Michal Skrivanek <[email protected]>
Gerrit-Reviewer: Vinzenz Feenstra <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Reviewer: oVirt Jenkins CI Server
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
