I've recently discovered some issues with the VirtualDomain RA when
libvirtd isn't running: the script makes several calls to virsh, all of
which fail if the daemon isn't listening.  This can be a common
occurrence during probes, depending on how and when libvirtd is started
under pacemaker.

I've tried to code round this with the following patches, though I'd
appreciate some comments on whether or not this is 'the right way to do it.'

There are 3 major changes:

The first one is to avoid running virsh to determine a default for the
'hypervisor' param if it's already set - though this just moves the
problem. (This might also not be the 'best' way of implementing this
if test, but I didn't want to start fiddling with the code too much).
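
For illustration, here is a minimal, self-contained sketch of that
guard. The virsh function below is a made-up stub so the snippet runs
without libvirt (it only logs each call and prints a fixed URI), and
resolve_hypervisor is a hypothetical wrapper that does not exist in the
RA; only the if test and the ': ${...}' line mirror the script.

```shell
#!/bin/sh
# Hypothetical stub standing in for the real virsh binary.
# It records every invocation so we can see whether it was called.
VIRSH_CALLS=$(mktemp)
virsh() {
    echo "virsh $*" >> "$VIRSH_CALLS"
    echo "qemu:///system"
}

# Hypothetical wrapper around the guard, for demonstration only.
resolve_hypervisor() {
    # Only ask virsh for a default when the parameter is not set.
    if [ -z "$OCF_RESKEY_hypervisor" ]; then
        OCF_RESKEY_hypervisor_default=$(virsh --quiet uri)
    fi
    : ${OCF_RESKEY_hypervisor=${OCF_RESKEY_hypervisor_default}}
}

# Case 1: the parameter is configured, so virsh must not be called.
OCF_RESKEY_hypervisor="xen:///"
resolve_hypervisor
hv_when_set=$OCF_RESKEY_hypervisor
calls_when_set=$(wc -l < "$VIRSH_CALLS" | tr -d ' ')

# Case 2: the parameter is unset, so the default comes from virsh.
: > "$VIRSH_CALLS"
unset OCF_RESKEY_hypervisor OCF_RESKEY_hypervisor_default
resolve_hypervisor
hv_when_unset=$OCF_RESKEY_hypervisor
calls_when_unset=$(wc -l < "$VIRSH_CALLS" | tr -d ' ')

echo "set:   hypervisor=$hv_when_set virsh_calls=$calls_when_set"
echo "unset: hypervisor=$hv_when_unset virsh_calls=$calls_when_unset"
rm -f "$VIRSH_CALLS"
```

With the parameter set the stub is never invoked; with it unset the stub
is invoked once, which is where the problem moves to if libvirtd is down.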

The second is to expect failures from virsh for probes in the Status
function and handle them cleanly.
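
As a rough sketch of the intended behaviour (not the RA's actual Status
function): unrecognised virsh output is treated as "not running" during
a probe and as an error otherwise. classify_status is a hypothetical
helper, the state list is deliberately cut down to two entries, and
ocf_is_probe here is a simplified stand-in for the one in
ocf-shellfuncs.

```shell
#!/bin/sh
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Simplified stand-in for ocf_is_probe: a probe is a monitor
# action with an interval of 0.
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

# Hypothetical helper: map the text printed by virsh to a return code.
# Only two known states are listed to keep the sketch short.
classify_status() {
    case "$1" in
        "running")
            return $OCF_SUCCESS
            ;;
        "shut off")
            return $OCF_NOT_RUNNING
            ;;
        *)
            # A probe may fail if libvirtd isn't running.
            ocf_is_probe && return $OCF_NOT_RUNNING
            # Any other output is unexpected.
            return $OCF_ERR_GENERIC
            ;;
    esac
}

# Assumed example of what virsh might print when the daemon is down.
output="error: failed to connect to the hypervisor"

# During a probe (monitor with interval 0).
__OCF_ACTION=monitor
OCF_RESKEY_CRM_meta_interval=0
classify_status "$output"
rc_probe=$?

# During a recurring monitor (non-zero interval).
OCF_RESKEY_CRM_meta_interval=10000
classify_status "$output"
rc_monitor=$?

echo "probe rc=$rc_probe, recurring monitor rc=$rc_monitor"
```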

The third is to just assume here that all probes will fail (i.e. report
the domain as not running), rather than test for certain cases in which
probes are known to possibly fail - my logic is that if it's safe to
fail for one situation, it's probably safe to just always fail.

Comments, changes etc appreciated.

Matthew


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
--- VirtualDomain.orig	2011-11-18 13:07:58.000000000 +0000
+++ VirtualDomain	2011-11-22 17:00:34.000000000 +0000
@@ -18,7 +18,10 @@
 
 # Defaults
 OCF_RESKEY_force_stop_default=0
-OCF_RESKEY_hypervisor_default="$(virsh --quiet uri)"
+#Running virsh can fail during probes, so try to avoid if possible
+if [ -z "$OCF_RESKEY_hypervisor" ]; then
+    OCF_RESKEY_hypervisor_default=$(virsh --quiet uri)
+fi
 
 : ${OCF_RESKEY_force_stop=${OCF_RESKEY_force_stop_default}}
 : ${OCF_RESKEY_hypervisor=${OCF_RESKEY_hypervisor_default}}
@@ -212,7 +215,9 @@
 		fi
 		;;
             *)
-		# any other output is unexpected.
+		# a probe may fail if libvirtd isn't running
+		ocf_is_probe && return $OCF_NOT_RUNNING;
+                # any other output is unexpected.
                 ocf_log error "Virtual domain $DOMAIN_NAME has unknown status \"$status\"!"
                 ;;
         esac
@@ -386,7 +391,7 @@
     # than $OCF_SUCCESS, something is definitely wrong.
     VirtualDomain_Status
     rc=$?
-    if [ ${rc} -eq ${OCF_SUCCESS} ]; then
+    if [ ${rc} -eq ${OCF_SUCCESS} ] || [ ${rc} -eq ${OCF_NOT_RUNNING} ]; then
 	# OK, the generic status check turned out fine.  Now, if we
 	# have monitor scripts defined, run them one after another.
 	for script in ${OCF_RESKEY_monitor_scripts}; do
@@ -447,14 +452,12 @@
 # Everything except usage and meta-data must pass the validate test
 VirtualDomain_Validate_All || exit $?
 
-# During a probe, it is permissible for the config file to not be
-# readable (it might be on shared storage not available during the
-# probe). In that case, VirtualDomain_Define can't work and we're
-# unable to get the domain name. Thus, we also can't check whether the
-# domain is running. The only thing we can do here is to assume that
-# it is not running.
+# There are many reasons why the following code might fail during a probe
+# (config on shared storage, libvirtd not running, etc).
+# Therefore the only safe thing to do is assume the domain isn't running.
+ocf_is_probe && exit $OCF_NOT_RUNNING
+
 if [ ! -r $OCF_RESKEY_config ]; then
-    ocf_is_probe && exit $OCF_NOT_RUNNING
     [ "$__OCF_ACTION" = "stop" ] && exit $OCF_SUCCESS
 fi
 
