I've recently discovered some issues with the VirtualDomain RA if libvirtd isn't running: the script makes several calls to virsh, all of which will fail if the daemon isn't listening. This can be a common occurrence during probes, depending on how and when libvirtd is started in Pacemaker.
I've tried to code around this with the following patches, though I'd appreciate some comments on whether or not this is 'the right way to do it'. There are three major changes:

The first is to avoid running virsh to determine a default for the 'hypervisor' param if it's already set, though this just moves the problem. (This might also not be the 'best' way of implementing this if test, but I didn't want to start fiddling with the code too much.)

The second is to expect failures from virsh for probes in the Status function and handle them cleanly.

The third is to just assume here that all probes will fail, rather than test for certain cases in which probes are known to possibly fail. My logic is that if it's safe to fail in one situation, it's probably safe to just always fail.

Comments, changes etc. appreciated.

Matthew

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
--- VirtualDomain.orig	2011-11-18 13:07:58.000000000 +0000
+++ VirtualDomain	2011-11-22 17:00:34.000000000 +0000
@@ -18,7 +18,10 @@
 
 # Defaults
 OCF_RESKEY_force_stop_default=0
-OCF_RESKEY_hypervisor_default="$(virsh --quiet uri)"
+#Running virsh can fail during probes, so try to avoid if possible
+if [ -z "$OCF_RESKEY_hypervisor" ]; then
+  OCF_RESKEY_hypervisor_default=$(virsh --quiet uri)
+fi
 
 : ${OCF_RESKEY_force_stop=${OCF_RESKEY_force_stop_default}}
 : ${OCF_RESKEY_hypervisor=${OCF_RESKEY_hypervisor_default}}
@@ -212,7 +215,9 @@
                 fi
                 ;;
             *)
-                # any other output is unexpected.
+                # a probe may fail if libvirtd isn't running
+                ocf_is_probe && return $OCF_NOT_RUNNING;
+                # any other output is unexpected.
                 ocf_log error "Virtual domain $DOMAIN_NAME has unknown status \"$status\"!"
                 ;;
         esac
@@ -386,7 +391,7 @@
     # than $OCF_SUCCESS, something is definitely wrong.
     VirtualDomain_Status
     rc=$?
-    if [ ${rc} -eq ${OCF_SUCCESS} ]; then
+    if [ ${rc} -eq ${OCF_SUCCESS} ] || [ ${rc} -eq ${OCF_NOT_RUNNING} ]; then
         # OK, the generic status check turned out fine. Now, if we
         # have monitor scripts defined, run them one after another.
         for script in ${OCF_RESKEY_monitor_scripts}; do
@@ -447,14 +452,12 @@
 # Everything except usage and meta-data must pass the validate test
 VirtualDomain_Validate_All || exit $?
 
-# During a probe, it is permissible for the config file to not be
-# readable (it might be on shared storage not available during the
-# probe). In that case, VirtualDomain_Define can't work and we're
-# unable to get the domain name. Thus, we also can't check whether the
-# domain is running. The only thing we can do here is to assume that
-# it is not running.
+# There are many reasons why the following code might fail during a probe
+# (config on shared storage, libvirtd not running, etc).
+# Therefore the only safe thing to do is assume the domain isn't running.
+ocf_is_probe && exit $OCF_NOT_RUNNING
+
 if [ ! -r $OCF_RESKEY_config ]; then
-    ocf_is_probe && exit $OCF_NOT_RUNNING
     [ "$__OCF_ACTION" = "stop" ] && exit $OCF_SUCCESS
 fi
 
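
For anyone reading without the full agent to hand, the idea behind the first two changes can be sketched in isolation roughly as follows. This is only an illustration, not code from VirtualDomain: the helper names hypervisor_uri and status_to_rc and the VIRSH override variable are hypothetical, and the OCF return codes are hard-coded here only so that the snippet runs outside ocf-shellfuncs. The real Status function handles more domain states than the two shown.

```shell
#!/bin/sh
# Sketch only. In a real agent these values come from ocf-shellfuncs;
# they are defined here so the snippet runs on its own.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper (not part of VirtualDomain): print the hypervisor
# URI, calling virsh only when OCF_RESKEY_hypervisor is unset or empty.
# VIRSH is an assumed override so this can be exercised without libvirt.
hypervisor_uri() {
    if [ -n "${OCF_RESKEY_hypervisor:-}" ]; then
        printf '%s\n' "$OCF_RESKEY_hypervisor"
        return 0
    fi
    # This call fails when libvirtd is not listening; the caller must
    # be prepared for a non-zero exit status and empty output.
    "${VIRSH:-virsh}" --quiet uri 2>/dev/null
}

# Hypothetical helper: map a `virsh domstate` string to an OCF code.
#   $1 = status string (empty if virsh itself failed)
#   $2 = "probe" when the current operation is a probe
status_to_rc() {
    case "$1" in
        running)    return "$OCF_SUCCESS" ;;
        "shut off") return "$OCF_NOT_RUNNING" ;;
        *)
            # During a probe, unknown or empty output is treated as
            # "not running" instead of an error.
            if [ "${2:-}" = "probe" ]; then
                return "$OCF_NOT_RUNNING"
            fi
            return "$OCF_ERR_GENERIC"
            ;;
    esac
}
```

With OCF_RESKEY_hypervisor set, hypervisor_uri never touches virsh, which is the point of the first hunk. With it unset, the failure is still there and simply surfaces later, which is what I meant by "this just moves the problem".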
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/