Hello Adam Litke, Allon Mureinik,
I'd like you to do a code review. Please visit
https://gerrit.ovirt.org/46331
to review the following change.
Change subject: safelease: Unbreak safelease on systemd
......................................................................
safelease: Unbreak safelease on systemd
spmprotect.sh depends on Vdsm pid file for fencing Vdsm when the SPM
lease cannot be renewed. However, on systemd, we do not create Vdsm pid
file. When spmprotect tries to fence Vdsm it fails and reboots the
host, killing all running vms. This issue effects only v1 storage
domains, typically old systems using dc compatibility version 3.0.
Now we pass vdsm pid to spmprotect.sh helper script via command line
argument, restoring safelease operation.
Adding another argument to a script with 9 arguments is ugly, but I
don't want to make risky changes to this delicate and critical code.
Change-Id: I230b6909781269531eab3d71b516b28ab22de856
Bug-Url: https://bugzilla.redhat.com/1222564
Signed-off-by: Nir Soffer <[email protected]>
Reviewed-on: https://gerrit.ovirt.org/45963
Continuous-Integration: Jenkins CI
Reviewed-by: Allon Mureinik <[email protected]>
Reviewed-by: Adam Litke <[email protected]>
---
M vdsm/storage/clusterlock.py
M vdsm/storage/protect/spmprotect.sh.in
2 files changed, 11 insertions(+), 8 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/31/46331/1
diff --git a/vdsm/storage/clusterlock.py b/vdsm/storage/clusterlock.py
index 0eba4e6..2a2effc 100644
--- a/vdsm/storage/clusterlock.py
+++ b/vdsm/storage/clusterlock.py
@@ -128,7 +128,8 @@
acquireLockCommand = subprocess.list2cmdline([
lockUtil, "start", self._sdUUID, str(hostID),
str(self._lockRenewalIntervalSec), str(self._leasesPath),
- str(leaseTimeMs), str(ioOpTimeoutMs), str(self._leaseFailRetry)
+ str(leaseTimeMs), str(ioOpTimeoutMs),
+ str(self._leaseFailRetry), str(os.getpid())
])
cmd = [constants.EXT_SU, misc.IOUSER, '-s', constants.EXT_SH, '-c',
diff --git a/vdsm/storage/protect/spmprotect.sh.in
b/vdsm/storage/protect/spmprotect.sh.in
index 101cac0..e7bce80 100755
--- a/vdsm/storage/protect/spmprotect.sh.in
+++ b/vdsm/storage/protect/spmprotect.sh.in
@@ -31,8 +31,6 @@
CHECKVDSM=${CHECKVDSM:-"/usr/bin/pgrep vdsm"}
REBOOTCMD=${REBOOTCMD:-"sudo /sbin/reboot -f"}
RENEWDIR="/var/run/vdsm/spmprotect/$$"
-VDSM_PIDFILE="/var/run/vdsm/vdsmd.pid"
-VDSM_PID=`/bin/cat "$VDSM_PIDFILE"`
function usage() {
if [ -n "$1" ]; then
@@ -41,7 +39,7 @@
trap EXIT
echo "usage: $0 COMMAND PARAMETERS"
echo "Commands:"
- echo " start { sdUUID hostId renewal_interval_sec lease_path[:offset]
lease_time_ms io_op_timeout_ms fail_retries }"
+ echo " start { sdUUID hostId renewal_interval_sec lease_path[:offset]
lease_time_ms io_op_timeout_ms fail_retries vdsm_pid [debug] }"
echo "Parameters:"
echo " sdUUID - domain uuid"
echo " hostId - host id in pool"
@@ -51,6 +49,8 @@
echo " lease_time_ms - time limit within which lease must be
renewed (at least 2*renewal_interval_sec)"
echo " io_op_timeout_ms - I/O operation timeout"
echo " fail_retries - Maximal number of attempts to retry to
renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)"
+ echo " vdsm_pid - Vdsm process ID"
+ echo " debug - enable debug mode (optional)"
exit 1
}
@@ -119,12 +119,14 @@
LEASE_TIME_MS="$4"
IO_OP_TIMEOUT_MS="$5"
LAST_RENEWAL="$6"
+ VDSM_PID="$7"
# Make sure params are integers
[ "$RENEWAL_INTERVAL" -eq "$RENEWAL_INTERVAL" 2>/dev/null ] || usage
"error - Renewal interval not an integer"
[ "$LEASE_TIME_MS" -eq "$LEASE_TIME_MS" 2>/dev/null ] || usage "error -
Lease time not an integer"
[ "$LEASE_TIME_MS" -ge $((RENEWAL_INTERVAL*2)) ] || usage "error - Lease
time too small"
[ "$IO_OP_TIMEOUT_MS" -eq "$IO_OP_TIMEOUT_MS" 2>/dev/null ] || usage
"error - IO op timeout not an integer"
+ [ "$VDSM_PID" -eq "$VDSM_PID" 2>/dev/null ] || usage "error - Vdsm PID is
not a process ID"
}
function renew() {
@@ -195,12 +197,12 @@
####################################################### Main
###################################################
-if [ "$#" -lt 8 ]; then
+if [ "$#" -lt 9 ]; then
usage "error - wrong number of arguments"
fi
-validate_args $3 $4 $5 $6 $7 $8
-DEBUG="$9"
+validate_args $3 $4 $5 $6 $7 $8 $9
+DEBUG="${10}"
dbg=""
if [ "$DEBUG" -eq "$DEBUG" 2>/dev/null ]; then
dbg="-d"
@@ -221,7 +223,7 @@
trap release USR1
exec 0>&- && exec 1>&- && exec 2>&- # Close stdin, stdout and stderr
- $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS
$IO_OP_TIMEOUT_MS $LAST_RENEWAL $DEBUG >> $LOGFILE 2>&1 &
+ $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS
$IO_OP_TIMEOUT_MS $LAST_RENEWAL $VDSM_PID $DEBUG >> $LOGFILE 2>&1 &
trap EXIT
exit 0
;;
--
To view, visit https://gerrit.ovirt.org/46331
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I230b6909781269531eab3d71b516b28ab22de856
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.6
Gerrit-Owner: Nir Soffer <[email protected]>
Gerrit-Reviewer: Adam Litke <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches