Hello Adam Litke, Allon Mureinik,

I'd like you to do a code review.  Please visit

    https://gerrit.ovirt.org/46331

to review the following change.

Change subject: safelease: Unbreak safelease on systemd
......................................................................

safelease: Unbreak safelease on systemd

spmprotect.sh depends on Vdsm pid file for fencing Vdsm when the SPM
lease cannot be renewed. However, on systemd, we do not create Vdsm pid
file.  When spmprotect tries to fence Vdsm it fails and reboots the
host, killing all running vms.  This issue effects only v1 storage
domains, typically old systems using dc compatibility version 3.0.

Now we pass vdsm pid to spmprotect.sh helper script via command line
argument, restoring safelease operation.

Adding another argument to a script with 9 arguments is ugly, but I
don't want to make risky changes to this delicate and critical code.

Change-Id: I230b6909781269531eab3d71b516b28ab22de856
Bug-Url: https://bugzilla.redhat.com/1222564
Signed-off-by: Nir Soffer <[email protected]>
Reviewed-on: https://gerrit.ovirt.org/45963
Continuous-Integration: Jenkins CI
Reviewed-by: Allon Mureinik <[email protected]>
Reviewed-by: Adam Litke <[email protected]>
---
M vdsm/storage/clusterlock.py
M vdsm/storage/protect/spmprotect.sh.in
2 files changed, 11 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/31/46331/1

diff --git a/vdsm/storage/clusterlock.py b/vdsm/storage/clusterlock.py
index 0eba4e6..2a2effc 100644
--- a/vdsm/storage/clusterlock.py
+++ b/vdsm/storage/clusterlock.py
@@ -128,7 +128,8 @@
             acquireLockCommand = subprocess.list2cmdline([
                 lockUtil, "start", self._sdUUID, str(hostID),
                 str(self._lockRenewalIntervalSec), str(self._leasesPath),
-                str(leaseTimeMs), str(ioOpTimeoutMs), str(self._leaseFailRetry)
+                str(leaseTimeMs), str(ioOpTimeoutMs),
+                str(self._leaseFailRetry), str(os.getpid())
             ])
 
             cmd = [constants.EXT_SU, misc.IOUSER, '-s', constants.EXT_SH, '-c',
diff --git a/vdsm/storage/protect/spmprotect.sh.in 
b/vdsm/storage/protect/spmprotect.sh.in
index 101cac0..e7bce80 100755
--- a/vdsm/storage/protect/spmprotect.sh.in
+++ b/vdsm/storage/protect/spmprotect.sh.in
@@ -31,8 +31,6 @@
 CHECKVDSM=${CHECKVDSM:-"/usr/bin/pgrep vdsm"}
 REBOOTCMD=${REBOOTCMD:-"sudo /sbin/reboot -f"}
 RENEWDIR="/var/run/vdsm/spmprotect/$$"
-VDSM_PIDFILE="/var/run/vdsm/vdsmd.pid"
-VDSM_PID=`/bin/cat "$VDSM_PIDFILE"`
 
 function usage() {
     if [ -n "$1" ]; then
@@ -41,7 +39,7 @@
     trap EXIT
     echo "usage: $0 COMMAND PARAMETERS"
     echo "Commands:"
-    echo "  start { sdUUID hostId renewal_interval_sec lease_path[:offset] 
lease_time_ms io_op_timeout_ms fail_retries }"
+    echo "  start { sdUUID hostId renewal_interval_sec lease_path[:offset] 
lease_time_ms io_op_timeout_ms fail_retries vdsm_pid [debug] }"
     echo "Parameters:"
     echo "  sdUUID -                domain uuid"
     echo "  hostId -                host id in pool"
@@ -51,6 +49,8 @@
     echo "  lease_time_ms -         time limit within which lease must be 
renewed (at least 2*renewal_interval_sec)"
     echo "  io_op_timeout_ms -      I/O operation timeout"
     echo "  fail_retries -          Maximal number of attempts to retry to 
renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)"
+    echo "  vdsm_pid -              Vdsm process ID"
+    echo "  debug -                 enable debug mode (optional)"
     exit 1
 }
 
@@ -119,12 +119,14 @@
     LEASE_TIME_MS="$4"
     IO_OP_TIMEOUT_MS="$5"
     LAST_RENEWAL="$6"
+    VDSM_PID="$7"
 
     # Make sure params are integers
     [ "$RENEWAL_INTERVAL" -eq "$RENEWAL_INTERVAL" 2>/dev/null ] || usage 
"error - Renewal interval not an integer"
     [ "$LEASE_TIME_MS" -eq "$LEASE_TIME_MS" 2>/dev/null ] || usage "error - 
Lease time not an integer"
     [ "$LEASE_TIME_MS" -ge $((RENEWAL_INTERVAL*2)) ] || usage "error - Lease 
time too small"
     [ "$IO_OP_TIMEOUT_MS" -eq "$IO_OP_TIMEOUT_MS" 2>/dev/null ] || usage 
"error - IO op timeout not an integer"
+    [ "$VDSM_PID" -eq "$VDSM_PID" 2>/dev/null ] || usage "error - Vdsm PID is 
not a process ID"
 }
 
 function renew() {
@@ -195,12 +197,12 @@
 
 ####################################################### Main 
###################################################
 
-if [ "$#" -lt 8 ]; then
+if [ "$#" -lt 9 ]; then
     usage "error - wrong number of arguments"
 fi
 
-validate_args $3 $4 $5 $6 $7 $8
-DEBUG="$9"
+validate_args $3 $4 $5 $6 $7 $8 $9
+DEBUG="${10}"
 dbg=""
 if [ "$DEBUG" -eq "$DEBUG" 2>/dev/null ]; then
     dbg="-d"
@@ -221,7 +223,7 @@
     trap release USR1
 
     exec 0>&- && exec 1>&- && exec 2>&- # Close stdin, stdout and stderr
-    $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS 
$IO_OP_TIMEOUT_MS $LAST_RENEWAL $DEBUG >> $LOGFILE 2>&1 &
+    $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS 
$IO_OP_TIMEOUT_MS $LAST_RENEWAL $VDSM_PID $DEBUG >> $LOGFILE 2>&1 &
     trap EXIT
     exit 0
     ;;


-- 
To view, visit https://gerrit.ovirt.org/46331
To unsubscribe, visit https://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I230b6909781269531eab3d71b516b28ab22de856
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.6
Gerrit-Owner: Nir Soffer <[email protected]>
Gerrit-Reviewer: Adam Litke <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to