Hello again! It's been a while since I last showed up... I was finishing up details of the Ubuntu 20.04 HA packages (along with lots of other stuff), so sorry for not being active until now (about to change). During my regression lab preparation, which I mentioned at the latest HA conference, I'm facing a situation I'd like some input on, if anyone has ideas...
I'm clearing up the needed fence_mpath/fence_iscsi setup for all Ubuntu versions:

https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404

and I just faced this:

- 3 x node cluster setup
- 3 x nodes share 4 paths to /dev/mapper/volume{00..10}
- Using /dev/mapper/volume01 for fencing tests
- softdog configured for /dev/watchdog
- fence_mpath_check installed in /etc/watchdog.d/

----

(k)rafaeldtinoco@clusterg01:~$ crm configure show
node 1: clusterg01
node 2: clusterg02
node 3: clusterg03
primitive fence-mpath-clusterg01 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg01 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450000 devices="/dev/mapper/volume01" power_wait=65 \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg02 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg02 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450001 devices="/dev/mapper/volume01" power_wait=65 \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg03 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg03 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450002 devices="/dev/mapper/volume01" power_wait=65 \
        meta provides=unfencing target-role=Started
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.3-4b1f869f0f \
        cluster-infrastructure=corosync \
        cluster-name=clusterg \
        stonith-enabled=true \
        no-quorum-policy=stop \
        last-lrm-refresh=1590773755

----

(k)rafaeldtinoco@clusterg03:~$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: clusterg02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Mon Jun  1 12:55:13 2020
  * Last change:  Mon Jun  1 04:35:07 2020 by root via cibadmin on clusterg03
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ clusterg01 clusterg02 clusterg03 ]

Full List of Resources:
  * fence-mpath-clusterg01      (stonith:fence_mpath):   Started clusterg02
  * fence-mpath-clusterg02      (stonith:fence_mpath):   Started clusterg03
  * fence-mpath-clusterg03      (stonith:fence_mpath):   Started clusterg01

----

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
  PR generation=0x2d, Reservation follows:
   Key = 0x59450001
  scope = LU_SCOPE, type = Write Exclusive, registrants only

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
  PR generation=0x2d, 12 registered reservation keys follow:
    0x59450001
    0x59450001
    0x59450001
    0x59450001
    0x59450002
    0x59450002
    0x59450002
    0x59450002
    0x59450000
    0x59450000
    0x59450000
    0x59450000

----

You can see that everything looks fine.
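(A side note on the arithmetic above, since it matters later: READ KEYS returns one entry per registration, i.e. per I_T nexus, so the 12 keys are simply 3 nodes x 4 paths. The listing alone does not say *which* path a registration belongs to, though; READ FULL STATUS does, since it reports the relative target port for each registration. A rough sketch of how I double-check a node, assuming sg3-utils is installed -- the grep pipeline and the /dev/sda stand-in are my own convenience, not anything the agent itself does:

    # each node's key should appear once per path, i.e. 4 times here
    sudo mpathpersist --in -k /dev/mapper/volume01 | grep -c 0x59450000

    # READ FULL STATUS ties each registration to its I_T nexus
    # (relative target port + transport id); any one path device
    # of the map can answer for all registrations
    sudo sg_persist --in --read-full-status /dev/sda

If the count comes back as 1 instead of 4, that is exactly the broken state described below.)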
If I disable the 2 interconnects I have for corosync (both rings are visible below):

(k)rafaeldtinoco@clusterg01:~$ sudo corosync-quorumtool -a
Quorum information
------------------
Date:             Mon Jun  1 12:56:00 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1.120
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 clusterg01, clusterg01bkp (local)
         2          1 clusterg02, clusterg02bkp
         3          1 clusterg03, clusterg03bkp

for node clusterg01, then I have it fenced correctly:

Pending Fencing Actions:
  * reboot of clusterg01 pending: client=pacemaker-controld.906, origin=clusterg02

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
  PR generation=0x2e, Reservation follows:
   Key = 0x59450001
  scope = LU_SCOPE, type = Write Exclusive, registrants only

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
  PR generation=0x2e, 8 registered reservation keys follow:
    0x59450001
    0x59450001
    0x59450001
    0x59450001
    0x59450002
    0x59450002
    0x59450002
    0x59450002

and the watchdog reboots it... but it turns out that the node comes back with its key registered on just 1 path (instead of 4). I was wondering if that was because of the async nature of the combination: systemd + open-iscsi + multipath-tools + pacemaker service startup. Check:

(k)rafaeldtinoco@clusterg01:~$ uptime
 12:58:22 up 0 min,  0 users,  load average: 0.31, 0.09, 0.03

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
  PR generation=0x2f, Reservation follows:
   Key = 0x59450001
  scope = LU_SCOPE, type = Write Exclusive, registrants only

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
  PR generation=0x2f, 9 registered reservation keys follow:
    0x59450001
    0x59450001
    0x59450001
    0x59450001
    0x59450002
    0x59450002
    0x59450002
    0x59450002
    0x59450000

After this ^ I have to run:

(k)rafaeldtinoco@clusterg01:~$ sudo mpathpersist --out --register --param-rk=0x59450000 /dev/mapper/volume01
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed

(k)rafaeldtinoco@clusterg01:~$ sudo fence_mpath -v -d /dev/mapper/volume01 -n 59450000 -o on
2020-06-01 12:59:46,388 INFO: Executing: /usr/sbin/mpathpersist -i -k -d /dev/mapper/volume01

to guarantee all registrations are correctly placed again after the fence is done:

(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
  PR generation=0x33, 12 registered reservation keys follow:
    0x59450001
    0x59450001
    0x59450001
    0x59450001
    0x59450002
    0x59450002
    0x59450002
    0x59450002
    0x59450000
    0x59450000
    0x59450000
    0x59450000

I was wondering if having "resource-agents-deps.target" as RequiredBy= in the [Install] section of open-iscsi.service and multipath-tools.service, together with Before=resource-agents-deps.target in their [Unit] sections, would be enough, but in this case it appears it is not.

Any idea why this happens? Did the agent run its unfencing "on" action while only a single path to the disk was available, i.e. while the iSCSI sessions were still being established and multipath-tools had scanned a single path only? I tend to think that, if that were the case, I would sometimes come back with 1 path registered, sometimes 2, etc., and not always exactly 1 path (always missing 3 registrations). OR there is something about PERSISTENT RESERVE from SPC-3/4 that I'm missing.

Any thoughts?
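One thing worth spelling out about the failed manual register above, at least per my reading of SPC: REGISTER with --param-rk passes the key as one that must already be registered on that I_T nexus, so it gets a Reservation Conflict on every path whose registration was lost. The agent's "on" action avoids this (as far as I can tell from the fence_mpath code and the mpathpersist man page) by using REGISTER AND IGNORE EXISTING KEY instead, roughly equivalent to:

    # hedged manual equivalent of the unfencing "on" action: register
    # the node's key on all paths, ignoring whatever key is (or isn't)
    # registered there at the moment
    sudo mpathpersist --out --register-ignore --param-sark=0x59450000 \
         /dev/mapper/volume01

which would explain why fence_mpath -o on succeeds where the plain --register call conflicts.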
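For completeness, this is the ordering I mean -- a minimal sketch of the drop-in as I have it, with an illustrative file name (the same drop-in also goes under multipath-tools.service.d/):

    # /etc/systemd/system/open-iscsi.service.d/ha-ordering.conf
    [Unit]
    # start this service before the target pacemaker's agents pull in
    Before=resource-agents-deps.target

    [Install]
    # enabling the unit makes resource-agents-deps.target require it
    RequiredBy=resource-agents-deps.target

followed by a "systemctl daemon-reload" and re-enabling the units, since [Install] entries only take effect on enable. Even then, Before= only orders against the service reporting as started: open-iscsi.service finishing says nothing about all 4 sessions being logged in, and multipath-tools finishing says nothing about all 4 paths having joined the map, so the ordering alone can't rule out the startup race I'm suspecting above.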