Hello, I've configured a DRBD/Pacemaker cluster with 2 nodes and I'm running some failover tests. My cluster is quite simple: I have one DRBD resource and its master/slave clone configured in Pacemaker:

    [root@pcmk2 ~]# pcs resource show DrbdRes
     Resource: DrbdRes (class=ocf provider=linbit type=drbd)
      Attributes: drbd_resource=myres
      Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
                  monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
                  monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
                  promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
                  start interval=0s timeout=240 (DrbdRes-start-interval-0s)
                  stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)

    [root@pcmk2 ~]# pcs resource show DrbdResClone
     Master: DrbdResClone
      Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
      Resource: DrbdRes (class=ocf provider=linbit type=drbd)
       Attributes: drbd_resource=myres
       Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
                   monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
                   monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
                   promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
                   start interval=0s timeout=240 (DrbdRes-start-interval-0s)
                   stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)
    [root@pcmk2 ~]#
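For reference, I created these with roughly the following commands (pcs 0.9 syntax; the resource and clone names are just the ones I chose, and I've left out the non-default timeouts here):

    pcs resource create DrbdRes ocf:linbit:drbd drbd_resource=myres \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
    pcs resource master DrbdResClone DrbdRes \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true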
Furthermore, in /etc/drbd.d/myres.res I have:

    disk {
        fencing resource-only;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

So, I'm testing various cases for stonith / failover and high availability in general:

1) pcs cluster standby / unstandby, first on the secondary node, then on the primary node
2) stonith_admin --reboot=pcmk[12]
3) shutting down one VM at a time, causing a failover of all resources and a resync after the node comes back up
4) nmcli connection down corosync-network
5) nmcli connection down replication-network

All tests passed except the last one. Note that I have 2 separate networks on each node: one for corosync and another for DRBD replication. When I simulate an outage on the replication network, I see the resources in this state:

On the secondary:

    0:myres/0  WFConnection  Secondary/Unknown  UpToDate/DUnknown

On the primary:

    0:myres/0  StandAlone  Primary/Unknown  UpToDate/Outdated

Is this normal? It seems that I have to intervene manually to reconnect the two nodes:

    drbdadm connect --discard-my-data myres    # on the secondary
    drbdadm connect myres                      # on the primary

Is there an automated way to do this when the replication network comes back up?

Thank you
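P.S. The only automated mechanism I've found so far is DRBD's split-brain recovery policies in the net section of the resource, something like this (drbd 8.4 syntax; I haven't tried it yet, and the policy values below are only examples):

    net {
        # automatic recovery when split brain is detected ...
        after-sb-0pri discard-zero-changes;   # ... with no primaries
        after-sb-1pri discard-secondary;      # ... with one primary
        after-sb-2pri disconnect;             # ... with two primaries
    }

If I understand correctly, after-sb-1pri discard-secondary would make the secondary reconnect and discard its own changes automatically, which is what drbdadm connect --discard-my-data does by hand. Is this the recommended approach here, or is there a cleaner way?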