pcs resource create drbd_iscsivg0 ocf:linbit:drbd drbd_resource=iscsivg0 op monitor interval="29s" role="Master" op monitor interval="31s" role="Slave"
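Adapted to the resource names from your own commands quoted below, the full pair would look something like this (an untested sketch, not a transcript from my cluster):

pcs resource create drbd_vmfs ocf:linbit:drbd drbd_resource=vmfs \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
pcs resource master ms_drbd_vmfs drbd_vmfs master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

Since the resource already exists, retrofitting the role-specific monitors with "pcs resource op" should also work; again a sketch, so double-check the syntax against the help output of your pcs version:

pcs resource op remove drbd_vmfs monitor interval=30s
pcs resource op add drbd_vmfs monitor interval=29s role=Master
pcs resource op add drbd_vmfs monitor interval=31s role=Slave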
Luke Pascoe

*E* l...@osnz.co.nz
*P* +64 (9) 296 2961
*M* +64 (27) 426 6649
*W* www.osnz.co.nz

24 Wellington St
Papakura
Auckland, 2110
New Zealand

On 18 September 2015 at 12:02, Jason Gress <jgr...@accertify.com> wrote:

> That may very well be it. Would you be so kind as to show me the pcs
> command to create that config? I generated my configuration with these
> commands, and I'm not sure how to get the additional monitor options in
> there:
>
> pcs resource create drbd_vmfs ocf:linbit:drbd drbd_resource=vmfs op
> monitor interval=30s
> pcs resource master ms_drbd_vmfs drbd_vmfs master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1 notify=true
>
> Thank you very much for your help, and sorry for the newbie question!
>
> Jason
>
> From: Luke Pascoe <l...@osnz.co.nz>
> Reply-To: Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>
> Date: Thursday, September 17, 2015 at 6:54 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
> node to Slave (always Stopped)
>
> The only difference in the DRBD resource between yours and mine that I can
> see is the monitoring parameters (mine works nicely, but is Centos 6).
> Here's mine:
>
> Master: ms_drbd_iscsicg0
>  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
>  Resource: drbd_iscsivg0 (class=ocf provider=linbit type=drbd)
>   Attributes: drbd_resource=iscsivg0
>   Operations: start interval=0s timeout=240
> (drbd_iscsivg0-start-timeout-240)
>               promote interval=0s timeout=90
> (drbd_iscsivg0-promote-timeout-90)
>               demote interval=0s timeout=90
> (drbd_iscsivg0-demote-timeout-90)
>               stop interval=0s timeout=100
> (drbd_iscsivg0-stop-timeout-100)
>               monitor interval=29s role=Master
> (drbd_iscsivg0-monitor-interval-29s-role-Master)
>               monitor interval=31s role=Slave
> (drbd_iscsivg0-monitor-interval-31s-role-Slave)
>
> What mechanism are you using to fail over? Check your constraints after
> you do it and make sure it hasn't added one which stops the slave clone
> from starting on the "failed" node.
>
> Luke Pascoe
>
> *E* l...@osnz.co.nz
> *P* +64 (9) 296 2961
> *M* +64 (27) 426 6649
> *W* www.osnz.co.nz
>
> 24 Wellington St
> Papakura
> Auckland, 2110
> New Zealand
>
> On 18 September 2015 at 11:40, Jason Gress <jgr...@accertify.com> wrote:
>
>> Looking more closely, according to page 64 (
>> http://clusterlabs.org/doc/Cluster_from_Scratch.pdf) it does indeed
>> appear that 1 is the correct number. (I just realized that it's page 64
>> of the "book", but page 76 of the pdf.)
>>
>> Thank you again,
>>
>> Jason
>>
>> From: Jason Gress <jgr...@accertify.com>
>> Reply-To: Cluster Labs - All topics related to open-source clustering
>> welcomed <users@clusterlabs.org>
>> Date: Thursday, September 17, 2015 at 6:36 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> <users@clusterlabs.org>
>> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
>> node to Slave (always Stopped)
>>
>> I can't say whether or not you are right or wrong (you may be right!) but
>> I followed the Cluster From Scratch tutorial closely, and it only had a
>> clone-node-max=1 there. (Page 106 of the pdf, for the curious.)
>>
>> Thanks,
>>
>> Jason
>>
>> From: Luke Pascoe <l...@osnz.co.nz>
>> Reply-To: Cluster Labs - All topics related to open-source clustering
>> welcomed <users@clusterlabs.org>
>> Date: Thursday, September 17, 2015 at 6:29 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> <users@clusterlabs.org>
>> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
>> node to Slave (always Stopped)
>>
>> I may be wrong, but shouldn't "clone-node-max" be 2 on the ms_drbd_vmfs
>> resource?
>>
>> Luke Pascoe
>>
>> *E* l...@osnz.co.nz
>> *P* +64 (9) 296 2961
>> *M* +64 (27) 426 6649
>> *W* www.osnz.co.nz
>>
>> 24 Wellington St
>> Papakura
>> Auckland, 2110
>> New Zealand
>>
>> On 18 September 2015 at 11:02, Jason Gress <jgr...@accertify.com> wrote:
>>
>>> I have a simple DRBD + filesystem + NFS configuration that works
>>> properly when I manually start/stop DRBD, but will not start the DRBD
>>> slave resource properly on failover or recovery. I cannot ever get the
>>> Master/Slave set to say anything but 'Stopped'. I am running CentOS 7.1
>>> with the latest packages as of today:
>>>
>>> [root@fx201-1a log]# rpm -qa | grep -e pcs -e pacemaker -e drbd
>>> pacemaker-cluster-libs-1.1.12-22.el7_1.4.x86_64
>>> pacemaker-1.1.12-22.el7_1.4.x86_64
>>> pcs-0.9.137-13.el7_1.4.x86_64
>>> pacemaker-libs-1.1.12-22.el7_1.4.x86_64
>>> drbd84-utils-8.9.3-1.1.el7.elrepo.x86_64
>>> pacemaker-cli-1.1.12-22.el7_1.4.x86_64
>>> kmod-drbd84-8.4.6-1.el7.elrepo.x86_64
>>>
>>> Here is my pcs config output:
>>>
>>> [root@fx201-1a log]# pcs config
>>> Cluster Name: fx201-vmcl
>>> Corosync Nodes:
>>>  fx201-1a.ams fx201-1b.ams
>>> Pacemaker Nodes:
>>>  fx201-1a.ams fx201-1b.ams
>>>
>>> Resources:
>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: ip=10.XX.XX.XX cidr_netmask=24
>>>   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>>>               stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>>>               monitor interval=15s (ClusterIP-monitor-interval-15s)
>>>  Master: ms_drbd_vmfs
>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>> clone-node-max=1 notify=true
>>>   Resource: drbd_vmfs (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=vmfs
>>>    Operations: start interval=0s timeout=240
>>> (drbd_vmfs-start-timeout-240)
>>>                promote interval=0s timeout=90
>>> (drbd_vmfs-promote-timeout-90)
>>>                demote interval=0s timeout=90
>>> (drbd_vmfs-demote-timeout-90)
>>>                stop interval=0s timeout=100 (drbd_vmfs-stop-timeout-100)
>>>                monitor interval=30s (drbd_vmfs-monitor-interval-30s)
>>>  Resource: vmfsFS (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd0 directory=/exports/vmfs fstype=xfs
>>>   Operations: start interval=0s timeout=60 (vmfsFS-start-timeout-60)
>>>               stop interval=0s timeout=60 (vmfsFS-stop-timeout-60)
>>>               monitor interval=20 timeout=40 (vmfsFS-monitor-interval-20)
>>>  Resource: nfs-server (class=systemd type=nfs-server)
>>>   Operations: monitor interval=60s (nfs-server-monitor-interval-60s)
>>>
>>> Stonith Devices:
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote ms_drbd_vmfs then start vmfsFS (kind:Mandatory)
>>> (id:order-ms_drbd_vmfs-vmfsFS-mandatory)
>>>   start vmfsFS then start nfs-server (kind:Mandatory)
>>> (id:order-vmfsFS-nfs-server-mandatory)
>>>   start ClusterIP then start nfs-server (kind:Mandatory)
>>> (id:order-ClusterIP-nfs-server-mandatory)
>>> Colocation Constraints:
>>>   ms_drbd_vmfs with ClusterIP (score:INFINITY)
>>> (id:colocation-ms_drbd_vmfs-ClusterIP-INFINITY)
>>>   vmfsFS with ms_drbd_vmfs (score:INFINITY) (with-rsc-role:Master)
>>> (id:colocation-vmfsFS-ms_drbd_vmfs-INFINITY)
>>>   nfs-server with vmfsFS (score:INFINITY)
>>> (id:colocation-nfs-server-vmfsFS-INFINITY)
>>>
>>> Cluster Properties:
>>>  cluster-infrastructure: corosync
>>>  cluster-name: fx201-vmcl
>>>  dc-version: 1.1.13-a14efad
>>>  have-watchdog: false
>>>  last-lrm-refresh: 1442528181
>>>  stonith-enabled: false
>>>
>>> And status:
>>>
>>> [root@fx201-1a log]# pcs status --full
>>> Cluster name: fx201-vmcl
>>> Last updated: Thu Sep 17 17:55:56 2015  Last change: Thu Sep 17 17:18:10
>>> 2015 by root via crm_attribute on fx201-1b.ams
>>> Stack: corosync
>>> Current DC: fx201-1b.ams (2) (version 1.1.13-a14efad) - partition with
>>> quorum
>>> 2 nodes and 5 resources configured
>>>
>>> Online: [ fx201-1a.ams (1) fx201-1b.ams (2) ]
>>>
>>> Full list of resources:
>>>
>>> ClusterIP      (ocf::heartbeat:IPaddr2):       Started fx201-1a.ams
>>> Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>>      drbd_vmfs  (ocf::linbit:drbd):     Master fx201-1a.ams
>>>      drbd_vmfs  (ocf::linbit:drbd):     Stopped
>>>      Masters: [ fx201-1a.ams ]
>>>      Stopped: [ fx201-1b.ams ]
>>> vmfsFS (ocf::heartbeat:Filesystem):    Started fx201-1a.ams
>>> nfs-server     (systemd:nfs-server):   Started fx201-1a.ams
>>>
>>> PCSD Status:
>>>   fx201-1a.ams: Online
>>>   fx201-1b.ams: Online
>>>
>>> Daemon Status:
>>>   corosync: active/enabled
>>>   pacemaker: active/enabled
>>>   pcsd: active/enabled
>>>
>>> If I do a failover, after manually confirming that the DRBD data is
>>> synchronized completely, it does work, but then never reconnects the
>>> secondary side, and in order to get the resource synchronized again, I
>>> have to manually correct it, ad infinitum. I have tried
>>> standby/unstandby, pcs resource debug-start (with undesirable results),
>>> and so on.
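For reference, the standby/unstandby cycle mentioned here is roughly the following. This is a sketch using the node and resource names from this post; the comments describe what should happen when the slave clone is healthy, not what was actually observed on this cluster:

pcs cluster standby fx201-1a.ams      # move everything off the current master
pcs status --full                     # ms_drbd_vmfs should now be Master on fx201-1b.ams
cat /proc/drbd                        # both nodes: cs:Connected, one Primary, one Secondary
pcs cluster unstandby fx201-1a.ams    # bring the node back into the cluster
pcs status --full                     # the slave instance should start, not stay Stopped
pcs resource cleanup ms_drbd_vmfs     # clear stale failcounts if it still shows Stopped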
>>>
>>> Here are some relevant log messages from pacemaker.log:
>>>
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> info: crm_timer_popped:PEngine Recheck Timer (I_PE_CALC) just popped
>>> (900000ms)
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> notice: do_state_transition:State transition S_IDLE -> S_POLICY_ENGINE
>>> [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> info: do_state_transition:Progressed to state S_POLICY_ENGINE after
>>> C_TIMER_POPPED
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> process_pe_message:Input has not changed since last time, not saving to
>>> disk
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> determine_online_status:Node fx201-1b.ams is online
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> determine_online_status:Node fx201-1a.ams is online
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> determine_op_status:Operation monitor found resource drbd_vmfs:0 active
>>> in master mode on fx201-1b.ams
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> determine_op_status:Operation monitor found resource drbd_vmfs:0 active
>>> on fx201-1a.ams
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> native_print:ClusterIP(ocf::heartbeat:IPaddr2):Started fx201-1a.ams
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> clone_print:Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> short_print: Masters: [ fx201-1a.ams ]
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> short_print: Stopped: [ fx201-1b.ams ]
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> native_print:vmfsFS(ocf::heartbeat:Filesystem):Started fx201-1a.ams
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> native_print:nfs-server(systemd:nfs-server):Started fx201-1a.ams
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> native_color:Resource drbd_vmfs:1 cannot run anywhere
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> master_color:Promoting drbd_vmfs:0 (Master fx201-1a.ams)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> master_color:ms_drbd_vmfs: Promoted 1 instances of a possible 1 to
>>> master
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> LogActions:Leave ClusterIP(Started fx201-1a.ams)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> LogActions:Leave drbd_vmfs:0(Master fx201-1a.ams)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> LogActions:Leave drbd_vmfs:1(Stopped)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> LogActions:Leave vmfsFS(Started fx201-1a.ams)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: info:
>>> LogActions:Leave nfs-server(Started fx201-1a.ams)
>>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net pengine: notice:
>>> process_pe_message:Calculated Transition 16:
>>> /var/lib/pacemaker/pengine/pe-input-61.bz2
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> info: do_state_transition:State transition S_POLICY_ENGINE ->
>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>>> origin=handle_response ]
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> info: do_te_invoke:Processing graph 16 (ref=pe_calc-dc-1442530090-97)
>>> derived from /var/lib/pacemaker/pengine/pe-input-61.bz2
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> notice: run_graph:Transition 16 (Complete=0, Pending=0, Fired=0,
>>> Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-61.bz2): Complete
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> info: do_log:FSA: Input I_TE_SUCCESS from notify_crmd() received in
>>> state S_TRANSITION_ENGINE
>>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net crmd:
>>> notice: do_state_transition:State transition S_TRANSITION_ENGINE ->
>>> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>
>>> Thank you all for your help,
>>>
>>> Jason
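As for the constraint check suggested earlier in the thread, a minimal sketch with plain pcs syntax follows. The cli-prefer id shown is hypothetical, merely the kind of leftover location constraint that `pcs resource move` or `pcs resource ban` tends to leave behind; use whatever id `pcs constraint --full` actually reports:

pcs constraint --full                           # list every constraint with its id
pcs constraint remove cli-prefer-ms_drbd_vmfs   # example: drop a leftover location constraint by id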
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org