On Mon, 2021-02-01 at 11:16 -0500, Stuart Massey wrote:
> Andrei,
> You are right, thank you. I have an earlier thread on which I posted
> a pacemaker.log for this issue, and didn't think to point to it
> here. The link is
> http://project.ibss.net/samples/deidPacemakerLog.2021-01-25.txtxt .
> So, node01 is in maintenance mode, and constraints prevent any
> resources from running on it (other than drbd in Secondary). I would
> not want node01 to ston[node02]ith after a communications failure,
> especially not if all resources are running fine on node02.
> Also, I did not think to wonder whether node01 could become DC even
> though it is in maintenance mode.
> The logs seem to me to support this contention. The cib ops happen
> right in the middle of the DC negotiations.
> Is there a way to tell node01 that it cannot be DC? Like a
> constraint?

No, though that's been suggested as a new feature. As a workaround,
you could restart the cluster on the less preferred node -- the
controller with the most CPU time (i.e., the one up the longest) will
be preferred for DC (if the Pacemaker versions are equal).
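For example, to see which node is DC right now and then force a new
election that node02 should win (a sketch assuming a pcs-based setup;
with crmsh you could instead restart the pacemaker service on node01):

    # Confirm which node is currently DC
    crm_mon -1 | grep "Current DC"

    # Restart the cluster on node01; node02's controller will then
    # have been up longer and should win the next DC election
    pcs cluster stop node01
    pcs cluster start node01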
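Separately, to confirm whether node02's transient attributes survive a
rejoin, you can query attrd directly. The attribute names here are
taken from the XML in your original message (quoted below); adjust if
yours differ:

    # Query node02's promotion score and connectivity attribute
    attrd_updater -Q -n master-drbd_ourApp -N node02.example.com
    attrd_updater -Q -n pingd -N node02.example.com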
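And if it happens again, crm_report can bundle the full logs from both
nodes that Andrei asked for below (the times are only an example; run
it from either node with ssh access to the other):

    # Collect logs and CIB from all nodes for the window around the rejoin
    crm_report --from "2021-01-28 14:40:00" --to "2021-01-28 15:00:00" \
        /tmp/node-rejoin-report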
> Thanks again.
>
> On Sun, Jan 31, 2021 at 1:55 AM Andrei Borzenkov
> <arvidj...@gmail.com> wrote:
> > On 29.01.2021 20:37, Stuart Massey wrote:
> > > Can someone help me with this?
> > > Background:
> > >
> > > "node01" is failing, and has been placed in "maintenance" mode.
> > > It occasionally loses connectivity.
> > >
> > > "node02" is able to run our resources.
> > >
> > > Consider the following messages from pacemaker.log on "node02",
> > > just after "node01" has rejoined the cluster (per "node02"):
> > >
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_perform_op: --
> > > /cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_perform_op: + /cib: @num_updates=309
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_process_request: Completed cib_delete operation for section
> > > //node_state[@uname='node02.example.com']/transient_attributes:
> > > OK (rc=0, origin=node01.example.com/crmd/3784, version=0.94.309)
> > > Jan 28 14:48:04 [21938] node02.example.com crmd: info:
> > > abort_transition_graph: Transition aborted by deletion of
> > > transient_attributes[@id='2']: Transient attribute change |
> > > cib=0.94.309 source=abort_unless_down:357
> > > path=/cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > > complete=true
> > > Jan 28 14:48:05 [21937] node02.example.com pengine: info:
> > > master_color: ms_drbd_ourApp: Promoted 0 instances of a possible
> > > 1 to master
> > >
> > > The implication, it seems to me, is that "node01" has asked
> > > "node02" to delete the transient attributes for "node02". The
> > > transient attributes should normally be:
> > >
> > > <transient_attributes id="2">
> > >   <instance_attributes id="status-2">
> > >     <nvpair id="status-2-master-drbd_ourApp"
> > >             name="master-drbd_ourApp" value="10000"/>
> > >     <nvpair id="status-2-pingd" name="pingd" value="100"/>
> > >   </instance_attributes>
> > > </transient_attributes>
> > >
> > > These attributes are necessary for "node02" to be
> > > Master/Primary, correct?
> > >
> > > Why might this be happening, and how do we prevent it?
> > >
> >
> > You do not provide enough information to answer. At the very least
> > you need to show full logs from both nodes around the time it
> > happens (starting with both nodes losing connectivity).
> >
> > But as a wild guess - you do not use stonith, node01 becomes DC
> > and clears the other node's state.
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/