On Mon, 2021-02-01 at 11:16 -0500, Stuart Massey wrote:
> Andrei,
> You are right, thank you. I have an earlier thread on which I posted
> a pacemaker.log for this issue, and didn't think to point to it
> here. The link is
> http://project.ibss.net/samples/deidPacemakerLog.2021-01-25.txtxt .
> So, node01 is in maintenance mode, and constraints prevent any
> resources from running on it (other than drbd in Secondary). I would
> not want node01 to ston[node02]ith after a communications failure,
> especially not if all resources are running fine on node02.
> Also, I did not think to wonder whether node01 could become DC even
> though it is in maintenance mode.
> The logs seem to me to support this contention. The cib ops happen
> right in the middle of the DC negotiations.
> Is there a way to tell node01 that it cannot be DC? Like a
> constraint?

No, though that's been suggested as a new feature. As a workaround,
you could restart the cluster on the less preferred node -- the
controller with the most CPU time (i.e., the one up the longest) will
be preferred for DC (if the Pacemaker versions are equal).
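For example, to see which node is DC right now and then force a new
election that node02 should win (a sketch assuming a pcs-based setup;
with crmsh you could instead restart the pacemaker service on node01):

    # Confirm which node is currently DC
    crm_mon -1 | grep "Current DC"

    # Restart the cluster on node01; node02's controller will then
    # have been up longer and should win the next DC election
    pcs cluster stop node01
    pcs cluster start node01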
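Separately, to confirm whether node02's transient attributes survive a
rejoin, you can query attrd directly. The attribute names here are
taken from the XML in your original message (quoted below); adjust if
yours differ:

    # Query node02's promotion score and connectivity attribute
    attrd_updater -Q -n master-drbd_ourApp -N node02.example.com
    attrd_updater -Q -n pingd -N node02.example.com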
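And if it happens again, crm_report can bundle the full logs from both
nodes that Andrei asked for below (the times are only an example; run
it from either node with ssh access to the other):

    # Collect logs and CIB from all nodes for the window around the rejoin
    crm_report --from "2021-01-28 14:40:00" --to "2021-01-28 15:00:00" \
        /tmp/node-rejoin-report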
> Thanks again.
>
> On Sun, Jan 31, 2021 at 1:55 AM Andrei Borzenkov
> <arvidj...@gmail.com> wrote:
> > On 29.01.2021 20:37, Stuart Massey wrote:
> > > Can someone help me with this?
> > > Background:
> > >
> > > "node01" is failing, and has been placed in "maintenance" mode.
> > > It occasionally loses connectivity.
> > >
> > > "node02" is able to run our resources.
> > >
> > > Consider the following messages from pacemaker.log on "node02",
> > > just after "node01" has rejoined the cluster (per "node02"):
> > >
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_perform_op: --
> > > /cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_perform_op: + /cib: @num_updates=309
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info:
> > > cib_process_request: Completed cib_delete operation for section
> > > //node_state[@uname='node02.example.com']/transient_attributes:
> > > OK (rc=0, origin=node01.example.com/crmd/3784, version=0.94.309)
> > > Jan 28 14:48:04 [21938] node02.example.com crmd: info:
> > > abort_transition_graph: Transition aborted by deletion of
> > > transient_attributes[@id='2']: Transient attribute change |
> > > cib=0.94.309 source=abort_unless_down:357
> > > path=/cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > > complete=true
> > > Jan 28 14:48:05 [21937] node02.example.com pengine: info:
> > > master_color: ms_drbd_ourApp: Promoted 0 instances of a possible
> > > 1 to master
> > >
> > > The implication, it seems to me, is that "node01" has asked
> > > "node02" to delete the transient attributes for "node02". The
> > > transient attributes should normally be:
> > >
> > > <transient_attributes id="2">
> > >   <instance_attributes id="status-2">
> > >     <nvpair id="status-2-master-drbd_ourApp"
> > >             name="master-drbd_ourApp" value="10000"/>
> > >     <nvpair id="status-2-pingd" name="pingd" value="100"/>
> > >   </instance_attributes>
> > > </transient_attributes>
> > >
> > > These attributes are necessary for "node02" to be
> > > Master/Primary, correct?
> > >
> > > Why might this be happening, and how do we prevent it?
> > >
> >
> > You do not provide enough information to answer. At the very least
> > you need to show full logs from both nodes around the time it
> > happens (starting with both nodes losing connectivity).
> >
> > But as a wild guess - you do not use stonith, node01 becomes DC
> > and clears the other node's state.
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/