On Tue, 2017-11-28 at 09:36 +0000, 井上 和徳 wrote:
> Hi,
> 
> Sometimes a node with 'PCMK_node_start_state=standby' will start up
> Online.
> 
> [ reproduction scenario ]
>  * Set 'PCMK_node_start_state=standby' to /etc/sysconfig/pacemaker.
>  * Delete cib (/var/lib/pacemaker/cib/*).
>  * Start pacemaker at the same time on 2 nodes.
>    # for i in rhel74-1 rhel74-3 ; do ssh -f $i systemctl start pacemaker ; done
> 
> [ actual result ]
>  * crm_mon
>    Stack: corosync
>    Current DC: rhel74-3 (version 1.1.18-2b07d5c) - partition with quorum
>    Last change: Wed Nov 22 06:22:50 2017 by hacluster via crmd on rhel74-3
> 
>    2 nodes configured
>    0 resources configured
> 
>    Node rhel74-3: standby
>    Online: [ rhel74-1 ]
> 
>  * cib.xml
>    <nodes>
>      <node id="3232261507" uname="rhel74-1"/>
>      <node id="3232261509" uname="rhel74-3">
>        <instance_attributes id="nodes-3232261509">
>          <nvpair id="nodes-3232261509-standby" name="standby" value="on"/>
>        </instance_attributes>
>      </node>
>    </nodes>
> 
>  * pacemaker.log
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: (cib_native.c:462 ) warning: cib_native_perform_op_delegate: Call failed: No such device or address
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <node id="3232261507">
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <instance_attributes id="nodes-3232261507">
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update </instance_attributes>
>    Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update </node>
> 
>  * I attached crm_report to GitHub (too big to attach to this email),
>    so look at it.
>    https://github.com/inouekazu/pcmk_report/blob/master/pcmk-Wed-22-Nov-2017.tar.bz2
> 
> 
> I think that the additional timing of <node id="3232261507">*1 and
> <instance_attributes id="nodes-3232261507">*2 is the cause.
>  *1 <node id="3232261507" uname="rhel74-1"/>
>  *2 <instance_attributes id="nodes-3232261507">
>       <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
> 
> I expect to be fixed, but if it's difficult, I have two questions.
> 1) Does this only occur if there is no cib.xml (in other words, there
> is no <node> element)?
I believe so. I think this is the key message:

Nov 22 06:22:50 [20750] rhel74-1 cib: ( callbacks.c:1101 ) warning: cib_process_request: Completed cib_modify operation for section nodes: No such device or address (rc=-6, origin=rhel74-1/crmd/12, version=0.3.0)

PCMK_node_start_state works by setting the "standby" node attribute in
the CIB. However, it does this via a "modify" command that assumes the
<nodes> tag already exists.

If there is no CIB, pacemaker will quickly create one -- but in this
case, the node tries to set the attribute before that's happened.

Hopefully we can come up with a fix. If you want, you can file a bug
report at bugs.clusterlabs.org, to track the progress.

> 2) Is there any workaround other than "Do not start at the same
> time"?
> 
> Best Regards

Before starting pacemaker, if /var/lib/pacemaker/cib is empty, you can
create a skeleton CIB with:

  cibadmin --empty > /var/lib/pacemaker/cib/cib.xml

That will include an empty <nodes/> tag, and the modify command should
work when pacemaker starts.
-- 
Ken Gaillot <kgail...@redhat.com>
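
The workaround above could be wrapped in a small pre-start helper. This is
only a sketch: the function name seed_empty_cib and the CIBADMIN override
variable are made up for illustration, and only the path and the
"cibadmin --empty" call come from this thread. The note about file ownership
is an assumption that should be checked against your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker): create a skeleton CIB only
# when the CIB directory contains no files at all.
seed_empty_cib() {
    # Directory defaults to the path used in this thread.
    cib_dir="${1:-/var/lib/pacemaker/cib}"
    # CIBADMIN is an illustrative override, handy for dry runs.
    cibadmin_cmd="${CIBADMIN:-cibadmin}"

    # Any existing entry means a CIB (or its remnants) is present: leave it.
    if [ -n "$(ls -A "$cib_dir" 2>/dev/null)" ]; then
        echo "CIB directory not empty, nothing to do"
        return 0
    fi

    # Write to a temporary file first so that a failed cibadmin run does
    # not leave a truncated cib.xml behind.
    tmp="$cib_dir/cib.xml.tmp.$$"
    if "$cibadmin_cmd" --empty > "$tmp"; then
        mv "$tmp" "$cib_dir/cib.xml"
        # Assumption: the file may need the same owner and mode as other
        # CIB files on your system; adjust before starting pacemaker.
        echo "created $cib_dir/cib.xml"
    else
        rm -f "$tmp"
        echo "cibadmin --empty failed" >&2
        return 1
    fi
}
```

It would be called on each node as "seed_empty_cib" (or with an explicit
directory argument) before "systemctl start pacemaker". Since it does nothing
when the directory already has content, running it on every start should be
harmless.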