Hi all... When I try to add a previously removed cluster node back into my pacemaker cluster, I get the following error:
[root@zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster

The node I am adding was recently removed from the cluster, but apparently the removal was incomplete. I am looking for some help to thoroughly remove zs95KLpcs1 from this (or any other) cluster that this host may be a part of.

Background:

I had removed node zs95KLpcs1 from my 3-node, single-ring-protocol pacemaker cluster while that node (which happens to be a KVM on System Z Linux host) was deactivated / shut down due to relentless, unsolicited STONITH events. My thought was that some issue with the ring0 interface (on vlan1293) was causing the cluster to initiate fence (power off) actions just minutes after the node joined the cluster, which is why I went ahead and deactivated that node.

The first procedure I used to remove zs95KLpcs1 was flawed, because I forgot that there's an issue with attempting to remove an unreachable cluster node on the older pacemaker code:

[root@zs95kj ]# date;pcs cluster node remove zs95KLpcs1
Tue Jun 27 18:28:23 EDT 2017
Error: pcsd is not running on zs95KLpcs1

I then followed this procedure (courtesy of Tomas and Ken in this user group):

1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node

My execution:

I made the mistake of manually removing the target node's (zs95KLpcs1) stanza from the corosync.conf file before executing the above procedure:

[root@zs95kj ]# vi /etc/corosync/corosync.conf

Removed this stanza:

node {
        ring0_addr: zs95KLpcs1
        nodeid: 3
}

I then followed the recommended steps ...

[root@zs95kj ]# pcs cluster localnode remove zs95KLpcs1
Error: unable to remove zs95KLpcs1      ### I assume this was because I manually removed the stanza (above)

[root@zs93kl ]# pcs cluster localnode remove zs95KLpcs1
zs95KLpcs1: successfully removed!
[root@zs93kl ]#

[root@zs95kj ]# pcs cluster reload corosync
Corosync reloaded
[root@zs95kj ]#

[root@zs95kj ]# crm_node -R zs95KLpcs1 --force
[root@zs95kj ]#

[root@zs95kj ]# pcs status |less
Cluster name: test_cluster_2
Last updated: Tue Jun 27 18:39:14 2017
Last change: Tue Jun 27 18:38:56 2017 by root via crm_node on zs95kjpcs1
Stack: corosync
Current DC: zs93KLpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
45 nodes and 227 resources configured

Online: [ zs93KLpcs1 zs95kjpcs1 ]

This seemed to work well; at least, pcs status now shows only the two remaining cluster nodes.

Later on, once I was able to activate zs95KLpcs1 (the former cluster member), I did what I thought I should do to tell that node that it's no longer a member of the cluster:

[root@zs95kj ]# cat neuter.sh
ssh root@zs95KL "/usr/sbin/pcs cluster localnode remove zs95KLpcs1"
ssh root@zs95KL "/usr/sbin/pcs cluster reload corosync"
ssh root@zs95KL "/usr/sbin/crm_node -R zs95KLpcs1 --force"

[root@zs95kj ]# ./neuter.sh
zs95KLpcs1: successfully removed!
Corosync reloaded
[root@zs95kj ]#
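For reference, after the crm_node -R step this is the kind of sanity check I can run on the surviving nodes to confirm the removed node is really gone from both corosync's view and pacemaker's view. Just a sketch using standard corosync 2.x / pcs tooling -- these checks are not part of the procedure above:

# run on each remaining node (zs95kjpcs1, zs93KLpcs1):
crm_node -l                                      # nodes pacemaker still knows about
corosync-cmapctl | grep nodelist                 # corosync's runtime node list
pcs status nodes                                 # pcs view of cluster / remote nodes
grep -A4 'node {' /etc/corosync/corosync.conf    # on-disk nodelist stanzas

Once the removal is complete, none of these should mention zs95KLpcs1.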
Next, I followed a procedure to convert my current 2-node, single-ring cluster to RRP, which seems to be running well, and the corosync config looks like this:

[root@zs93kl ]# for host in zs95kjpcs1 zs93KLpcs1 ; do ssh $host "hostname;corosync-cfgtool -s"; done
zs95kj
Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.20.93.12
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.20.94.212
        status  = ring 1 active with no faults
zs93kl
Printing ring status.
Local node ID 5
RING ID 0
        id      = 10.20.93.13
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.20.94.213
        status  = ring 1 active with no faults
[root@zs93kl ]#

So now, when I try to add zs95KLpcs1 (and the second ring interface, zs95KLpcs2) to the RRP config, I get the error:

[root@zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster

I re-ran the node removal procedures, and also deleted /etc/corosync/corosync.conf on the target node zs95KLpcs1, but nothing I've tried resolves the problem.

I checked whether zs95KLpcs1 still appears in any "corosync.conf" file on the 3 nodes, and it does not:

[root@zs95kj corosync]# grep zs95KLpcs1 *
[root@zs95kj corosync]#

[root@zs93kl corosync]# grep zs95KLpcs1 *
[root@zs93kl corosync]#

[root@zs95KL corosync]# grep zs95KLpcs1 *
[root@zs95KL corosync]#

Thanks in advance ..

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com
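P.S. In case it matters: my (unverified) understanding is that pcsd on the node being added reports "already in a cluster" when it finds leftover local cluster configuration or running cluster daemons on that node. So my next guess is to look on zs95KLpcs1 itself. Something like the following, run on zs95KLpcs1 only, is what I'm considering -- treat it as a sketch, not a verified fix, and note that 'pcs cluster destroy' wipes all local cluster configuration on that node:

# on zs95KLpcs1 only -- look for, then wipe, any leftover local cluster state
systemctl status corosync pacemaker pcsd           # anything still running?
ls -l /etc/corosync/corosync.conf \
      /etc/cluster/cluster.conf \
      /var/lib/pacemaker/cib/                      # leftover config or old CIB?
pcs cluster stop --force                           # stop whatever may still be up (may complain if nothing is configured)
pcs cluster destroy                                # remove local corosync/pacemaker config (destructive, local node only)
systemctl restart pcsd                             # let pcsd re-read its state

After that I would retry 'pcs cluster node add zs95KLpcs1,zs95KLpcs2' from one of the existing RRP cluster nodes. If anyone can confirm whether that's the right cleanup, or what pcsd actually checks, I'd appreciate it.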