My thanks to both Ken Gaillot and Tomas Jelinek for the workaround. The procedure(s) worked like a champ.
I just have a few side comments / observations ...

First - Tomas, in the bugzilla you show this error message on your cluster
remove command, directing you to use the --force option:

[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override

When I issue the cluster remove, I do not get any reference to the --force
option in the error message:

[root@zs93kl ]# pcs cluster node remove zs95KLpcs1
Error: pcsd is not running on zs95KLpcs1
[root@zs93kl ]#

The man page doesn't mention --force at my level. Is this a feature added
after pcs-0.9.143-15.el7_2.ibm.2.s390x?

Also, in your workaround procedure, you have me do
'pcs cluster localnode remove <name_of_node_to_be_removed>'. However, I'm
wondering why the 'localnode' option is not in the pcs man page for the
'pcs cluster' command. The command / option worked great, just curious why
it's not documented ...

[root@zs93kl ]# pcs cluster localnode remove zs93kjpcs1
zs93kjpcs1: successfully removed!

My man page level:

[root@zs93kl VD]# rpm -q --whatprovides /usr/share/man/man8/pcs.8.gz
pcs-0.9.143-15.el7_2.ibm.2.s390x
[root@zs93kl VD]#
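Coming back to the --force question: going by the bugzilla excerpt above, I
assume that on a pcs level new enough to print the "use --force to override"
hint, the same removal of an unreachable node would simply be re-run with
--force, along these lines (node name reused from my example above; I have
not been able to verify this on my 0.9.143 level):

pcs cluster node remove zs95KLpcs1 --force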
Thanks again,

Scott G.

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com


From: Tomas Jelinek <tojel...@redhat.com>
To: users@clusterlabs.org
Date: 04/18/2017 09:04 AM
Subject: Re: [ClusterLabs] How to force remove a cluster node?

On 17.4.2017 at 17:28, Ken Gaillot wrote:
> On 04/13/2017 01:11 PM, Scott Greenlese wrote:
>> Hi,
>>
>> I need to remove some nodes from my existing pacemaker cluster which are
>> currently unbootable / unreachable.
>>
>> Referenced
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
>>
>> 4.4.4. Removing Cluster Nodes
>> The following command shuts down the specified node and removes it from
>> the cluster configuration file, corosync.conf, on all of the other nodes
>> in the cluster. For information on removing all information about the
>> cluster from the cluster nodes entirely, thereby destroying the cluster
>> permanently, refer to Section 4.6, "Removing the Cluster Configuration"
>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.
>>
>> pcs cluster node remove <node>
>>
>> I ran the command with the cluster active on 3 of the 5 available
>> cluster nodes (with quorum). The command fails with:
>>
>> [root@zs90KP VD]# date; pcs cluster node remove zs93kjpcs1
>> Thu Apr 13 13:40:59 EDT 2017
>> Error: pcsd is not running on zs93kjpcs1
>>
>> The node was not removed:
>>
>> [root@zs90KP VD]# pcs status |less
>> Cluster name: test_cluster_2
>> Last updated: Thu Apr 13 14:08:15 2017    Last change: Wed Apr 12 16:40:26 2017 by root via cibadmin on zs93KLpcs1
>> Stack: corosync
>> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
>> 45 nodes and 180 resources configured
>>
>> Node zs95KLpcs1: UNCLEAN (offline)
>> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
>> OFFLINE: [ zs93kjpcs1 ]
>>
>> Is there a way to force remove a node that's no longer bootable? If not,
>> what's the procedure for removing a rogue cluster node?
>>
>> Thank you...
>>
>> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
>> INTERNET: swgre...@us.ibm.com
>
> Yes, the pcs command is just a convenient shorthand for a series of
> commands. You want to ensure pacemaker and corosync are stopped on the
> node to be removed (in the general case, obviously already done in this
> case), remove the node from corosync.conf and restart corosync on all
> other nodes, then run "crm_node -R <nodename>" on any one active node.

Hi Scott,

It is possible to remove an offline node from a cluster with upstream pcs
0.9.154 or RHEL pcs-0.9.152-5 (available in RHEL7.3) or newer.

If you have an older version, here's a workaround:
1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node

It's basically the same procedure Ken described. See
https://bugzilla.redhat.com/show_bug.cgi?id=1225423 for more details.

Regards,
Tomas
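For anyone who finds this thread later, here is the workaround pulled together
into one sequence, using the node names from the status output above. This is
a consolidated sketch of how it looked in my environment on pcs-0.9.143, not a
cut-and-paste from my terminal:

# on every remaining node (zs90kppcs1, zs93KLpcs1, zs95kjpcs1), drop the dead
# node from that node's corosync.conf:
pcs cluster localnode remove zs93kjpcs1

# then on any one remaining node, make corosync re-read its configuration and
# remove the node from pacemaker's configuration:
pcs cluster reload corosync
crm_node -R zs93kjpcs1 --force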
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org