Hi,

 

As it happens, I use Pacemaker as the base layer of a custom clustering solution, 
and I have a script to rebuild the second node from the first one. I can’t 
share the script itself, as it has a lot of solution-dependent references, but I 
can share the sequence to rebuild the failed node (a consolidated shell sketch 
of the whole sequence follows the list):

1. Set up the new node with the same IP and hostname.
2. (Optional) Set up passwordless, mutual key-based SSH access. It is not 
necessary, but it makes a lot of things easier.
3. Copy these files from the surviving host to the new one:

   a. /etc/corosync/authkey
   b. /etc/corosync/corosync.conf
   c. /etc/drbd.d/*.res
   d. /etc/pacemaker/authkey

4. Set the hacluster user's password to the same value it had on the surviving node.
5. Re-authenticate the pcs nodes with:
pcs host auth <host1_name> <host2_name> -u hacluster -p <ha_cluster_pass>
6. Reboot the restored server.
7. PROFIT!!!
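
For illustration, here is roughly what those steps look like as shell commands 
run from the new node. This is a sketch, not our actual script: the hostnames 
and password are placeholders, and it assumes the passwordless SSH from step 2 
is in place.

# Pull the cluster keys and configs from the surviving host
scp root@<survived_host>:/etc/corosync/authkey /etc/corosync/authkey
scp root@<survived_host>:/etc/corosync/corosync.conf /etc/corosync/corosync.conf
scp "root@<survived_host>:/etc/drbd.d/*.res" /etc/drbd.d/
scp root@<survived_host>:/etc/pacemaker/authkey /etc/pacemaker/authkey

# Set the hacluster password to the same value as on the surviving node
echo 'hacluster:<ha_cluster_pass>' | chpasswd

# Re-authenticate the pcs nodes
pcs host auth <host1_name> <host2_name> -u hacluster -p <ha_cluster_pass>

# Reboot the restored server
reboot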

 

If you don't use an arbiter (corosync-qnetd), this should be enough to get your 
new cluster node up and running. If you do use corosync-qnetd, you also need to 
restore the corosync-qdevice nssdb keys so the second host can connect to the 
arbiter node (again, a consolidated sketch follows the list):

1. On the surviving host, extract your arbiter certificate from the nssdb:
certutil -L -d /etc/corosync/qdevice/net/nssdb -n 'QNet CA' -r > 
/root/qnetd-cert.crt
2. Copy the certificate to the new host; assume the path on the new host is 
the same.
3. On the new host, initialize a new nssdb with the certificate:
corosync-qdevice-net-certutil -i -c /root/qnetd-cert.crt
4. Copy the certificate-and-key bundle at 
/etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12 from the old node to the new one.
5. On the new node, import the certificate and key:
corosync-qdevice-net-certutil -m -c 
/etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
6. Enable or restart corosync-qdevice:
systemctl enable --now corosync-qdevice.service
or
systemctl restart corosync-qdevice.service
7. Enjoy!
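
Put together, the qdevice part looks roughly like this (same placeholder 
conventions as above; the status check at the end is just a sanity check I'd 
suggest, not part of the original sequence):

# On the surviving host: export the QNet CA certificate and ship it over
certutil -L -d /etc/corosync/qdevice/net/nssdb -n 'QNet CA' -r > /root/qnetd-cert.crt
scp /root/qnetd-cert.crt root@<new_host>:/root/qnetd-cert.crt

# On the new host: initialize a fresh nssdb from that certificate
corosync-qdevice-net-certutil -i -c /root/qnetd-cert.crt

# On the surviving host: copy the node's certificate-and-key bundle across
scp /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12 \
    root@<new_host>:/etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

# On the new host: import the bundle and start the service
corosync-qdevice-net-certutil -m -c /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
systemctl enable --now corosync-qdevice.service

# Optional sanity check: qdevice should report a connection to the qnetd server
corosync-qdevice-tool -s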

 

That’s what works for me in practice and is included in the service scripts of 
our Pacemaker-based product.

 

Hope this helps!

 

Sincerely,

 

Alex

 

 

From: Users <users-boun...@clusterlabs.org> On Behalf Of Fabrizio Ermini
Sent: Friday, May 9, 2025 5:26 PM
To: users@clusterlabs.org
Subject: [ClusterLabs] Rebuild of failed node

 

Hi all! Freshman here, just joined.

 

I currently need to rebuild a failed node on a Pacemaker 2.1/Corosync 3.1 
two-node cluster with DRBD storage.

I've searched the Pacemaker docs and the list archives, but I haven't found a 
clear guide on how to proceed with this task. So far, I've reinstalled a new 
server, configured the same IP and hostname as the failed one, and installed 
all the software. I've also fixed the DRBD layer and started the resync of the 
volumes. But it's not clear to me how to proceed. I've found some hints online 
pointing to the need to manually copy the corosync config, but they were quite 
old and probably obsolete. I'm using pcs as a shell, and I haven't found a 
command designed to replace a node, only ones to add or remove nodes.

It seems really strange to me that there isn't a guide, since this should be a 
very basic operation and it's quite important to know how to do it. Hardware 
breaks, as a matter of fact :D

So I'll be very grateful if anyone can point me in the right direction.

Thanks in advance, and best regards

 

Fabrizio

 
