On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote:
> Hello,
>
> I hope someone can help with this problem. We are (still) trying to
> get Stonith working to achieve a running active/active HA cluster,
> but sadly to no avail.
>
> There are 2 CentOS hosts. On each one there is an Ubuntu VM. The
> Ubuntu VMs are the ones which should form the HA cluster.
>
> The current status is this:
>
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Thu Jul 2 17:03:53 2020
> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>
> 2 nodes configured
> 13 resources configured
>
> Online: [ server2ubuntu1 server4ubuntu1 ]
>
> Full list of resources:
>
>  stonith_id_1 (stonith:external/libvirt): Stopped
>  Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>      Masters: [ server4ubuntu1 ]
>      Slaves: [ server2ubuntu1 ]
>  Master/Slave Set: WebDataClone [WebData]
>      Masters: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>      ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>  Clone Set: WebFS-clone [WebFS]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
>  Clone Set: WebSite-clone [WebSite]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
>
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>   status=Error, exitreason='',
>   last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
>   status=complete, exitreason='',
>   last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>   status=Error, exitreason='',
>   last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>
> The stonith resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on

This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, so there could be some
obscure network configuration needed to get that working on the VMs.

> returns the local VM. Apparently it connects through the
> virtualization interface, because it returns the VM name, not the
> hostname of the client VM. I do not know if this is how it is
> supposed to work?

Yes, fence_xvm knows only about the VM names.

To get pacemaker to be able to use it for fencing the cluster nodes, you
have to add a pcmk_host_map parameter to the fencing resource. It looks
like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."

> In the local network, all traffic is allowed. No firewall is locally
> active; only the connections leaving the local network are
> firewalled. Hence there are no connection problems between the hosts
> and clients. For example, we can successfully connect from the
> clients to the hosts:
>
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> On the Ubuntu VMs we created and configured the stonith resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>   hostlist="Host4,host2"
>   hypervisor_uri="qemu+ssh://192.168.1.21/system"
>
> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
>
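
As a rough sketch of the pcmk_host_map suggestion above (an untested
illustration, not a verified configuration): the snippet below only
assembles the map string and prints the pcs command, it does not run it.
It assumes the resource is switched to the fence_xvm agent. The resource
name fence_vms and the placeholder VMNAME2 are hypothetical, and pairing
server2ubuntu1 with kvm102 is a guess; the real node-to-VM pairs have to
be taken from "fence_xvm -o list" once it shows both VMs.

```shell
#!/bin/sh
# Join "nodename:vmname" pairs with ';' into a pcmk_host_map value.
build_host_map() {
    map=""
    for pair in "$@"; do
        if [ -z "$map" ]; then
            map="$pair"
        else
            map="$map;$pair"
        fi
    done
    printf '%s\n' "$map"
}

# Node names come from the pcs status output quoted above.
# ASSUMPTION: kvm102 is the VM behind server2ubuntu1.
# ASSUMPTION: VMNAME2 is a placeholder for the second VM's libvirt name.
HOST_MAP=$(build_host_map "server2ubuntu1:kvm102" "server4ubuntu1:VMNAME2")

# Dry run: print the command instead of executing it.
# "fence_vms" is a hypothetical resource id.
echo "pcs stonith create fence_vms fence_xvm pcmk_host_map=\"$HOST_MAP\""
```

Printing the command first makes it easy to check the quoting of the
semicolon-separated map before anything is pushed into the CIB.
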
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
>
> Kind regards
> Stefan Schmitz
>
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/