On 15.04.2021 16:39, Klaus Wenninger wrote:
> On 4/15/21 3:26 PM, Ulrich Windl wrote:
>> Steffen Vinther Sørensen <[email protected]> wrote on 15.04.2021 at 14:56
>> in message
>> <calhdmbixzoyf-gxg82ont4mgfm6q-_imceuvhypgwky41jj...@mail.gmail.com>:
>>> On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl
>>> <[email protected]> wrote:
>>>> Steffen Vinther Sørensen <[email protected]> wrote on 15.04.2021 at
>>>> 13:10 in message
>>>> <CALhdMBhMQRwmgoWEWuiGMDr7HfVOTTKvW8=nqms2p2e9p8y...@mail.gmail.com>:
>>>>> Hi there,
>>>>>
>>>>> In this 3 node cluster, node03 has been offline for a while and is being
>>>>> brought back into service. Then a migration of a VirtualDomain is
>>>>> attempted, and node02 is then fenced.
>>>>>
>>>>> Provided are logs from all 2 nodes, and the 'pcs config' as well as a
>>>>> bzcatted pe-warn. Anyone with an idea of why the node was fenced? Is
>>>>> it because of the failed ipmi monitor warning?
>>>> After a short glance it looks as if the network traffic used for VM
>>>> migration killed the corosync (or other) communication.
>>>>
>>> May I ask what part is making you think so?
>> The part that I saw no reason for an intended fencing.
> And it looks like node02 is being cut off from all
> networking-communication - both corosync & ipmi.

Well, IPMI fencing was (claimed to be) successful, so the monitoring errors
could be false positives. Still, it is something that needs investigation.

... judging by

Apr 15 06:59:26 kvm03-node02 systemd-logind[4179]: Power key pressed.

IPMI fencing *was* successful.

> May really be the networking-load although I would
> rather bet on something more systematic like a
> Mac/IP-conflict with the VM or something.
> I see you are having libvirtd under cluster-control.
> Maybe bringing up the network-topology destroys the
> connection between the nodes.
> Has the cluster been working with the 3 nodes before?
>
>
> Klaus
>>
>>>>>
>>>>> Here is the outline:
>>>>>
>>>>> At 06:58:27 node03 is being activated with 'pcs start node03', nothing
>>>>> suspicious in the logs
>>>>>
>>>>> At 06:59:17 a resource migration is attempted from node02 to node03
>>>>> with 'pcs resource move sikkermail30 kvm03-node02.logiva-gcs.dk'
>>>>>
>>>>>
>>>>> on node01 this happens:
>>>>>
>>>>> Apr 15 06:59:17 kvm03-node01 pengine[29024]: warning: Processing
>>>>> failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk:
>>>>> unknown error
>>>>>
>>>>> And node02 is fenced?
>>>>>
>>>>> /Steffen
>>>>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
