Re: [ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

Steffen Vinther Sørensen Thu, 15 Apr 2021 13:10:15 -0700

On Thu, Apr 15, 2021 at 3:39 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
>
> On 4/15/21 3:26 PM, Ulrich Windl wrote:
> >>>> Steffen Vinther Sørensen <svint...@gmail.com> wrote on 15.04.2021 at 14:56
> > in message
> > <calhdmbixzoyf-gxg82ont4mgfm6q-_imceuvhypgwky41jj...@mail.gmail.com>:
> >> On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl
> >> <ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>>>>> Steffen Vinther Sørensen <svint...@gmail.com> wrote on 15.04.2021 at 13:10
> >>> in message
> >>> <CALhdMBhMQRwmgoWEWuiGMDr7HfVOTTKvW8=nqms2p2e9p8y...@mail.gmail.com>:
> >>>> Hi there,
> >>>>
> >>>> In this 3 node cluster, node03 had been offline for a while and was being
> >>>> brought back into service. Then a migration of a VirtualDomain was
> >>>> attempted, and node02 was then fenced.
> >>>>
> >>>> Provided are logs from all 2 nodes, and the 'pcs config' as well as a
> >>>> bzcatted pe-warn. Does anyone have an idea why the node was fenced? Is
> >>>> it because of the failed ipmi monitor warning?
> >>> After a short glance it looks as if the network traffic used for VM
> >>> migration killed the corosync (or other) communication.
> >>>
> >> May I ask what part is making you think so ?
> > The part that I saw no reason for an intended fencing.
> And it looks like node02 is being cut off from all
> network communication, both corosync & ipmi.
> It may really be the network load, although I would
> rather bet on something more systematic like a
> MAC/IP conflict with the VM or something.
> I see you have libvirtd under cluster control.
> Maybe bringing up the network topology destroys the
> connection between the nodes.
> Has the cluster been working with the 3 nodes before?
>
>
> Klaus


Hi Klaus

Yes, it has been working before with all 3 nodes and migrations back
and forth, but a few more VirtualDomains have been deployed since the
last migration test.

It happens very fast, almost immediately after the migration starts.
Could it be that some timeout values should be adjusted?
I just don't have any idea where to start looking, as I see
nothing obviously suspicious in the logs.
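
One place to start might be narrowing each node's log to the seconds
around the migration attempt. Below is a minimal sketch in Python, assuming
syslog-style lines like the pengine warning quoted further down. The
function name lines_near, the window sizes and the year are assumptions
made for illustration (syslog timestamps carry no year), not part of
any cluster tool.

```python
import re
from datetime import datetime, timedelta

# Matches the syslog prefix seen in the quoted log line, e.g.
# "Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: ..."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)
TS_FORMAT = "%Y %b %d %H:%M:%S"


def lines_near(lines, when, before=10, after=60, year=2021):
    """Return (timestamp, host, process, message) for every parseable line
    logged between `before` seconds ahead of and `after` seconds past `when`.

    `when` is a syslog-style timestamp such as "Apr 15 06:59:17".
    `year` is an assumption, because syslog lines do not record it.
    """
    center = datetime.strptime(f"{year} {when}", TS_FORMAT)
    start = center - timedelta(seconds=before)
    end = center + timedelta(seconds=after)
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip wrapped continuation lines and other noise
        stamp = datetime.strptime(f"{year} {match['ts']}", TS_FORMAT)
        if start <= stamp <= end:
            hits.append((stamp, match["host"], match["proc"], match["msg"]))
    return hits


if __name__ == "__main__":
    # The one log line quoted in this thread, joined back onto a single line.
    quoted = [
        "Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: Processing "
        "failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk: "
        "unknown error",
    ]
    for stamp, host, proc, msg in lines_near(quoted, "Apr 15 06:59:17"):
        print(stamp.time(), host, proc, msg)
```

Running the same window over the log of each node and comparing the
last messages that mention node02 may show whether corosync or the IPMI
monitor lost contact first.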

/Steffen

> >
> >>>>
> >>>> Here is the outline:
> >>>>
> >>>> At 06:58:27 node03 is being activated with 'pcs start node03', nothing
> >>>> suspicious in the logs
> >>>>
> >>>> At  06:59:17 a resource migration is attempted from node02 to node03
> >>>> with 'pcs resource move sikkermail30 kvm03-node02.logiva-gcs.dk'
> >>>>
> >>>>
> >>>> on node01 this happens:
> >>>>
> >>>> Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: Processing
> >>>> failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk:
> >>>> unknown error
> >>>>
> >>>> And node02 is fenced ?
> >>>>
> >>>> /Steffen
> >>>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/