On Fri, 2020-08-14 at 20:37 +0200, Lentes, Bernd wrote:
> ----- On Aug 9, 2020, at 10:17 PM, Bernd Lentes
> bernd.len...@helmholtz-muenchen.de wrote:
>
> > > So this appears to be the problem. From these logs I would guess
> > > the successful stop on ha-idg-1 did not get written to the CIB for
> > > some reason. I'd look at the pe input from this transition on
> > > ha-idg-2 to confirm that.
> > >
> > > Without the DC knowing about the stop, it tries to schedule a new
> > > one, but the node is shutting down so it can't do it, which means
> > > it has to be fenced.
>
> I checked all relevant pe-files in this time period.
> This is what I found out (I just write the important entries):
>
> ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3116 -G transition-3116.xml -D transition-3116.dot
> Current cluster status:
> ...
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-1
> Transition Summary:
> ...
> * Migrate vm_nextcloud ( ha-idg-1 -> ha-idg-2 )
> Executing cluster transition:
> * Resource action: vm_nextcloud migrate_from on ha-idg-2 <======= migrate vm_nextcloud
> * Resource action: vm_nextcloud stop on ha-idg-1
> * Pseudo action: vm_nextcloud_start_0
> Revised cluster status:
> Node ha-idg-1 (1084777482): standby
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2
>
> ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-error-48 -G transition-4514.xml -D transition-4514.dot
> Current cluster status:
> Node ha-idg-1 (1084777482): standby
> Online: [ ha-idg-2 ]
> ...
> vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED[ ha-idg-2 ha-idg-1 ] <====== migration failed
> Transition Summary:
> ..
> * Recover vm_nextcloud ( ha-idg-2 )
> Executing cluster transition:
> * Resource action: vm_nextcloud stop on ha-idg-2
> * Resource action: vm_nextcloud stop on ha-idg-1
> * Resource action: vm_nextcloud start on ha-idg-2
> * Resource action: vm_nextcloud monitor=30000 on ha-idg-2
> Revised cluster status:
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2
>
> ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3117 -G transition-3117.xml -D transition-3117.dot
> Current cluster status:
> Node ha-idg-1 (1084777482): standby
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED ha-idg-2 <====== start on ha-idg-2 failed
> Transition Summary:
> * Stop vm_nextcloud ( ha-idg-2 ) due to node availability <==== stop vm_nextcloud (what does "due to node availability" mean?)

"Due to node availability" means no node is allowed to run the resource, so it has to be stopped.

> Executing cluster transition:
> * Resource action: vm_nextcloud stop on ha-idg-2
> Revised cluster status:
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
>
> ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3118 -G transition-4516.xml -D transition-4516.dot
> Current cluster status:
> Node ha-idg-1 (1084777482): standby
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <============== vm_nextcloud is stopped
> Transition Summary:
> * Shutdown ha-idg-1
> Executing cluster transition:
> * Resource action: vm_nextcloud stop on ha-idg-1 <==== why stop? It is already stopped.

I'm not sure, I'd have to see the pe input.

> Revised cluster status:
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
>
> ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-input-3545 -G transition-0.xml -D transition-0.dot
> Current cluster status:
> Node ha-idg-1 (1084777482): pending
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <====== vm_nextcloud is stopped
> Transition Summary:
>
> Executing cluster transition:
> Using the original execution date of: 2020-07-20 15:05:33Z
> Revised cluster status:
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
>
> ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-warn-749 -G transition-1.xml -D transition-1.dot
> Current cluster status:
> Node ha-idg-1 (1084777482): OFFLINE (standby)
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <======= vm_nextcloud is stopped
> Transition Summary:
> * Fence (Off) ha-idg-1 'resource actions are unrunnable'
> Executing cluster transition:
> * Fencing ha-idg-1 (Off)
> * Pseudo action: vm_nextcloud_stop_0 <======= why stop? It is already stopped?
> Revised cluster status:
> Node ha-idg-1 (1084777482): OFFLINE (standby)
> Online: [ ha-idg-2 ]
> vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
>
> I don't understand why the cluster tries to stop a resource which is
> already stopped.
>
> Bernd
> Helmholtz Zentrum München
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin
> Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <kgail...@redhat.com>
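The two scheduler behaviours discussed in this thread can be illustrated with a toy model. This is a hypothetical Python sketch, not Pacemaker's scheduler code: the `Node` fields, the score values and the function names are assumptions made for illustration. `summarize()` mirrors the explanation of "due to node availability" (no node may run the resource, so an active instance is stopped). `shutdown_actions()` mirrors the hypothesis quoted at the top of the thread: the DC does not know about a completed stop, cannot run a new one on a node that is shutting down, and fences the node instead, so in this model the stop shows up only as a pseudo action implied by the fencing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy model for illustration only; this is NOT Pacemaker's scheduler.
INFINITY = 1_000_000  # Pacemaker uses 1000000 as its INFINITY score


@dataclass
class Node:
    name: str
    online: bool
    standby: bool
    score: int  # assumed allocation score of the resource on this node


def allowed_nodes(nodes: List[Node]) -> List[Node]:
    """Nodes modelled as allowed to run the resource: online, not standby, score >= 0."""
    return [n for n in nodes if n.online and not n.standby and n.score >= 0]


def summarize(resource: str, active_on: Optional[str], nodes: List[Node]) -> str:
    """Return a simplified transition-summary line for one resource."""
    candidates = allowed_nodes(nodes)
    if not candidates:
        if active_on is None:
            return f"Leave {resource} Stopped"
        # No node may run it, so the active instance has to be stopped.
        return f"Stop {resource} ({active_on}) due to node availability"
    best = max(candidates, key=lambda n: n.score)
    if best.name == active_on:
        return f"Leave {resource} Started {best.name}"
    return f"Start {resource} on {best.name}"


def shutdown_actions(resource: str, node: str, stop_recorded: bool,
                     node_can_run_actions: bool) -> List[str]:
    """Toy actions for a resource on a node that is shutting down."""
    if stop_recorded:
        return []  # the DC knows the resource is stopped there: nothing to do
    if node_can_run_actions:
        return [f"stop {resource} on {node}"]
    # The stop cannot be executed, so the node is fenced; the fencing implies
    # the stop, which therefore appears only as a pseudo action.
    return [f"fence {node}", f"pseudo stop {resource} on {node}"]


# Roughly the situation of pe-input-3117: ha-idg-1 in standby, and ha-idg-2
# assumed (hypothetically) to carry -INFINITY for the resource after the failed start.
nodes = [
    Node("ha-idg-1", online=True, standby=True, score=0),
    Node("ha-idg-2", online=True, standby=False, score=-INFINITY),
]
print(summarize("vm_nextcloud", "ha-idg-2", nodes))
# prints: Stop vm_nextcloud (ha-idg-2) due to node availability
print(shutdown_actions("vm_nextcloud", "ha-idg-1", stop_recorded=False,
                       node_can_run_actions=False))
# prints: ['fence ha-idg-1', 'pseudo stop vm_nextcloud on ha-idg-1']
```

The real scheduler weighs many more inputs (constraints, stickiness, fail counts, migration state); the sketch only isolates the two decisions being asked about.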