----- On Oct 23, 2020, at 8:45 PM, Valentin Vidić vvi...@valentin-vidic.from.hr wrote:
> On Fri, Oct 23, 2020 at 08:08:31PM +0200, Lentes, Bernd wrote:
>> But when the timeout has run out the RA tries to kill the machine with a
>> "virsh destroy".
>> And if that does not work (which is occasionally my problem) because the
>> domain is in uninterruptible sleep (D state), the RA returns
>> $OCF_ERR_GENERIC, which causes pacemaker to fence the lazy node.
>> Or am I wrong?
>
> What does the log look like when this happens?
>

/var/log/cluster/corosync.log:

VirtualDomain(vm_amok)[8998]: 2020/09/27_22:34:11 INFO: Issuing graceful shutdown request for domain vm_amok.
VirtualDomain(vm_amok)[8998]: 2020/09/27_22:37:06 INFO: Issuing forced shutdown (destroy) request for domain vm_amok.
Sep 27 22:37:11 [11282] ha-idg-2 lrmd: warning: child_timeout_callback: vm_amok_stop_0 process (PID 8998) timed out
Sep 27 22:37:11 [11282] ha-idg-2 lrmd: warning: operation_finished: vm_amok_stop_0:8998 - timed out after 180000ms

The timeout of the domain is 180 sec.

/var/log/libvirt/libvirtd.log (time is UTC):

2020-09-27 20:37:21.489+0000: 18583: error : virProcessKillPainfully:401 : Failed to terminate process 14037 with SIGKILL: Device or resource busy
2020-09-27 20:37:21.505+0000: 6610: error : virNetSocketWriteWire:1852 : Cannot write data: Broken pipe
2020-09-27 20:37:31.962+0000: 6610: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor

SIGKILL didn't work. Nevertheless, the process finished 20 seconds after the
destroy, surely because it woke up from D state and received the signal.

/var/log/cluster/corosync.log on the DC:

Sep 27 22:37:11 [3580] ha-idg-1 crmd: warning: status_from_rc: Action 93 (vm_amok_stop_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error

Stop (also SIGKILL) failed.

Sep 27 22:37:11 [3579] ha-idg-1 pengine: notice: native_stop_constraints: Stop of failed resource vm_amok is implicit after ha-idg-2 is fenced

The cluster decides to fence the node although the resource is stopped
10 seconds later.

atop log:

14037 - S 261% /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=vm_amok,debug-threads=on -S -object secret,id=masterKey0 ...

The PID of the domain is 14037.

14037 - E 0% worker (at 22:37:31)

The domain has stopped.

Bernd

Helmholtz Zentrum München
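
To make the ordering of the events easier to follow, here is a small Python sketch that puts the timestamps quoted above onto one timeline. It is only an illustration: the event labels are my own wording, the fractional seconds from libvirtd.log are dropped, and it assumes that the cluster nodes log in local time at UTC+2 while libvirtd logs in UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: corosync.log uses local time (UTC+2), libvirtd.log uses UTC.
LOCAL = timezone(timedelta(hours=2))
FMT = "%Y-%m-%d %H:%M:%S"


def local(ts):
    """Parse a timestamp taken from corosync.log (local time)."""
    return datetime.strptime(ts, FMT).replace(tzinfo=LOCAL)


def utc(ts):
    """Parse a timestamp taken from libvirtd.log (UTC)."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


# Timestamps copied from the log excerpts; labels are descriptive only.
EVENTS = [
    ("graceful shutdown requested", local("2020-09-27 22:34:11")),
    ("forced shutdown (destroy) requested", local("2020-09-27 22:37:06")),
    ("stop operation timed out (180000ms)", local("2020-09-27 22:37:11")),
    ("libvirtd: SIGKILL failed", utc("2020-09-27 20:37:21")),
    ("libvirtd: end of file from qemu monitor", utc("2020-09-27 20:37:31")),
]


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    start = events[0][1]
    return [(name, int((when - start).total_seconds())) for name, when in events]


if __name__ == "__main__":
    for name, secs in offsets(EVENTS):
        print(f"+{secs:4d}s  {name}")
```

Under these assumptions, the stop operation times out 5 seconds after the destroy request, and the qemu monitor reports end of file about 20 seconds after that timeout.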