>>> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> schrieb am 29.07.2020 um 17:26 in Nachricht <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>: > Hi, > > a few days ago one of my nodes was fenced and i don't know why, which is > something i really don't like. > What i did: > I put one node (ha-idg-1) in standby. The resources on it (most of all > virtual domains) were migrated to ha-idg-2, > except one domain (vm_nextcloud). On ha-idg-2 a mountpoint was missing the > xml of the domain points to. > Then the cluster tries to start vm_nextcloud on ha-idg-2 which of course > also failed. > Then ha-idg-1 was fenced.
My guess is that ha-idg-1 was fenced because a failed migration from ha-idg-2 is treated like a stop failure on ha-idg-2. Stop failures cause fencing. You should have tested your resource before going productive. > > I did a "crm history" over the respective time period, you find it here: > https://hmgubox2.helmholtz-muenchen.de/index.php/s/529dfcXf5a72ifF > > Here, from my point of view, the most interesting from the logs: > ha-idg-1: > Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: > Diff: --- 2.16196.19 2 > Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: > Diff: +++ 2.16197.0 bc9a558dfbe6d7196653ce56ad1ee758 > Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + > /cib: @epoch=16197, @num_updates=0 > Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + > /cib/configuration/nodes/node[@id='1084777482']/instance_attributes[@id='node > s-108 > 4777482']/nvpair[@id='nodes-1084777482-standby']: @value=on > ha-idg-1 set to standby > > Jul 20 16:59:34 [23768] ha-idg-1 crmd: notice: process_lrm_event: > ha-idg-1-vm_nextcloud_migrate_to_0:3169 [ error: Cannot access storage > file > '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-1 > 8.04.4-live-server-amd64.iso': No such file or > directory\nocf-exit-reason:vm_nextcloud: live migration to ha-idg-2 failed: > 1\n ] > migration failed > > Jul 20 17:04:01 [23767] ha-idg-1 pengine: error: > native_create_actions: Resource vm_nextcloud is active on 2 nodes > (attempting recovery) > ??? > > Jul 20 17:04:01 [23767] ha-idg-1 pengine: notice: LogAction: * > Recover vm_nextcloud ( ha-idg-2 ) > > Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: > Initiating stop operation vm_nextcloud_stop_0 on ha-idg-2 | action 106 > Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: > Initiating stop operation vm_nextcloud_stop_0 locally on ha-idg-1 | action 2 > > Jul 20 17:04:01 [23768] ha-idg-1 crmd: info: match_graph_event: > Action vm_nextcloud_stop_0 (106) confirmed on ha-idg-2 (rc=0) > > Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: > Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 > key=vm_nextcloud_stop_0 confirmed=true cib-update=5960 > > Jul 20 17:05:29 [23761] ha-idg-1 pacemakerd: notice: crm_signal_dispatch: > Caught 'Terminated' signal | 15 (invoking handler) > systemctl stop pacemaker.service > > > ha-idg-2: > Jul 20 17:04:03 [10691] ha-idg-2 crmd: notice: process_lrm_event: > Result of stop operation for vm_nextcloud on ha-idg-2: 0 (ok) | call=157 > key=vm_nextcloud_stop_0 confirmed=true cib-update=57 > the log from ha-idg-2 is two seconds ahead of ha-idg-1 > > Jul 20 17:04:08 [10688] ha-idg-2 lrmd: notice: log_execute: > executing - rsc:vm_nextcloud action:start call_id:192 > Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: > vm_nextcloud_start_0:29107:stderr [ error: Failed to create domain from > /mnt/share/vm_nextcloud.xml ] > Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: > vm_nextcloud_start_0:29107:stderr [ error: Cannot access storage file > '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-1 > 8.04.4-live-server-amd64.iso': No such file or directory ] > Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: > vm_nextcloud_start_0:29107:stderr [ ocf-exit-reason:Failed to start > virtual domain vm_nextcloud. ] > Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: log_finished: > finished - rsc:vm_nextcloud action:start call_id:192 pid:29107 exit-code:1 > exec-time:581ms queue-time:0ms > start on ha-idg-2 failed > > Jul 20 17:05:32 [10691] ha-idg-2 crmd: info: do_dc_takeover: > Taking over DC status for this partition > ha-idg-1 stopped pacemaker > > Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_nextcloud on > ha-idg-1: unknown error | rc=1 > Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: > unpack_rsc_op_failure: Processing failed start of vm_nextcloud on ha-idg-2: > unknown error | rc > > Jul 20 17:05:33 [10690] ha-idg-2 pengine: info: native_color: > Resource vm_nextcloud cannot run anywhere > logical > > Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: custom_action: > Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (pending) > ??? > > Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: custom_action: > Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline) > Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: pe_fence_node: > Cluster node ha-idg-1 will be fenced: resource actions are unrunnable > Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: stage6: Scheduling > Node ha-idg-1 for STONITH > Jul 20 17:05:35 [10690] ha-idg-2 pengine: info: > native_stop_constraints: vm_nextcloud_stop_0 is implicit after ha-idg-1 is > fenced > Jul 20 17:05:35 [10690] ha-idg-2 pengine: notice: LogNodeActions: * > Fence (Off) ha-idg-1 'resource actions are unrunnable' > > > Why does it say "Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: > custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable > (offline)" although > "Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: > Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 > key=vm_nextcloud_stop_0 confirmed=true cib-update=5960" > says that stop was ok ? > > > Bernd > > -- > > Bernd Lentes > Systemadministration > Institute for Metabolism and Cell Death (MCD) > Building 25 - office 122 > HelmholtzZentrum München > bernd.len...@helmholtz-muenchen.de > phone: +49 89 3187 1241 > phone: +49 89 3187 3827 > fax: +49 89 3187 2294 > http://www.helmholtz-muenchen.de/mcd > > stay healthy > Helmholtz Zentrum München > > Helmholtz Zentrum Muenchen > Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) > Ingolstaedter Landstr. 1 > 85764 Neuherberg > www.helmholtz-muenchen.de > Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling > Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin > Guenther > Registergericht: Amtsgericht Muenchen HRB 6466 > USt-IdNr: DE 129521671 > > > _______________________________________________ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/