On Wed, 2019-06-19 at 18:46 +0200, Lentes, Bernd wrote:
> ----- On Jun 15, 2019, at 4:30 PM, Bernd Lentes
> bernd.len...@helmholtz-muenchen.de wrote:
>
> > ----- On Jun 14, 2019, at 9:20 PM, kgaillot kgail...@redhat.com wrote:
> >
> > > On Fri, 2019-06-14 at 18:27 +0200, Lentes, Bernd wrote:
> > > > Hi,
> > > >
> > > > I had that problem once already, but it's still not clear to me
> > > > what really happens. This is what happened some days ago:
> > > > I have a 2-node cluster with several virtual domains as
> > > > resources. I put one node (ha-idg-2) into standby, and two
> > > > running virtual domains were migrated to the other node
> > > > (ha-idg-1). The other virtual domains were already running on
> > > > ha-idg-1.
> > > > Since then, the two virtual domains which migrated
> > > > (vm_idcc_devel and vm_severin) start or stop every 15 minutes
> > > > on ha-idg-1, while ha-idg-2 stays in standby.
> > > > I know that the 15-minute interval is related to the
> > > > "cluster-recheck-interval", but why are these two domains
> > > > started and stopped? I looked around a lot in the logs, checked
> > > > the pe-input files, watched some graphs created by crm_simulate
> > > > with dotty ... I always see that the domains are started,
> > > > stopped 15 minutes later, started again 15 minutes after
> > > > that ... but I don't see WHY. I would really like to know that.
> > > > And why are the domains not started by the monitor resource
> > > > operation? It should recognize that a domain is stopped and
> > > > start it again. My monitor interval is 30 seconds.
> > > > I had two pending errors concerning these domains, failed
> > > > migrates from ha-idg-1 to ha-idg-2, from some time before.
> > > > Could that be the culprit?
It did indeed turn out to be. The resource history on ha-idg-1 shows
the last failed action as a migrate_to from ha-idg-1 to ha-idg-2, and
the last successful action as a migrate_from from ha-idg-2 to
ha-idg-1. That confused Pacemaker as to the current status of the
migration.

A full migration is a migrate_to on the source node, a migrate_from on
the target node, and a stop on the source node. When the resource
history has a failed migrate_to on the source, and a stop but no
migrate_from on the target, the migration is considered "dangling" and
forces a stop of the resource on the source, because it's possible the
migrate_from never got a chance to be scheduled.

That is wrong in this situation: the resource is happily running on
the node with the failed migrate_to, because it was later moved back
successfully, so the failed migrate_to is no longer relevant. My
current plan for a fix: if the node with the failed migrate_to has a
newer successful migrate_from or start, and the target node of the
failed migrate_to has a successful stop, then the migration should not
be considered dangling.

A couple of side notes on your configuration:

Instead of putting action=off in fence device configurations, you
should use pcmk_reboot_action=off. Pacemaker adds the action parameter
itself when sending the fence command.

When keeping a fence device off its target node, use a finite negative
score rather than -INFINITY. This ensures the node can fence itself as
a last resort.
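To make the dangling-migration rule concrete, here is a toy model of
the decision described above. This is a sketch only, not Pacemaker's
actual scheduler code; the history format (call-id, action, rc) and
the function name are invented for the example:

```python
# Toy model of the "dangling migration" rule -- NOT Pacemaker's real code.
# Each history entry is (call_id, action, rc); a higher call_id is newer,
# and rc == 0 means the action succeeded.

def is_dangling(source_history, target_history, fix_applied=True):
    """Return True if a failed migrate_to should force a stop on the source."""
    failed_migrate_to = max(
        (c for c, a, rc in source_history if a == "migrate_to" and rc != 0),
        default=None)
    if failed_migrate_to is None:
        return False

    # Current rule: the target saw a stop but never a migrate_from, so the
    # migration may have been interrupted mid-flight.
    target_stopped = any(a == "stop" and rc == 0
                         for _, a, rc in target_history)
    target_migrated = any(a == "migrate_from" and rc == 0
                          for _, a, rc in target_history)
    dangling = target_stopped and not target_migrated

    if dangling and fix_applied:
        # Proposed fix: a newer successful migrate_from or start on the
        # source means the resource legitimately came back; combined with
        # the clean stop on the target, the old failure is irrelevant.
        recovered = any(a in ("migrate_from", "start") and rc == 0
                        and c > failed_migrate_to
                        for c, a, rc in source_history)
        if recovered:
            dangling = False
    return dangling
```

In the scenario from this thread, the source has a failed migrate_to
followed by a newer successful migrate_from, and the target has a
successful stop: the current rule calls that dangling (and schedules
the spurious stop), while the proposed fix does not.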
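For the two configuration side notes, something along these lines
(crm shell syntax assumed, since you use crmsh; the agent parameters,
constraint id, and the -5000 score are placeholders, not values from
your cluster):

```shell
# Use pcmk_reboot_action=off instead of action=off; Pacemaker supplies
# the "action" parameter itself when it sends the fence command.
crm configure primitive fence_ilo_ha-idg-2 stonith:fence_ilo \
        params ipaddr="..." login="..." passwd="..." \
        pcmk_reboot_action="off"

# Keep the device off its own target with a finite negative score, not
# -INFINITY, so ha-idg-2 can still fence itself as a last resort.
crm configure location fence_ilo_ha-idg-2-placement \
        fence_ilo_ha-idg-2 -5000: ha-idg-2
```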
> > I'm running pacemaker-1.1.19+20181105.ccd6b5b10-3.10.1.x86_64 on
> > SLES 12 SP4 and kernel 4.12.14-95.13.
>
> Hi,
> the problem arose again. What attracted my attention: when I make a
> change in the configuration, e.g. some slight change to a resource,
> the cluster immediately starts or stops the domains, depending on
> their previous state. The fence resource is not affected by this
> start/stop.
>
> Example (some changes to a stonith agent):
>
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_process_request: Forwarding cib_replace operation for section configuration to all (origin=local/crm_shadow/2)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: __xml_diff_object: Moved nvpair@id (0 -> 2)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: __xml_diff_object: Moved nvpair@name (1 -> 0)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.6990.1043 2
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.6991.0 6a5f09a19ae7d0a7bae55bddb9d1564f  <===================== new epoch
>
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']/nvpair[@id='fence_ha-idg-2-instance_attributes-action']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-0']/nvpair[@id='fence_ha-idg-2-instance_attributes-0-ipaddr']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-1']/nvpair[@id='fence_ha-idg-2-instance_attributes-1-login']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-2']/nvpair[@id='fence_ha-idg-2-instance_attributes-2-passwd']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-3']/nvpair[@id='fence_ha-idg-2-instance_attributes-3-ssl_insecure']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-4']/nvpair[@id='fence_ha-idg-2-instance_attributes-4-delay']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=6991, @num_updates=0
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']: <nvpair name="pcmk_off_action" value="Off" id="fence_ha-idg-2-instance_attributes-pcmk_off_action"/>
>
> ...
>
> The cluster reacts immediately:
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_sim ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_geneious ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_idcc_devel ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_genetrap ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_mouseidgenes ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_greensql ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_severin ( ha-idg-2 )
>
> What else surprises me:
> With the changes to the stonith agent a new epoch (6991) was created.
> Afterwards, all start/stop actions of the domains, happening hours > later, relate to this epoch: > > ... > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: --- 2.6991.178 > 2 <======================== > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: +++ 2.6991.179 > (null) <======================== > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib: @num_updates=179 > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib/status/node_state[@id='1084777492']/lrm[@id > ='1084777492']/lrm_resources/lrm_resource[@id='vm_sim']/lrm_rsc_op[@i > d='vm_sim_last_0']: @operation_key=vm_sim_start_0, @operation=start, > @transition-key=33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, > @transition-magic=0:0;33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, > @call-id=4693, @exec-time=2928 > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_process_request: Completed cib_modify operation for section > status: OK (rc=0, origin=ha-idg-2/crmd/1977, version=2.6991.179) > Jun 18 21:15:07 [9583] ha-idg-1 crmd: info: > match_graph_event: Action vm_sim_start_0 (33) confirmed on ha- > idg-2 (rc=0) > Jun 18 21:15:07 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating monitor operation vm_sim_monitor_30000 > on ha-idg-2 | action 34 > Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: > update_cib_stonith_devices_v2: Updating device list from the cib: > modify lrm_rsc_op[@id='vm_sim_last_0'] > Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: > cib_devices_update: Updating devices to version 2.6991.179 > Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: notice: > unpack_config: On loss of CCM Quorum: Ignore > Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: > cib_device_update: Device fence_ilo_ha-idg-1 has been disabled > on ha-idg-1: score=-INFINITY > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: --- 2.6991.179 2 > Jun 18 21:15:07 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: +++ 2.6991.180 
(null) > ... > > Continuing just to the next change of the configuration, then a new > epoch is created. > Normally this epochs are changed very frequently, here they remain > for a long time unchanged. > Does that mean that still, although hours later, the actions relate > to this epoch ? What is the use of this epoch ? > > Also interesting: > Jun 17 16:47:05 [9581] ha-idg-1 pengine: notice: > process_pe_message: Calculated transition 62, saving inputs in > /var/lib/pacemaker/pengine/pe-input-1085.bz2 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: info: > do_state_transition: State transition S_POLICY_ENGINE -> > S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE > origin=handle_response > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > do_te_invoke: Processing graph 62 (ref=pe_calc-dc-1560782825-446) > derived from /var/lib/pacemaker/pengine/pe-input-1085.bz2 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_mausdb_stop_0 on ha- > idg-2 | action 37 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 > | action 39 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha- > idg-2 | action 41 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on > ha-idg-2 | action 43 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha- > idg-2 | action 45 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 on > ha-idg-2 | action 47 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha- > idg-2 | action 49 > Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha- > idg-2 | action 51 > ... 
> Jun 17 16:48:20 [9583] ha-idg-1 crmd: notice: > run_graph: Transition 62 (Complete=9, Pending=0, Fired=0, > Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input- > 1085.bz2): Complete > Jun 17 16:48:20 [9583] ha-idg-1 crmd: info: do_log: Input > I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd > Jun 17 16:48:20 [9583] ha-idg-1 crmd: notice: > do_state_transition: State transition S_TRANSITION_ENGINE -> > S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd > > 15 min later: > ... > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_mausdb ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_sim ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_geneious ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_idcc_devel ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_genetrap ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_mouseidgenes ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_greensql ( ha-idg-2 ) > Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: > LogAction: * > Restart vm_severin ( ha-idg-2 ) > ... 
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_mausdb_stop_0 on ha- > idg-2 | action 29 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 > | action 32 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha- > idg-2 | action 35 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on > ha-idg-2 | action 38 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha- > idg-2 | action 41 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 on > ha-idg-2 | action 44 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha- > idg-2 | action 47 > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha- > idg-2 | action 50 > ... > Despite the fact that the domains are already stopped they are > stopped again !?! > > And immediately after being stopped they are started again: > Jun 17 17:03:20 [9583] ha-idg-1 crmd: info: > match_graph_event: Action vm_mausdb_stop_0 (29) confirmed on > ha-idg-2 (rc=0) > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating start operation vm_mausdb_start_0 on ha- > idg-2 | action 30 > > Jun 17 17:03:20 [9583] ha-idg-1 crmd: info: > match_graph_event: Action vm_idcc_devel_stop_0 (38) confirmed > on ha-idg-2 (rc=0) > Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating start operation vm_idcc_devel_start_0 on > ha-idg-2 | action 39 > Stopped and immediately started again !?! > > Also interesting: > ... 
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: > crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped > (900000ms) > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE > | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped > Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: > do_state_transition: Progressed to state S_POLICY_ENGINE after > C_TIMER_POPPED > Jun 19 14:47:42 [9581] ha-idg-1 pengine: notice: > unpack_config: On loss of CCM Quorum: Ignore > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > unpack_status: Node ha-idg-1 is in standby-mode > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > determine_online_status_fencing: Node ha-idg-1 is active > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > determine_online_status: Node ha-idg-1 is standby > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > determine_online_status_fencing: Node ha-idg-2 is active > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > determine_online_status: Node ha-idg-2 is online > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed monitor of vm_mausdb on > ha-idg-2: not running | rc=7 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of > vm_mouseidgenes on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of > vm_idcc_devel on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_sim on > ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_genetrap > on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_geneious > on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 
[9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_greensql > on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: > unpack_rsc_op_failure: Processing failed migrate_to of vm_severin > on ha-idg-2: unknown error | rc=1 > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > unpack_node_loop: Node 1084777482 is already processed > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > unpack_node_loop: Node 1084777492 is already processed > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > unpack_node_loop: Node 1084777482 is already processed > Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: > unpack_node_loop: Node 1084777492 is already processed > ... > Jun 19 14:47:42 [9581] ha-idg-1 pengine: notice: > process_pe_message: ===============> Calculated transition 250 > <===============, saving inputs in /var/lib/pacemaker/pengine/pe- > input-1273.bz2 > > Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: > do_state_transition: State transition S_POLICY_ENGINE -> > S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE > origin=handle_response > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > do_te_invoke: Processing graph 250 (ref=pe_calc-dc-1560948462- > 3241) derived from /var/lib/pacemaker/pengine/pe-input- > 1273.bz2 <=============================== > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 > | action 32 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha- > idg-2 | action 35 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on > ha-idg-2 | action 38 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha- > idg-2 | action 41 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 
on > ha-idg-2 | action 44 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha- > idg-2 | action 47 > Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: > te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha- > idg-2 | action 50 > ... > The changes are calculated to transition 250 and completed: > ... > Jun 19 14:47:46 [9583] ha-idg-1 crmd: notice: > run_graph: =========> Transition 250 (Complete=22, Pending=0, > Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-1273.bz2): Complete > <=========== > > Jun 19 14:47:46 [9583] ha-idg-1 crmd: info: do_log: Input > I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd > Jun 19 14:47:46 [9583] ha-idg-1 crmd: notice: > do_state_transition: State transition S_TRANSITION_ENGINE -> > S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd > Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: > update_cib_stonith_devices_v2: Updating device list from the cib: > modify lrm_rsc_op[@id='vm_mouseidgenes_monitor_30000'] > Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: > cib_devices_update: Updating devices to version 2.7007.890 > Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: notice: > unpack_config: On loss of CCM Quorum: Ignore > Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: > cib_device_update: Device fence_ilo_ha-idg-1 has been disabled > on ha-idg-1: score=-INFINITY > > Transition 250 is completed. But > ... 
> Jun 19 14:47:51 [9577] ha-idg-1 cib: info: > cib_process_ping: Reporting our current digest to ha-idg-1: > ae505b65505c427ab0a45d36717b4135 for 2.7007.890 (0x1ea7670 0) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_process_request: Forwarding cib_delete operation for section > //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] > to all (origin=local/crmd/707) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: --- 2.7007.890 2 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: +++ 2.7007.891 > 2d0272cf2594d336b179085d63b67c6f > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: -- > /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_re > sources/lrm_resource[@id='vm_mouseidgenes'] > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib: @num_updates=891 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_process_request: Completed cib_delete operation for section > //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: > OK (rc=0, origin=ha-idg-1/crmd/707, version= > 2.7007.890) > Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: > delete_resource: Removing resource vm_mouseidgenes for 4460a5c3- > c009-44f6-a01d-52f93e731fda (root) on ha-idg-1 > Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: > notify_deleted: Notifying 4460a5c3-c009-44f6-a01d-52f93e731fda on > ha-idg-1 that vm_mouseidgenes was deleted > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_process_request: Forwarding cib_delete operation for section > //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] > to all (origin=local/crmd/708) > Jun 19 14:57:32 [9583] ha-idg-1 crmd: warning: > qb_ipcs_event_sendv: new_event_notification (9583-10294-15): > Broken pipe (32) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: --- 2.7007.890 2 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: +++ 2.7007.891 (null) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > 
cib_perform_op: -- > /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_re > sources/lrm_resource[@id='vm_mouseidgenes'] > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib: @num_updates=891 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_process_request: Completed cib_delete operation for section > //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: > OK (rc=0, origin=ha-idg-1/crmd/708, version= > 2.7007.891) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_process_request: Forwarding cib_modify operation for section > crm_config to all (origin=local/crmd/710) > > Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: > abort_transition_graph: ========> Transition 250 aborted > <============== by deletion of lrm_resource[@id='vm_mouseidgenes']: > Resource state removal | cib=2.7007.891 source=abort_unless_down:344 > path=/cib/sta > tus/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/ > lrm_resource[@id='vm_mouseidgenes'] complete=true > > Jun 19 14:57:32 [9583] ha-idg-1 crmd: notice: > do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE > | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph > Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng: info: > update_cib_stonith_devices_v2: Updating device list from the cib: > delete lrm_resource[@id='vm_mouseidgenes'] > Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng: info: > cib_devices_update: Updating devices to version 2.7007.891 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: --- 2.7007.891 2 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: Diff: +++ 2.7008.0 (null) > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib: @epoch=7008, @num_updates=0 > Jun 19 14:57:32 [9577] ha-idg-1 cib: info: > cib_perform_op: + /cib/configuration/crm_config/cluster_property_s > et[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options- > last-lrm-refresh']: @value=1560949052 > Jun 19 14:57:32 [9578] 
ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Completed cib_modify operation for section crm_config: OK (rc=0, origin=ha-idg-1/crmd/710, version=2.7008.0)
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: abort_transition_graph: Transition 250 aborted by cib-bootstrap-options-last-lrm-refresh doing modify last-lrm-refresh=1560949052: Configuration change | cib=2.7008.0 source=te_update_diff_v2:500 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-last-lrm-refresh'] complete=true
>
> A few minutes later, transition 250 is aborted. How can something
> that has completed be aborted?
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep,
> Heinrich Bassler, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <kgail...@redhat.com>