[ClusterLabs] Antw: Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ulrich Windl
>>> Ferenc Wágner wrote on 28.08.2017 at 18:07 in message <87mv6jk75r@lant.ki.iif.hu>: [...] cLVM under I/O load can be really slow (I'm talking about delays in the range of a few seconds). Be sure to have any timeouts adjusted accordingly. I wrote a tool that allows monitoring the read l

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Jan Friesse
Ferenc, Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor failed

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Digimer
On 2017-08-28 12:07 PM, Ferenc Wágner wrote: > Hi, > > In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day > (in August; in May, it happened 0-2 times a day only, it's slowly > ramping up): > > vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new > configuration. >

[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ferenc Wágner
Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming n

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-28 Thread Octavian Ciobanu
Thank you for the info. Looking over the output of crm_simulate I noticed the "notice" messages, and with the help of debug mode I found this sequence in the log: Aug 28 16:23:19 [13802] node03 crm_simulate:debug: native_assign_node: Assigning node01 to DLM:2 Aug 28 16:23:19 [13802] n

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-28 Thread Vladislav Bogdanov
28.08.2017 14:03, Octavian Ciobanu wrote: Hey Vladislav, Thank you for the info. I've tried your suggestions but the behavior is still the same. When an offline/standby node rejoins the cluster all the resources are first stopped and then started. I've added the changes I've made, see below in re

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-28 Thread Octavian Ciobanu
Hey Vladislav, Thank you for the info. I've tried your suggestions but the behavior is still the same. When an offline/standby node rejoins the cluster all the resources are first stopped and then started. I've added the changes I've made, see below in reply message, next to your suggestions. Once