Jean,

Hello,

As the subject line suggests, I am wondering why I see so many of these
log lines ("many" meaning about 10 per minute, often several within the
same second):

Sep 26 19:56:24 [950] vm0 corosync notice  [TOTEM ] Process pause detected for 2555 ms, flushing membership messages.
Sep 26 19:56:24 [950] vm0 corosync notice  [TOTEM ] Process pause detected for 2558 ms, flushing membership messages.

Let me add some context:
- this is observed in 3 small VMs on my laptop
- the OS is CentOS 7.3, corosync is 2.4.0-9.el7_4.2
- these VMs only run corosync, nothing else
- the VM host (my laptop) is idle 60-80% of the time
- VMs are qemu-kvm guests, connected with tap interfaces
- AND the messages only appear when, on one of the VMs, I stop/start
corosync in a tight loop, like this:

[root@vm2 ~]# while :; do echo $(date) stop; systemctl stop corosync; echo $(date) start; systemctl start corosync; done
Tue Sep 26 19:50:19 CEST 2017 stop
Tue Sep 26 19:50:21 CEST 2017 start
Tue Sep 26 19:50:21 CEST 2017 stop
Tue Sep 26 19:50:22 CEST 2017 start
...

I understand that this kind of test is stressful (and quite artificial), but
I'm still surprised to see these particular messages, because it seems
a bit unlikely to me that the corosync process is not properly scheduled
for seconds at a time so frequently (several times per minute).

I don't think scheduling is the problem here. If the scheduler were the
cause, a different message ("Corosync main process was not scheduled
for ...") would kick in instead. This looks more like something being
blocked in totemsrp.
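
If you want to double-check, something like this separates the two
messages (just a sketch, assuming corosync output ends up in the systemd
journal on your CentOS 7 guests; adjust if you log to a file instead):

# assumes corosync messages reach the journal via syslog/journald
# pauses reported by totemsrp, as in the lines you quoted
journalctl -u corosync | grep -c "Process pause detected"
# scheduler-related warnings; I would expect this count to stay at 0 here
journalctl -u corosync | grep -c "was not scheduled for"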



So I wonder if maybe there could be other explanations?

Also, it looks like the side effect is that corosync drops important
messages (I think "join" messages?), and I fear that this can lead to

You mean membership join messages? Because there are a lot of them
(327) in the log you've sent.

bigger issues with DLM (which is why I'm looking into this in the first
place).

In case that's helpful, attached are 10 minutes of corosync logs and the
config file I'm using (it declares 5 nodes, but I can reproduce this even
with just 3 nodes).
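
In case the attachment doesn't make it through, a minimal 3-node
corosync.conf in the same spirit looks roughly like this (the transport,
addresses and cluster name below are placeholders, not my exact file):

totem {
    version: 2
    cluster_name: testcluster
    # placeholder transport/addresses, adjust to your VMs
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 192.168.122.10
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.122.11
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.122.12
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}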

Thanks in advance for any suggestion!

I'll definitely try to reproduce this bug and let you know. I don't
think any messages get lost, but it's better to be on the safe side.

Regards,
  Honza



Cheers,
JM



_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


