Hi,

I saw from the mailing list archives that the topic in the subject has
already been discussed, but I thought I might add my own experience to
the debate.

I'm running (not yet in production, so I'm free to make all kinds of
tests) a three-node Corosync/Pacemaker cluster (Rita, Sara, Quorum-rs).
Two of the nodes are physical machines running Xen (Rita and Sara),
while the third (Quorum-rs) is a virtual machine running on another
physical host (so it never runs on Rita or Sara). Quorum-rs is always
in standby, never runs any services and, as the name suggests, is
there just to be counted for quorum.
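
For reference, keeping Quorum-rs out of resource placement comes down
to a single node attribute in Pacemaker; with the crm shell it looks
roughly like this (a sketch, not my exact configuration):

  # mark quorum-rs as standby: it votes for quorum but never hosts resources
  crm node standby quorum-rs
  # check the node's status and attributes
  crm node show quorum-rs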

Rita and Sara have a small DRBD+OCFS2 shared storage, just to hold the
configuration files of the Xen domains, some ISO images and so on.
There is also an NFS-mounted partition on both Rita and Sara from an
external NAS device. Both Rita and Sara have a dedicated eth1
interface for NFS, which is used by neither Corosync nor DRBD.
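
For completeness, the NFS partition is just an ordinary mount reached
over that eth1 network, along the lines of (illustrative address,
export path and options, not my actual fstab entry):

  # /etc/fstab on Rita and Sara -- NAS reached via the dedicated eth1 subnet
  10.82.1.10:/export/xen  /srv/xen-images  nfs  rw,hard,intr  0  0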

There is, for example, one Xen HVM virtual machine running Windows XP,
which has its disk as a file on NFS. If I exercise the disk in the
WinXP domain (by running 'sdelete -c c:'), things get worse: I start
getting messages like the ones below on the node where the WinXP guest
is running ...

Aug 26 10:09:28 rita corosync[1359]:   [TOTEM ] Process pause detected
for 5614 ms, flushing membership messages.
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 128: memb=3,
new=0, lost=0
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: memb: rita 16863498
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: memb: sara 33640714
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: memb: quorum-rs 50417930
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 128: memb=3, new=0,
lost=0
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: MEMB: rita 16863498
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: MEMB: sara 33640714
Aug 26 10:09:28 rita corosync[1359]:   [pcmk  ] info:
pcmk_peer_update: MEMB: quorum-rs 50417930
Aug 26 10:09:28 rita corosync[1359]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Aug 26 10:09:28 rita corosync[1359]:   [MAIN  ] Completed service
synchronization, ready to provide service.

... and if Corosync is paused for a longer time, as in this case ...

Aug 26 10:11:00 rita corosync[1359]:   [TOTEM ] Process pause detected
for 16948 ms, flushing membership messages.

... the node is detected as failed by the remaining two nodes and is
fenced (powered down). That's no surprise, because when I run tcpdump
on the Quorum-rs server, for example, I can see a period of around 17
seconds during which no messages arrive from Rita, the node running
the WinXP guest ...

10:11:07.285895 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:07.600773 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:07.915640 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:08.230572 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:08.547069 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:08.862314 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
10:11:25.923506 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 82
 <---- gap here
10:11:25.923676 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 200
10:11:26.132995 IP 10.81.1.1.5404 > 239.94.81.2.5405: UDP, length 200
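
(For reference, the window the other two nodes allow before forming a
new membership and fencing the missing one is governed by the token
value in the totem section of corosync.conf; the snippet below is only
illustrative, not my actual settings. With anything remotely sane
there, a 17-second pause will always get Rita fenced.)

  totem {
          version: 2
          # milliseconds without a token before a node is declared lost
          token: 3000
  }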

Things I've tried so far:

1.) Both Rita and Sara have two CPU cores (0-1) dedicated to dom0, and
domUs can only run on cores 2-7:
   xen_commandline        : dom0_mem=768M dom0_max_vcpus=2 dom0_vcpus_pin

  Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
  Domain-0                             0     0     0   r--     221.3 0
  Domain-0                             0     1     1   -b-     280.7 1

2.) Changed the credit scheduler weight for Domain-0 from the default
256 to 1024:

  Name                                ID Weight  Cap
  Domain-0                             0   1024    0

3.) Recompiled Corosync with the
corosync-trunk-reset-pause-timestamp-on-events.patch applied.

... and none of it has made any difference; I can reproduce the
problem every single time. (For reference, the xm commands behind 1.)
and 2.) are sketched below.)
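
The pinning check in 1.) and the weight change in 2.) were done with
the standard xm tools, roughly like this (from memory, so treat it as
a sketch):

  # confirm dom0's vCPUs are pinned to physical cores 0-1
  xm vcpu-list Domain-0
  # raise dom0's credit-scheduler weight from the default 256 to 1024
  xm sched-credit -d Domain-0 -w 1024
  # show the current weight/cap for dom0
  xm sched-credit -d Domain-0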

Does anybody have any hints on what to try next? I'm thinking of
switching to jumbo frames on the eth1 NIC towards the NFS server (I
should be able to do it on both ends) and of recompiling the kernel
with CONFIG_PREEMPT enabled. Perhaps I could also try the Linux Trace
Toolkit?
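
The jumbo-frame change itself should be simple, assuming the NAS and
any switch in between cooperate; something like:

  # temporary, on both Rita and Sara
  ip link set dev eth1 mtu 9000
  # or persistently in /etc/network/interfaces, by adding
  #     mtu 9000
  # to the iface eth1 stanza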

As for versions, it's a Debian/Squeeze installation:

corosync/squeeze uptodate 1.2.1-4
pacemaker/squeeze uptodate 1.0.9.1+hg15626-1
ocfs2-tools/squeeze uptodate 1.4.4-3
kernel 2.6.32-5-xen-amd64

Best Regards,
Martin