I will try the renice solution you proposed.
Re-nicing corosync should not be required, as the process is supposed to run
with RT priority anyway.
I have been thinking that I could increase the token timeout value in
/etc/corosync/corosync.conf to prevent short hiccups. Did you
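A sketch of what such a change might look like in the totem section of /etc/corosync/corosync.conf (the 10000 ms value is a hypothetical illustration, not a recommendation; corosync's default token timeout is 1000 ms):

```
totem {
    version: 2
    # Token timeout in milliseconds. Raising it makes the cluster
    # more tolerant of brief network hiccups before it declares a
    # node dead and reconfigures membership, at the cost of slower
    # failure detection.
    token: 10000
}
```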
-----Original Message-----
From: Charles Taylor [mailto:tay...@hpc.ufl.edu]
Sent: Wednesday, October 24, 2012 3:33 PM
To: Hall, Shawn
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Large Corosync/Pacemaker clusters
FWIW, we are running HA Lustre using corosync/pacemaker. We broke our
From: [mailto:marco.passer...@csc.fi]
Sent: Tuesday, November 06, 2012 7:13 AM
To: lustre-discuss@lists.lustre.org
Cc: Hall, Shawn
Subject: Re: [Lustre-discuss] Large Corosync/Pacemaker clusters
Hi,
I'm also setting up a highly available Lustre system. I configured pairs
for the OSSes and MDSes, redundant
Hi,
We're setting up fairly large Lustre 2.1.2 filesystems, each with 18
nodes and 159 resources, all in one Corosync/Pacemaker cluster as
suggested by our vendor. We're getting mixed messages from our vendor and
others on how large a Corosync/Pacemaker cluster will work well.
Shawn,
In my opinion you shouldn't be running corosync on any more than two
machines. They should be configured in self-contained pairs (MDS pair,
OSS pairs). Anything beyond that would be chaos to manage, even if it
worked. Don't forget the stonith portion. Not every block storage
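A self-contained two-node pair as suggested above might be declared like this in corosync 2.x syntax (a hedged sketch; the hostnames oss1/oss2 are placeholders, and a 2012-era corosync 1.x setup would use a different, bindnetaddr-based configuration instead):

```
# Hypothetical nodelist for one OSS failover pair.
nodelist {
    node {
        ring0_addr: oss1
        nodeid: 1
    }
    node {
        ring0_addr: oss2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    # two_node mode lets a two-member cluster keep running when one
    # node fails, instead of losing quorum; it requires working
    # fencing (stonith) to be safe.
    two_node: 1
}
```

The two_node setting is exactly why stonith matters here: with only two votes, each survivor must be able to fence its peer before taking over its resources.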