[ClusterLabs] Restarting a failed resource on same node

2017-10-02 Thread Paolo Zarpellon
Hi, on a basic 2-node cluster I have a master-slave resource where the master runs on one node and the slave on the other. If I kill the slave resource, its status goes to "stopped". Similarly, if I kill the master resource, the slave is promoted to master but the failed one does not
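The behaviour described above is usually governed by Pacemaker's failure-handling meta attributes. A minimal sketch, assuming the pcs CLI and a hypothetical resource name `my-ms` (not taken from the thread): a `failure-timeout` lets the fail count expire so the stopped instance is retried on the same node, and `pcs resource cleanup` clears the failure immediately.

```
# Hypothetical resource name "my-ms"; values are illustrative.
# Retry on the same node by letting failures expire after 60s,
# and only migrate away after 3 failures:
pcs resource meta my-ms migration-threshold=3 failure-timeout=60s

# Or clear the recorded failure by hand so the stopped instance restarts:
pcs resource cleanup my-ms
```

Without a `failure-timeout` (or a cleanup), the fail count persists and the instance can stay stopped on that node.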

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-02 Thread Jean-Marc Saffroy
On Mon, 2 Oct 2017, Jan Friesse wrote: > > We had one problem on a real deployment of DLM+corosync (5 voters and 20 > > non-voters, with dlm on those 20, for a specific application that uses > > What do you mean by voters and non-voters? There are 25 nodes in total and > each of them is running

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-02 Thread Jan Friesse
On Wed, 27 Sep 2017, Jan Friesse wrote: I don't think scheduling is the cause. If scheduling were the problem, a different message ("Corosync main process was not scheduled for ...") would kick in. This looks more like something blocked in totemsrp. Ah, interesting! Also, it looks like the side
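For context, the pause-detection and token-loss behaviour discussed in this thread is tuned in the `totem` section of corosync.conf. A minimal sketch with illustrative values (these are real corosync options, but the numbers are not recommendations from the thread):

```
totem {
    version: 2
    # Token loss timeout in milliseconds; pause detection relates to
    # corosync not processing the token within its expected window.
    token: 3000
    token_retransmits_before_loss_const: 10
    # Consensus is typically 1.2 * token.
    consensus: 3600
}
```

Raising `token` gives a busy or briefly unscheduled corosync more headroom, at the cost of slower failure detection.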

[ClusterLabs] Moving PAF to clusterlabs (was: PostgreSQL Automatic Failover (PAF) v2.2.0)

2017-10-02 Thread Jehan-Guillaume de Rorthais
Hi All, Sorry, this discussion has spanned two different threads over time... Renaming to the original subject. On Wed, 13 Sep 2017 08:03:14 -0700 Digimer wrote: > On 2017-09-13 07:15 AM, Jehan-Guillaume de Rorthais wrote: > > On Tue, 12 Sep 2017 08:02:00 -0700 > >

Re: [ClusterLabs] monitor failed actions not cleared

2017-10-02 Thread LE COQUIL Pierre-Yves
Hi, I finally found my mistake: I had set the failure-timeout like the lifetime example in the Red Hat documentation, with the value PT1M. If I set the failure-timeout to 60 instead, it works as it should. One last question ...: couldn't there be something in the log reporting the value
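The confusion above comes from duration notation: "PT1M" is the ISO 8601 spelling of one minute, while "60" is a bare number of seconds, and the two should describe the same interval. A small sketch of such a conversion (a hypothetical helper, not Pacemaker code) makes the equivalence explicit:

```python
import re

def iso8601_duration_to_seconds(value):
    """Convert a bare number of seconds ("60") or a simple ISO 8601
    duration ("PT1M", "PT1H30M", "P1D") to seconds.
    Hypothetical helper for illustration only."""
    if re.fullmatch(r"\d+", value):
        return int(value)  # bare number: already seconds
    m = re.fullmatch(
        r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?", value)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(iso8601_duration_to_seconds("PT1M"))  # 60
print(iso8601_duration_to_seconds("60"))    # 60
```

So a tool that accepts both forms should treat `PT1M` and `60` identically; if it does not, the mismatch is worth reporting.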