Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Klaus Wenninger
On 09/29/2016 05:57 AM, Andrew Beekhof wrote: > On Mon, Sep 26, 2016 at 7:39 PM, Klaus Wenninger wrote: >> On 09/24/2016 01:12 AM, Ken Gaillot wrote: >>> On 09/22/2016 05:58 PM, Andrew Beekhof wrote: On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot >>> > wrote: >>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Andrew Beekhof
On Mon, Sep 26, 2016 at 7:39 PM, Klaus Wenninger wrote: > On 09/24/2016 01:12 AM, Ken Gaillot wrote: >> On 09/22/2016 05:58 PM, Andrew Beekhof wrote: >>> >>> On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot >> > wrote: >>> >>> On 09/22/2016 09:53 AM, Jan Pokorný wrote:

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Andrew Beekhof
On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: > On 09/22/2016 05:58 PM, Andrew Beekhof wrote: >> >> >> On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot > > wrote: >> >> On 09/22/2016 09:53 AM, Jan Pokorný wrote: >> > On 22/09/16 08:42 +0200, Kristoffer Grönlun

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-28 Thread Ken Gaillot
On 09/28/2016 03:57 PM, Scott Greenlese wrote: > A quick addendum... > > After sending this post, I decided to stop pacemaker on the single, > Online node in the cluster, > and this effectively killed the corosync daemon: > > [root@zs93kl VD]# date;pcs cluster stop > Wed Sep 28 16:39:22 EDT 2016

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-28 Thread Ken Gaillot
On 09/28/2016 04:04 PM, Christopher Harvey wrote: > My corosync/pacemaker logs are seeing a bunch of messages like the > following: > > Sep 22 14:50:36 [1346] node-132-60 crmd: info: > action_synced_wait: Managed MsgBB-Active_meta-data_0 process 15613 > exited with rc=4 This is the

[ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-28 Thread Christopher Harvey
My corosync/pacemaker logs are seeing a bunch of messages like the following: Sep 22 14:50:36 [1346] node-132-60 crmd: info: action_synced_wait: Managed MsgBB-Active_meta-data_0 process 15613 exited with rc=4 Sep 22 14:50:36 [1346] node-132-60 crmd: error: generic_get_metada

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-28 Thread Scott Greenlese
A quick addendum... After sending this post, I decided to stop pacemaker on the single, Online node in the cluster, and this effectively killed the corosync daemon: [root@zs93kl VD]# date;pcs cluster stop Wed Sep 28 16:39:22 EDT 2016 Stopping Cluster (pacemaker)... Stopping Cluster (corosync)...

Re: [ClusterLabs] Failover debugging

2016-09-28 Thread Evan Rinaldo
Thanks for the information. We are actually a few revs behind unfortunately. Thanks On Wed, Sep 28, 2016 at 12:38 AM, Klaus Wenninger wrote: > On 09/28/2016 03:13 AM, Evan Rinaldo wrote: > > Is it possible to trigger the blackbox recorder or even a crm_report > > on a failover event. I know th

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-28 Thread Scott Greenlese
Hi folks.. I have some follow-up questions about corosync daemon status after cluster shutdown. Basically, what should happen to corosync on a cluster node when pacemaker is shut down on that node? On my 5-node cluster, when I do a global shutdown, the pacemaker processes exit, but corosync proce

Re: [ClusterLabs] pacemaker_remoted XML parse error

2016-09-28 Thread Radoslaw Garbacz
Just to add a possibly helpful observation: either the "cib" or the "pengine" process goes to ~100% CPU when these remote node errors happen. On Tue, Sep 27, 2016 at 2:36 PM, Radoslaw Garbacz < radoslaw.garb...@xtremedatainc.com> wrote: > Hi, > > I encountered the same problem with pacemaker built from gith