[ClusterLabs] Pacemaker dependency fails when upgrading OS (Amazon Linux)

2016-09-29 Thread neeraj ch
Hello, I have a Pacemaker cluster running on Amazon Linux 2013.03, details as follows. OS: Amazon Linux 2013.03 64-bit (based on el6). Pacemaker version: 1.1.12, downloaded from http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/ I downloaded all of pacem

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-29 Thread Ken Gaillot
On 09/28/2016 10:54 PM, Andrew Beekhof wrote: > On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: >>> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures >>> then migrate", but I can't think of a real-world situation where that >>> makes sense, >>> >>> >>> really?

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Christopher Harvey
On Thu, Sep 29, 2016, at 02:45 PM, Jan Pokorný wrote: > Hello, > > On 29/09/16 12:41 -0400, Christopher Harvey wrote: > > I think something is failing at the execvp() level. I'm seeing > > useful looking trace logs in the code, but can't enable them right > > now. I have: > > PCMK_debug=yes > >

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Jan Pokorný
Hello, On 29/09/16 12:41 -0400, Christopher Harvey wrote: > I think something is failing at the execvp() level. I'm seeing > useful looking trace logs in the code, but can't enable them right > now. I have: > PCMK_debug=yes > PCMK_logfile=/tmp/pacemaker.log > PCMK_logpriority=debug > PCMK_trace_

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-29 Thread Jan Pokorný
On 28/09/16 16:30 -0400, Scott Greenlese wrote: > Also, I have tried simulating a failed cluster node (to trigger a > STONITH action) by killing the corosync daemon on one node, but all > that does is respawn the daemon ... causing a temporary / transient > failure condition, and no fence takes p

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Christopher Harvey
On Thu, Sep 29, 2016, at 12:20 PM, Jan Pokorný wrote: > On 28/09/16 16:55 -0500, Ken Gaillot wrote: > > On 09/28/2016 04:04 PM, Christopher Harvey wrote: > >> My corosync/pacemaker logs are seeing a bunch of messages like the > >> following: > >> > >> Sep 22 14:50:36 [1346] node-132-60 crmd:

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Jan Pokorný
On 28/09/16 16:55 -0500, Ken Gaillot wrote: > On 09/28/2016 04:04 PM, Christopher Harvey wrote: >> My corosync/pacemaker logs are seeing a bunch of messages like the >> following: >> >> Sep 22 14:50:36 [1346] node-132-60 crmd: info: >> action_synced_wait: Managed MsgBB-Active_meta-da

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-29 Thread Tomas Jelinek
On 29.9.2016 at 00:14, Ken Gaillot wrote: On 09/28/2016 03:57 PM, Scott Greenlese wrote: A quick addendum... After sending this post, I decided to stop pacemaker on the single, Online node in the cluster, and this effectively killed the corosync daemon: [root@zs93kl VD]# date;pcs cluster st

Re: [ClusterLabs] Failover debugging

2016-09-29 Thread Klaus Wenninger
On 09/28/2016 10:55 PM, Evan Rinaldo wrote: > Thanks for the information. We are actually a few revs behind > unfortunately. You can still have a look at the (by now legacy) features ClusterMon-RA and/or the global cluster properties notification-agent/notification-recipient. The first should