Re: [ClusterLabs] Antw: Announcing hawk-apiserver, now in ClusterLabs

2019-02-13 Thread Adam Spiers
Ulrich Windl wrote: Hello! I'd like to comment as an "old" SuSE customer: I'm amazed that lighttpd is dropped in favor of some new go application: SuSE now has a base system that needs (correct me if I'm wrong): shell, perl, python, java, go, ruby, ...? Sorry for the off-topic nitpick, but my

Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-14 Thread Adam Spiers
Jan Pokorný wrote: On 31/05/18 14:48 +0200, Jan Pokorný wrote: I am soliciting feedback on these CIB features related questions, please reply (preferably on-list so we have the shared collective knowledge) if at least one of the questions is answered positively in your case (just tick the respe

Re: [ClusterLabs] Ansible role to configure Pacemaker

2018-06-07 Thread Adam Spiers
Jan Pokorný wrote: While I see why Ansible is compelling, I feel it's important to challenge this trend of trying to bend/rebrand _machine-local configuration management tool_ as _distributed system management tool_ (pacemaker is distributed application/framework of sorts), which Ansible alone i

Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-03-29 Thread Adam Spiers
Kristoffer Gronlund wrote: Ken Gaillot writes: Hi all, Andrew Beekhof brought up a potential change to help with reading Pacemaker logs. Great idea! [snipped] Better to do it now rather than later. I vote in favor of changing the names. Yes, it'll mess up crmsh, but at least for distribu

Re: [ClusterLabs] Misunderstanding or bug in crm_simulate output

2018-01-18 Thread Adam Spiers
Jehan-Guillaume de Rorthais wrote: Hi list, I was explaining how to use crm_simulate to a colleague when he pointed to me a non expected and buggy output. [snipped] Probably related: https://bugs.clusterlabs.org/show_bug.cgi?id=5294

Re: [ClusterLabs] Antw: Feedback wanted: changing "master/slave" terminology

2018-01-17 Thread Adam Spiers
Ulrich Windl wrote: Ken Gaillot wrote on 16.01.2018 at 23:33 in message <1516142036.5604.3.ca...@redhat.com>: As we look to release Pacemaker 2.0 and (separately) update the OCF standard, this is a good time to revisit the terminology and syntax we use for master/slave resources. I think

Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-17 Thread Adam Spiers
Digimer wrote: On 2018-01-16 05:33 PM, Ken Gaillot wrote: As we look to release Pacemaker 2.0 and (separately) update the OCF standard, this is a good time to revisit the terminology and syntax we use for master/slave resources. I think the term "stateful resource" is a better substitute for "

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-15 Thread Adam Spiers
Ken Gaillot wrote: On Mon, 2018-01-15 at 12:40 +, Adam Spiers wrote: Ulrich Windl wrote: But for a general solution, do you think it's more clean to have the same directory with identical properties in multiple packages, or to have one package that owns that directory? This que

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-15 Thread Adam Spiers
Ulrich Windl wrote: Vladislav Bogdanov wrote: 15.01.2018 11:23, Ulrich Windl wrote: Vladislav Bogdanov wrote: 11.01.2018 18:39, Ken Gaillot wrote: [...] I thought one option aired at the summit to address this was /var/log/clusterlabs, but it's entirely possible my memory's playing tr

Re: [ClusterLabs] Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-10 Thread Adam Spiers
Ken Gaillot wrote: The initial proposal, after discussion at last year's summit, was to use /var/log/cluster/pacemaker.log instead. That turned out to be slightly problematic: it broke some regression tests in a way that wasn't easily fixable, and more significantly, it raises the question of wh

Re: [ClusterLabs] low-cost ways to make Pacemaker more usable?

2017-12-07 Thread Adam Spiers
Ken Gaillot wrote: On Thu, 2017-12-07 at 17:15 +, Adam Spiers wrote: For example, making a few of the most crucial existing log messages less cryptic could maybe go a long way. Or if "dumbing down" log messages would make life harder for developers who are familiar with

[ClusterLabs] low-cost ways to make Pacemaker more usable?

2017-12-07 Thread Adam Spiers
Ken Gaillot wrote: On Thu, 2017-12-07 at 12:13 +, Adam Spiers wrote: https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/ It's a great write-up, although a little frustrating that it is still not fully understood why a -inf colocation failed wher

[ClusterLabs] interesting blog on Pacemaker-related outage

2017-12-07 Thread Adam Spiers
https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/ It's a great write-up, although a little frustrating that it is still not fully understood why a -inf colocation failed whereas a +inf succeeded. (I actually have a vague memory of discovering something very si

Re: [ClusterLabs] questions about startup fencing

2017-12-06 Thread Adam Spiers
Ken Gaillot wrote: On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote: Ken Gaillot wrote: On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: [snipped] Let's suppose further that the cluster configuration is such that no stateful resources which could potentially conflict with

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Adam Spiers
Ken Gaillot wrote: On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: Hi all, A colleague has been valiantly trying to help me belatedly learn about the intricacies of startup fencing, but I'm still not fully understanding some of the finer points of the behaviour. The documentati

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers
Kristoffer Gronlund wrote: Adam Spiers writes: Kristoffer Gronlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers
Klaus Wenninger wrote: On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers
Kristoffer Gronlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the cluster ensure it has the most recent version of

[ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers
Hi all, A colleague has been valiantly trying to help me belatedly learn about the intricacies of startup fencing, but I'm still not fully understanding some of the finer points of the behaviour. The documentation on the "startup-fencing" option[0] says Advanced Use Only: Should the cluster

Re: [ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23

2017-11-10 Thread Adam Spiers
Eric Robinson wrote: Which aspects of its constraints handling do you like, and why? I'm curious, since I wasn't aware that it was significantly different from crmsh in this respect. Well, to be fair, in the past I have always configured my clusters by using 'crm configure edit' and building

Re: [ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23

2017-11-07 Thread Adam Spiers
Eric Robinson wrote: Thanks much. I am experienced with crmsh because I have been using it for years, but I recently tried pcs and I really like the way it handles constraints. Which aspects of its constraints handling do you like, and why? I'm curious, since I wasn't aware that it was signif

[ClusterLabs] strange behaviour from pacemaker_remote

2017-09-27 Thread Adam Spiers
Hi all, When I do a pkill -9 -f pacemaker_remote to simulate failure of a remote node, sometimes I see things like: 08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: No ipc providers available for uid 0 gid 0 08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: Err

Re: [ClusterLabs] New website design and new-new logo

2017-09-21 Thread Adam Spiers
Kai Dupke wrote: On 09/21/2017 04:42 PM, Ken Gaillot wrote: Yes, the FAQ needs an overhaul as well -- all the Pacemaker-specific questions should be moved to a separate Pacemaker FAQ, and the top FAQ should just have questions about ClusterLabs plus links to project FAQs Can we make this a wi

Re: [ClusterLabs] Clusterlabs Summit: Expect rain tomorrow

2017-09-05 Thread Adam Spiers
Kristoffer Gronlund wrote: > Hey everyone! > > I am going to try to be at the event area at 8 in the morning tomorrow, > and I wouldn't recommend showing up earlier than that. The doors will > probably be locked. The summit itself is scheduled to start at 9. > > Unfortunately it seems we can exp

Re: [ClusterLabs] [ClusterLabs Developers] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition

2017-07-23 Thread Adam Spiers
Hi Jan :-) Jan Pokorný wrote: Hello cluster masters :-) as there's little less than 7 weeks left to "The Summit" meetup (), it's about time to get the ball rolling so we can voluntarily augment the digital trust amongst us the attendees, on OpenGPG basis. Doing that,

Re: [ClusterLabs] Clusterlabs Summit 2017 (Nuremberg, 6-7 September) - Hotels and Topics

2017-05-03 Thread Adam Spiers
Kristoffer Gronlund wrote: > Hi everyone! > > Here's a quick update on the summit happening at the SUSE office in > Nuremberg on September 6-7. [snipped] > I am also happy to say that Adam Spiers from the SUSE Cloud team will be > attending the summit, and hopefull

Re: [ClusterLabs] Never join a list without a problem...

2017-03-01 Thread Adam Spiers
Ferenc Wágner wrote: Jeffrey Westgate writes: We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, and we cannot set a clock by it - while the machine is 95% idle (or more according to 'top'), the host load shoots up to 50 or 60%. It takes about 20 minutes to peak, and

Re: [ClusterLabs] Pacemaker 1.1.16 - Release Candidate 1

2016-11-03 Thread Adam Spiers
Klaus Wenninger wrote: > On 11/03/2016 05:28 PM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> ClusterLabs is happy to announce the first release candidate for > >> Pacemaker version 1.1.16. Source code is available at: > >> > >> https://github.com

Re: [ClusterLabs] Pacemaker 1.1.16 - Release Candidate 1

2016-11-03 Thread Adam Spiers
Ken Gaillot wrote: > ClusterLabs is happy to announce the first release candidate for > Pacemaker version 1.1.16. Source code is available at: > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1 > > The most significant enhancements in this release are: [snipped] > *

Re: [ClusterLabs] Doing reload right

2016-07-21 Thread Adam Spiers
Ken Gaillot wrote: > On 07/20/2016 07:32 PM, Andrew Beekhof wrote: > > On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers wrote: > >> Ken Gaillot wrote: > >>> Hello all, > >>> > >>> I've been meaning to address the implementation of "

Re: [ClusterLabs] Doing reload right

2016-07-21 Thread Adam Spiers
Ken Gaillot wrote: > On 07/20/2016 11:47 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> Hello all, > >> > >> I've been meaning to address the implementation of "reload" in Pacemaker > >> for a while now, and I think the next releas

Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Adam Spiers
Ken Gaillot wrote: > Hello all, > > I've been meaning to address the implementation of "reload" in Pacemaker > for a while now, and I think the next release will be a good time, as it > seems to be coming up more frequently. [snipped] I don't want to comment directly on any of the excellent poi

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-25 Thread Adam Spiers
Ken Gaillot wrote: > On 06/24/2016 05:41 AM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers wrote: > >>> Andrew Beekhof wrote: > >>>>> Earlier in this thread I proposed > >>>>> the

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-24 Thread Adam Spiers
Andrew Beekhof wrote: > On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers wrote: > > Andrew Beekhof wrote: > > >> > Well, if you're OK with bending the rules like this then that's good > >> > enough for me to say we should at least try it :) > >

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-23 Thread Adam Spiers
Adam Spiers wrote: > As per the FIXME, one remaining problem is dealing with this kind of > scenario: > > - Cloud operator notices SMART warnings on the compute node > which is not yet causing hard failures but signifies that the > hard disk might die soon. >

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-23 Thread Adam Spiers
Andrew Beekhof wrote: > On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: > >> > Andrew Beekhof wrote: > >> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam S

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-15 Thread Adam Spiers
Andrew Beekhof wrote: > On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > >> > Andrew Beekhof wrote: > >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-13 Thread Adam Spiers
Andrew Beekhof wrote: > On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: > >> > Ken Gaillot wrote: > >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote: > >>

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-08 Thread Adam Spiers
Andrew Beekhof wrote: > On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> On 06/06/2016 05:45 PM, Adam Spiers wrote: > >> > Adam Spiers wrote: > >> >> Andrew Beekhof wrote: > >> >>> On Tue, Jun 7, 20

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Adam Spiers
Ken Gaillot wrote: > On 06/06/2016 05:45 PM, Adam Spiers wrote: > > Adam Spiers wrote: > >> Andrew Beekhof wrote: > >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > >>>> Ken Gaillot wrote: > >>>>> My main question is ho

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Adam Spiers wrote: > Andrew Beekhof wrote: > > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > > > Ken Gaillot wrote: > > >> My main question is how useful would it actually be in the proposed use > > >> cases. Considering the possibility that th

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Andrew Beekhof wrote: > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > >> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: > >> >> A recent thread discus

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Ken Gaillot wrote: > On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: > >> A recent thread discussed a proposed new feature, a new environment > >> variable that would be passed to resource agents, indicating whether a > >> stop action was part

Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-23 Thread Adam Spiers
Ken Gaillot wrote: > On 05/20/2016 10:40 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> Just musing a bit ... on-fail + migration-threshold could have been > >> designed to be more flexible: > >> > >> hard-fail-threshold: When an operation

Re: [ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-20 Thread Adam Spiers
Klaus Wenninger wrote: > On 05/20/2016 08:39 AM, Ulrich Windl wrote: > Jehan-Guillaume de Rorthais wrote on 19.05.2016 at > 21:29 in > > message <20160519212947.6cc0fd7b@firost>: > > [...] > >> I was thinking of a use case where a graceful demote or stop action failed > >> multiple

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-20 Thread Adam Spiers
Ken Gaillot wrote: > A recent thread discussed a proposed new feature, a new environment > variable that would be passed to resource agents, indicating whether a > stop action was part of a recovery. > > Since that thread was long and covered a lot of topics, I'm starting a > new one to focus on

Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-20 Thread Adam Spiers
Ken Gaillot wrote: > On 05/12/2016 06:21 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> On 05/10/2016 02:29 AM, Ulrich Windl wrote: > >>>> Here is what I'm testing currently: > >>>> > >>>> - When the cluster recovers a resou

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-17 Thread Adam Spiers
Bogdan Dobrelya wrote: > On 05/16/2016 09:23 AM, Jan Friesse wrote: > >> Hi, > >> > >> I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is > >> it possible? > >> Is there any examination about that? > > Indeed, would be *great* to have a Pacemaker based control plane on top > o

Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-12 Thread Adam Spiers
Hi Ken, Firstly thanks a lot not just for working on this, but also for being so proactive in discussing the details. A perfect example of OpenStack's "Open Design" philosophy in action :-) Ken Gaillot wrote: > On 05/10/2016 02:29 AM, Ulrich Windl wrote: > Ken Gaillot wrote on 10.05.201

Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Adam Spiers
Ken Gaillot wrote: > On 05/04/2016 08:49 AM, Klaus Wenninger wrote: > > On 05/04/2016 02:09 PM, Adam Spiers wrote: > >> Hi all, > >> > >> As discussed with Ken and Andrew at the OpenStack summit last week, we > >> would like Pacemaker to be extend

[ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Adam Spiers
Hi all, As discussed with Ken and Andrew at the OpenStack summit last week, we would like Pacemaker to be extended to export the current failcount as an environment variable to OCF RA scripts when they are invoked with 'start' or 'stop' actions. This would mean that if you have start-failure-is-f

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Adam Spiers
Ken Gaillot wrote: > On 04/21/2016 06:09 PM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> Hello everybody, > >> > >> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)! > >> > >> The most prominent feature will be Kla

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-21 Thread Adam Spiers
Ken Gaillot wrote: > Hello everybody, > > The release cycle for 1.1.15 will be started soon (hopefully tomorrow)! > > The most prominent feature will be Klaus Wenninger's new implementation > of event-driven alerts -- the ability to call scripts whenever > interesting events occur (nodes joining

Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-21 Thread Adam Spiers
(vendor booths). I'll put a ClusterLabs sign on the table to help people > find it. > > On 04/14/2016 09:53 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> Hi everybody, > >> > >> The upcoming OpenStack Summit is April 25-29 in Austin, Texas (U

Re: [ClusterLabs] service flap as nodes join and leave

2016-04-14 Thread Adam Spiers
Ken Gaillot wrote: > On 04/14/2016 09:33 AM, Christopher Harvey wrote: > > MsgBB-Active is a dummy resource that simply returns OCF_SUCCESS on > > every operation and logs to a file. > > That's a common mistake, and will confuse the cluster. The cluster > checks the status of resources both where
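
The snippet above is cut off, so the following is only a guess at the point being made: a resource agent whose monitor action always reports success looks active on every node, even where it was never started. A minimal sketch of a dummy resource that tracks its own state might look like this. The class, the state-file approach, and the file name are hypothetical illustrations and not the poster's agent; only the two OCF return code values (0 and 7) are taken from the OCF resource agent convention.

```python
import os
import tempfile

# Standard OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


class DummyResource:
    """Hypothetical dummy resource that remembers whether it was started."""

    def __init__(self, state_file):
        self.state_file = state_file

    def start(self):
        # Record the "running" state by creating the state file.
        open(self.state_file, "w").close()
        return OCF_SUCCESS

    def stop(self):
        # Stopping an already stopped resource is still a success.
        if os.path.exists(self.state_file):
            os.remove(self.state_file)
        return OCF_SUCCESS

    def monitor(self):
        # Report the real state instead of unconditionally succeeding.
        if os.path.exists(self.state_file):
            return OCF_SUCCESS
        return OCF_NOT_RUNNING


state_path = os.path.join(tempfile.mkdtemp(), "MsgBB-Active.state")
res = DummyResource(state_path)
results = [res.monitor(), res.start(), res.monitor(), res.stop(), res.monitor()]
print(results)
```

The design choice here is simply that monitor derives its answer from recorded state, so a probe on a node where the resource was never started reports "not running" rather than success.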

Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-14 Thread Adam Spiers
Ken Gaillot wrote: > Hi everybody, > > The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some > regular ClusterLabs contributors are going, so I was wondering if anyone > would like to do an informal meetup sometime during the summit. > > It looks like the best time would be th

Re: [ClusterLabs] IPMI working but evacuations don't work

2016-03-31 Thread Adam Spiers
Digimer wrote: > On 31/03/16 02:26 AM, Moiz Arif wrote: > > Hi, > > > > I am working on VM evacuations and i have noticed that when my compute > > node's network is disconnected there is call from STONITH to fence the > > node and my node gets rebooted. But the VMs are not evacuated. I have > > ch

Re: [ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?

2016-03-24 Thread Adam Spiers
Sam Gardner wrote: > I'm having some trouble on a few of my clusters in which the DRBD Slave > resource does not want to come up after a reboot until I manually run > resource cleanup. > > Setting 'start-failure-is-fatal=false' as a global cluster property and a > failure-timeout works to reso

Re: [ClusterLabs] documentation on STONITH with remote nodes?

2016-03-14 Thread Adam Spiers
Ken Gaillot wrote: > On 03/12/2016 05:07 AM, Adam Spiers wrote: > > Is there any documentation on how STONITH works on remote nodes? I > > couldn't find any on clusterlabs.org, and it's conspicuously missing > > from: > > > > http://clusterl

[ClusterLabs] documentation on STONITH with remote nodes?

2016-03-12 Thread Adam Spiers
Is there any documentation on how STONITH works on remote nodes? I couldn't find any on clusterlabs.org, and it's conspicuously missing from: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/ I'm guessing the answer is more or less "it works exactly the same as for cor

Re: [ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.

2016-02-24 Thread Adam Spiers
Hi all, Jan Friesse wrote: > >>>There is really no help. It's best to make sure corosync is scheduled > >regularly. > >I may sound silly, but how can I do it? > > It's actually very hard to say. Pauses like 30 sec is really unusual > and shouldn't happen (specially with RT scheduling). It is usu

Re: [ClusterLabs] Coming in Pacemaker 1.1.15: graceful Pacemaker Remote node stops

2016-02-19 Thread Adam Spiers
Ken Gaillot wrote: > Pacemaker's upstream master branch has a new feature that will be part > of the eventual 1.1.15 release. [snipped] > This new feature makes updates of Pacemaker Remote nodes more similar to > that of cluster nodes -- simply stop cluster services (in this case > pacemaker_rem

Re: [ClusterLabs] [ANNOUNCE] [HA] new #openstack-ha IRC channel on FreeNode

2015-10-22 Thread Adam Spiers
Sorry! It would have helped if I'd used the right address for the openstack list in the To: and Reply-To: headers :-/ Hopefully second time lucky ... Adam Spiers wrote: > [cross-posting to several lists; please trim the recipients list > before replying!] > > Hi all, >

[ClusterLabs] [ANNOUNCE] [HA] new #openstack-ha IRC channel on FreeNode

2015-10-22 Thread Adam Spiers
[cross-posting to several lists; please trim the recipients list before replying!] Hi all, After discussion with members of the openstack-infra team, I registered new FreeNode IRC channel #openstack-ha. Discussion on all aspects of OpenStack High Availability is welcome in this channel. Hopefull

[ClusterLabs] [ANNOUNCE] [HA] [Pacemaker] new, maintained openstack-resource-agents repository

2015-10-21 Thread Adam Spiers
[cross-posting to openstack-dev and pacemaker user lists; please consider trimming the recipients list if your reply is not relevant to both communities] Hi all, Back in June I proposed moving the well-used but no longer maintained https://github.com/madkiss/openstack-resource-agents/ repository

[ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?

2015-10-15 Thread Adam Spiers
I inserted some debugging into fencing.py and found that stonithd sends stuff like this to STDIN of the fencing agents it forks: action=list param1=value1 param2=value2 param3=value3 action=list where paramX and valueX come from the configuration of the primitive for the fenci
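
To make the input format described above concrete, here is a minimal sketch of reading key=value lines and seeing what a repeated action= line does to the result. The function name parse_agent_stdin is hypothetical and this is not the parser in fencing.py; the sample input reuses the placeholder names from the message, and "last value wins" is an assumption of this sketch rather than documented behaviour.

```python
import io


def parse_agent_stdin(stream):
    """Parse key=value lines; a repeated key keeps its last value (assumed)."""
    options = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value pair; ignore it
        options[key.strip()] = value.strip()
    return options


# Sample input modelled on the message, including the duplicated action= line.
sample = io.StringIO(
    "action=list\n"
    "param1=value1\n"
    "param2=value2\n"
    "param3=value3\n"
    "action=list\n"
)
parsed = parse_agent_stdin(sample)
print(parsed)
```

With a dictionary-based parser like this one, a duplicated line carrying the same value is harmless, which may be why the repetition goes unnoticed in practice; whether the real agents behave this way is not something the snippet confirms.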

Re: [ClusterLabs] Question about fence-agents-compute

2015-10-12 Thread Adam Spiers
Kazunori INOUE wrote: > [VM_db0101]# export OS_USERNAME=demo ; export OS_PASSWORD=demo ; > export OS_AUTH_URL=http://10.0.2.11:5000/v2.0 ; export > OS_TENANT_NAME=demo > [VM_db0101]# nova list > +--+---++ (snip) > | ID

Re: [ClusterLabs] [HA] RFC: moving Pacemaker openstack-resource-agents to stackforge

2015-06-24 Thread Adam Spiers
Ken Gaillot wrote: > On 06/23/2015 07:17 PM, Adam Spiers wrote: > >>> https://github.com/madkiss/openstack-resource-agents/ is a nice > >>> repository of Pacemaker High Availability resource agents (RAs) for > >>> OpenStack, usage of which has been offici