Re: [Pacemaker] crm resourse (lsb:apache2) not starting

2014-07-08 Thread Andrew Beekhof
On 8 Jul 2014, at 11:15 pm, W Forum W wrote: > Hi, > > I have a two node cluster with a DRBD, heartbeat and pacemaker (on Debian > Wheezy) > The cluster is working fine. 2 DRBD resources, Shared IP, 2 File systems and > a postgresql database start, stop, migrate, ... correctly. > > Now the p

Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-07-08 Thread Andrew Beekhof
nds, are you trying to turn off the black box or debug logging? they're not the same thing > > 2014-04-04 3:45 GMT+02:00 Andrew Beekhof : >> >> On 24 Mar 2014, at 10:07 pm, emmanuel segura wrote: >> >>> but it will be implemented? >> >> no pla

Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

2014-07-07 Thread Andrew Beekhof
On 5 Jul 2014, at 1:00 am, Giuseppe Ragusa wrote: > From: and...@beekhof.net > Date: Fri, 4 Jul 2014 22:50:28 +1000 > To: pacemaker@oss.clusterlabs.org > Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require > --force to be added to levels > > > On 4 Jul 2014, at 1:29

[Pacemaker] Calling all last minute fixes/bugs for 1.1.12

2014-07-07 Thread Andrew Beekhof
Fingers crossed, this will be the last pre-release of 1.1.12. In this update: - SUSE has contributed some additional logic around the removal of 'old' nodes - Handling of resources that require neither quorum nor fencing is improved - systemd resources that take a while to reach the 'active' state

Re: [Pacemaker] Pacemaker cant promote Master/Slave resource

2014-07-07 Thread Andrew Beekhof
On 4 Jul 2014, at 1:05 am, Bryan Bueter wrote: > I have a two node active/standby cluster that I'm building and I cant get > pacemaker to promote my DRBD resource. I'm thinking I have something > wrong in the configuration but I dont see it. > > The errors I get on NODE1 are: > /var/log/corosy

Re: [Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

2014-07-07 Thread Andrew Beekhof
On 4 Jul 2014, at 3:16 pm, Giuseppe Ragusa wrote: > Hi all, > I'm trying to create a script as per subject (on CentOS 6.5, CMAN+Pacemaker, > only DRBD+KVM active/passive resources; SNMP-UPS monitored by NUT). > > Ideally I think that each node should stop (disable) all locally-running > Virtu

Re: [Pacemaker] crm_verify reports bogus "requires fencing but fencing is disabled" notices

2014-07-07 Thread Andrew Beekhof
On 8 Jul 2014, at 2:39 am, Ron Kerry wrote: > On 7/3/14, 10:23 PM, Andrew Beekhof wrote: >> >> Seems to be fixed in the latest 1.1.12 beta: >> >> # tools/crm_verify -x ~/Downloads/cibadmin-Ql.txt -VVV >> notice: update_validation: pacemaker-1.2-styl

Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

2014-07-04 Thread Andrew Beekhof
On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa wrote: > >> Hi all, > >> while creating a cloned stonith resource > > > > Any particular reason you feel the need to clone it? > > In the end, I suppose it's only a "purist mindset" :) because it is a PDU > whose power outlets control both nodes, so

Re: [Pacemaker] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-03 Thread Andrew Beekhof
On 4 Jul 2014, at 1:50 am, Giuseppe Ragusa wrote: > > > > > } > > > > > handlers { > > > > > fence-peer "/usr/lib/drbd/rhcs_fence"; > > > > > } > > > > > } > > > > > > > > > > > > > > rhcs_fence is wrong fence-peer utility. You should use > > > > /usr/lib/drbd/crm-fence-peer.sh and > > > > /usr/

Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

2014-07-03 Thread Andrew Beekhof
On 4 Jul 2014, at 5:16 am, Giuseppe Ragusa wrote: > Hi all, > while creating a cloned stonith resource Any particular reason you feel the need to clone it? > for multi-level STONITH on a fully-up-to-date CentOS 6.5 > (pacemaker-1.1.10-14.el6_5.3.x86_64): > > pcs cluster cib stonith_cfg > pcs

Re: [Pacemaker] crm_verify reports bogus "requires fencing but fencing is disabled" notices

2014-07-03 Thread Andrew Beekhof
Seems to be fixed in the latest 1.1.12 beta: # tools/crm_verify -x ~/Downloads/cibadmin-Ql.txt -VVV notice: update_validation:pacemaker-1.2-style configuration is also valid for pacemaker-1.3 notice: update_validation:Upgrading pacemaker-1.3-style configuration to pacemaker-2.0 with

Re: [Pacemaker] Working in virtual environment but not in physical.

2014-07-03 Thread Andrew Beekhof
le to make it work. Excellent > > Good software, by the way. Appreciate your help to the community. Always nice to get some positive feedback :) > > Cheers, > Jef > > > On Wed, Jun 25, 2014 at 7:22 AM, Andrew Beekhof wrote: > > On 25 Jun 2014, at 2:58 am, C

Re: [Pacemaker] Unicast communication is working, but mcastaddr is no working

2014-07-02 Thread Andrew Beekhof
On 3 Jul 2014, at 7:46 am, Ziqing Zhuang wrote: > I am trying to configure Corosync here. At first, I tried to use unicast, > here is part of my corosync.conf file (I did not change others): > > interface { > > # The following values need to be set based on your > environme

Re: [Pacemaker] Resources not failing over, ERROR: RecurringOp: Invalid recurring action ... wth name: 'start'

2014-07-02 Thread Andrew Beekhof
1.1.6 is really too old in any case, rc=5 'not installed' means we can't find an init script of that name in /etc/init.d. On 2 Jul 2014, at 2:07 pm, Vijay B wrote: > Hi, > > I'm puppetizing resource deployment for pacemaker and corosync, and as part > of it, am creating a resource on one of thr
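
The rc=5 explanation in the reply above can be checked by hand before involving the cluster. The following is only a rough sketch: `lsb_script_status` is a hypothetical helper, not a Pacemaker tool, and it assumes that "can't find an init script" amounts to "no executable file of that name in /etc/init.d".

```shell
# Hypothetical helper (not part of Pacemaker): report whether an executable
# init script named $1 exists in directory $2 (default: /etc/init.d).
lsb_script_status() {
    if [ -x "${2:-/etc/init.d}/$1" ]; then
        echo "found"
    else
        echo "missing"
    fi
}
```

Calling `lsb_script_status NAME`, with the name used in the `lsb:NAME` resource, on every node should print `found`; a node that prints `missing` is a likely source of the 'not installed' result.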

Re: [Pacemaker] Quorum in pacemaker

2014-07-01 Thread Andrew Beekhof
On 2 Jul 2014, at 1:46 pm, Vijay B wrote: > Hi Emmanuel, > > Thanks for the response! I thought cman is a newer version of corosync itself, older actually. even though you're technically using cman, corosync is doing all the heavy lifting underneath. > w.r.t the plugin that is used between c

Re: [Pacemaker] crm_verify reports bogus "requires fencing but fencing is disabled" notices

2014-07-01 Thread Andrew Beekhof
Can you send us the 'cibadmin -Ql' output? On 2 Jul 2014, at 3:30 am, Ron Kerry wrote: > I have seen the following reporting coming out of crm_verify that is clearly > misleading to a sysadmin. Every resource defined with this sort of start/stop > operations is called out twice (presumably bec

Re: [Pacemaker] Pacemaker logging and blackbox issues.

2014-06-29 Thread Andrew Beekhof
On 30 Jun 2014, at 4:27 pm, Arjun Pandey wrote: > Hi > > I am using pacemaker version 1.1.10-14.el6 > I have enabled PCMK_DEBUG and pacemaker blackbox as well. However the file > /var/log/pacemaker.log isn’t even created. Is there a log file specified in corosync.conf/cluster.conf ? If so w

Re: [Pacemaker] Alternative communication engine to corosync (etcd/consul/zookeeper/doozerd)

2014-06-29 Thread Andrew Beekhof
On 29 Jun 2014, at 2:45 pm, Patrick Hemmer wrote: > From: Andrew Beekhof > Sent: 2014-06-21 21:40:44 EDT > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] Alternative communication engine to corosync > (etcd/consul/zookeeper/doozerd) > >> IF

Re: [Pacemaker] Ordered Resources

2014-06-29 Thread Andrew Beekhof
On 30 Jun 2014, at 9:08 am, Dan Journo wrote: > Hi, > > I’m struggling to set up pacemaker for the first time. > > The resources I have (and the order I need them to start are) > > - IPAddr > - Promote DRBD > - Asterisk > > They also need to be running on the s

Re: [Pacemaker] Blind Faith still fencing unseen nodes

2014-06-27 Thread Andrew Beekhof
On 13 Jun 2014, at 9:21 pm, Jason Hendry wrote: > > Hi Everyone, > > This is my first post, please let me know if I am missing any > standard/essential information to help with debugging... > > I have a 2-node cluster with node-level fencing. The cluster appears to be > configured with "Bl

Re: [Pacemaker] a question on the `ping` RA

2014-06-27 Thread Andrew Beekhof
On 10 Jun 2014, at 7:52 pm, Riccardo Murri wrote: > Hi Andrew, all, > > sorry for this late reply -- currently I am only able to work on this > issue very "part-time-ly"... > > On 2 June 2014 13:34, Andrew Beekhof wrote: >> >> On 2 Jun 2014, at 7:05

Re: [Pacemaker] Trouble with "Failed application of an update diff"

2014-06-27 Thread Andrew Beekhof
On 10 Jun 2014, at 10:44 pm, Виталий Туровец wrote: > Hello there again! > Here you are: http://pastebin.com/bUaNQHs1 > It's also identical on both nodes. > Thank you! > > > 2014-06-10 3:20 GMT+03:00 Andrew Beekhof : > > On 9 Jun 2014, at 11:01 pm, Виталий Ту

Re: [Pacemaker] Decreasing failover time when running DRBD+OCFS2+XEN in dual primary mode

2014-06-27 Thread Andrew Beekhof
tch > is not compatible with pacemaker. > You too try to install 8.3.11 and check once, all the best > > > On Fri, Jun 13, 2014 at 5:22 AM, Andrew Beekhof wrote: > > On 12 Jun 2014, at 9:15 pm, kamal kishi wrote: > > > Hi All, > > > > This might be a

Re: [Pacemaker] Why "order o inf: VIP A B" starts VIP, A and B simultaneously ?

2014-06-27 Thread Andrew Beekhof
On 25 Jun 2014, at 7:36 pm, Sékine Coulibaly wrote: > Hi all, > > My setup is as follows : RedHat 6.3 (yes, I know,this is quite old) , > Pacemaker 1.1.7, Corosync 1.4.1. > > I noticed something that is strange because since it doesn't complies with > what I read (and understood) from the fo

Re: [Pacemaker] How to put delay in fence_intelmodular for one node only

2014-06-27 Thread Andrew Beekhof
On 26 Jun 2014, at 8:18 am, Gianluca Cecchi wrote: > > On Sun, Jun 22, 2014 at 1:51 AM, Digimer wrote: > Excellent. > > Please note; With IPMI-only fencing, you may find that killing all power to > the node will cause fencing to fail, as the IPMI's BMC will lose power as > well (unless it

Re: [Pacemaker] colocation and ordering

2014-06-27 Thread Andrew Beekhof
On 26 Jun 2014, at 11:27 pm, Xzarth wrote: > I have a pacemaker cluster with following config: > >crm(live)configure# show >node node1 >node node2 >primitive ClusterIP ocf:heartbeat:IPaddr2 \ >params ip="192.168.56.111" cidr_netmask="32" nic="eth1" >iflabel="1" \

Re: [Pacemaker] Quorum in pacemaker

2014-06-26 Thread Andrew Beekhof
On 27 Jun 2014, at 10:22 am, Vijay B wrote: > Hi, > > I'm trying to set up a three node cluster using pacemaker+corosync, and I > installed the required packages on each node, checked for their network > connectivity so they can see each other, added the required startup scripts > and edited

Re: [Pacemaker] Troubleshooting document

2014-06-25 Thread Andrew Beekhof
On 25 Jun 2014, at 6:21 pm, Bart Coninckx wrote: > Hi all, > > Aside of the thorough and comprehensive documentation, I was wondering if > anyone would be willing to create a "Troubleshooting" document, containing a > methodology to track down and correct errors. I feel like this is missing a

Re: [Pacemaker] Info on failcount automatic reset

2014-06-24 Thread Andrew Beekhof
On 20 Jun 2014, at 11:29 pm, Gianluca Cecchi wrote: > Hello, > when the monitor action for a resource times out I think its failcount is > incremented by 1, correct? > If so, suppose the next monitor action succeeds, does the failcount value > automatically resets to zero or does it stay to 1?

Re: [Pacemaker] Working in virtual environment but not in physical.

2014-06-24 Thread Andrew Beekhof
On 25 Jun 2014, at 2:58 am, Cayab, Jefrey E. wrote: > Hi all, > > I used the same steps in the attached guide to build the cluster in physical > environment but when i got to "crm_mon -1", i always get this error: > Crm verify: Could not establish cib_ro connection: connection refused (111) >

Re: [Pacemaker] Listing resources running on node

2014-06-22 Thread Andrew Beekhof
On 23 Jun 2014, at 8:35 am, Dennis Jacobfeuerborn wrote: > Hi, > what is the best way to list the resources running on the local node? > I'm trying to create a simple monitoring script and basically want to be > able to simply list all the resources started on the local node. > > crm_resource h

Re: [Pacemaker] Alternative communication engine to corosync (etcd/consul/zookeeper/doozerd)

2014-06-21 Thread Andrew Beekhof
On 21 Jun 2014, at 1:32 am, Patrick Hemmer wrote: > From: Andrew Beekhof > Sent: 2014-06-20 04:48:25 EDT > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] Alternative communication engine to corosync > (etcd/consul/zookeeper/doozerd) > >> On

Re: [Pacemaker] Alternative communication engine to corosync (etcd/consul/zookeeper/doozerd)

2014-06-20 Thread Andrew Beekhof
On 20 Jun 2014, at 2:14 pm, Patrick Hemmer wrote: > After the demise of the old heartbeat service, and the switch to corosync as > the primary (sole) method of communication between nodes, heartbeat is still supported as a messaging/membership layer > has there ever been any consideration int

Re: [Pacemaker] monitor operation

2014-06-18 Thread Andrew Beekhof
On 18 Jun 2014, at 7:34 pm, ESWAR RAO wrote: > Hi All, > > I am having a setup of 3 nodes custer (HB+pacemaker) > > If I add a resource to the cluster, just wanted to know if the monitor > operation is invoked periodically from DC node (or) local node itself does > monitor and report status

Re: [Pacemaker] first monitor action after start of ressource fails - ends up in ressource restart

2014-06-18 Thread Andrew Beekhof
On 18 Jun 2014, at 4:13 pm, Bauer, Stefan (IZLBW Extern) wrote: > -Original Message- > From: Andrew Beekhof [mailto:and...@beekhof.net] > > Sounds like apache is saying "done" for the start action before it's actually > started. > I believe more

Re: [Pacemaker] Pacemaker handling dual primary DRBD to host Xen HVM(windows 7) DOMU doesn't start sometime and if it starts then doesn't migrate

2014-06-17 Thread Andrew Beekhof
I see: Jun 5 15:11:45 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown error (1) but the actual error appears to have been before the logs start. Later on there is: Jun 5 15:12:05 server1 lrmd: [1596]: WARN: xenwin7:start process (

Re: [Pacemaker] node is unable to join cluster after upgrade (crmd dies)

2014-06-17 Thread Andrew Beekhof
On 18 Jun 2014, at 2:59 am, Krause, Markus wrote: > Hi all, > > I am using pacemaker/openais and drbd to have a highly available MySQL server > which worked "for years" (since SLES 11 SP 0) without configuration changes. > Just two days ago I did an update (zypper up) which now leads to the iss

Re: [Pacemaker] API documentation

2014-06-17 Thread Andrew Beekhof
On 17 Jun 2014, at 8:01 pm, Kostiantyn Ponomarenko wrote: > I took a look at the include folders of pacemaker and corosync and I didn't > find there any explanation to functions. Did I look at a wrong place? > > My goal is to manage cluster from my app, so I don't need to use crmsh or pcs. >

Re: [Pacemaker] first monitor action after start of ressource fails - ends up in ressource restart

2014-06-17 Thread Andrew Beekhof
On 17 Jun 2014, at 4:21 pm, Bauer, Stefan (IZLBW Extern) wrote: > Dear Users/Developers, > > I’m running a pacemaker/corosync cluster on Debian 7: > > Pacemaker 1.1.7.1 > Corosync 1.4.2-3 > > Everything is smooth but the first monitor action after the start action on > my apache2 ressour

Re: [Pacemaker] Decreasing failover time when running DRBD+OCFS2+XEN in dual primary mode

2014-06-16 Thread Andrew Beekhof
> Had to force the download to particular version as the current download/patch > is not compatible with pacemaker. > You too try to install 8.3.11 and check once, all the best > > > On Fri, Jun 13, 2014 at 5:22 AM, Andrew Beekhof wrote: > > On 12 Jun 2014, at 9:15 pm,

Re: [Pacemaker] Actions on "Another DC detected"

2014-06-16 Thread Andrew Beekhof
On 16 Jun 2014, at 4:56 pm, K Mehta wrote: > Hi, >In case stonith is not setup and a two node cluster automatically recovers > from split brain, I get "Another DC detected" message in log file. > > Here are the cluster settings > > Cluster Properties: > cluster-infrastructure: cman > d

[Pacemaker] Release Pacemaker 1.1.12 - Release Candidate 3

2014-06-13 Thread Andrew Beekhof
The latest, and possibly last Pacemaker 1.1.12 release candidate (rc3) is available now for your testing edification. https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.12-rc3 Mostly the fixes have been around master/slave resources and remote nodes. Keep those bug reports comi

Re: [Pacemaker] Pacemaker 1.1.12 cib testing, crm_mon doesn't work

2014-06-13 Thread Andrew Beekhof
rm_mon and >> everything worked. >> But I see this more as a workaround (could set this in my bash_profile). >> >> I currently have libqb-0.16.0-2.el6.x86_64 installed, >> which version should fix this ? >> >> Thanks! >> Johan H. >> >>

Re: [Pacemaker] Pacemaker 1.1.12 cib testing, crm_mon doesn't work

2014-06-13 Thread Andrew Beekhof
e into less than the configured ipc limit (20480 > bytes).Set PCMK_ipc_buffer to a higher value (-988987392 bytes suggested) > > There is definitely something wrong. Is it the printing of the suggested > value or is it something else ? > > If I check the cib.xml files in /v

Re: [Pacemaker] one node not always joining the 3-node cluster[Welcome reply not received from: node3]

2014-06-12 Thread Andrew Beekhof
software versions? On 10 Jun 2014, at 9:28 pm, ESWAR RAO wrote: > Hi, > > I have a 3 node cluster [node1 ,node2, node3] HB+pacemaker setup. > All config files and auth files are same on all the 3 nodes. > > But strangely always node3 is unable to join cluster group. > In the ha logs of DC node

Re: [Pacemaker] PSQL 9.3.4 and Pacemaker

2014-06-12 Thread Andrew Beekhof
On 12 Jun 2014, at 9:51 pm, Christian Gebler wrote: > Hi all, > > I'm trying to configure my PSQL primitive. I have a 2 node cluster running > with pacemaker 1.1.10, corosync 1.3.3 on Ubuntu Server 14.04 LTS. It is an > active/passive cluster, so PostgreSQL (and all other resources) just ha

Re: [Pacemaker] Decreasing failover time when running DRBD+OCFS2+XEN in dual primary mode

2014-06-12 Thread Andrew Beekhof
On 12 Jun 2014, at 9:15 pm, kamal kishi wrote: > Hi All, > > This might be a basic question but I'm not sure whats taking time for > failover switching. > Hope anyone can figure it out. How about looking in the logs and seeing when the various stop/start actions occur and which ones take the

Re: [Pacemaker] Pacemaker 1.1.12 cib testing, crm_mon doesn't work

2014-06-12 Thread Andrew Beekhof
On 12 Jun 2014, at 10:53 pm, Johan Huysmans wrote: > Hi All, > > I deployed Pacemaker 1.1.12-rc2 on our platform to test the cib changes. > This was needed on our setup as it contains 6 nodes, 150 resources and the > cib process was using lots of cpu. > > With a limited set of resources (6 no

Re: [Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.

2014-06-11 Thread Andrew Beekhof
On 12 Jun 2014, at 4:55 am, Paul E Cain wrote: > Hello, > > Overview > I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7 VMs. > One of the things I need to do is ensure that my cluster can connect to a > certain IP address 10.10.0.1 because once I add the actual resource

Re: [Pacemaker] resources not rebalancing

2014-06-11 Thread Andrew Beekhof
On 11 Jun 2014, at 10:59 pm, Patrick Hemmer wrote: > From: Andrew Beekhof > Sent: 2014-06-11 02:36:15 EDT > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] resources not rebalancing > >> On 11 Jun 2014, at 3:44 pm, Patrick Hemmer >> wrote: &

Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

2014-06-11 Thread Andrew Beekhof
Referring to the king of drbd... Lars, question for you inline. On 11 Jun 2014, at 11:14 pm, Robert Dahlem wrote: > Hi Andrew, > > On 02.06.2014 02:57, Andrew Beekhof wrote: > >>> This seems to be some kind of a race condition: I added >>> sleep 3 >>

Re: [Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

2014-06-11 Thread Andrew Beekhof
On 12 Jun 2014, at 12:13 am, Alexis de BRUYN wrote: > On 10.06.2014 01:44, Andrew Beekhof wrote: >> >> On 10 Jun 2014, at 4:07 am, Alexis de BRUYN >> wrote: >> >>> Hi Everybody, >>> >>> I have an issue with a 2-node Debian Wheezy pri

Re: [Pacemaker] votequorum for 2 node cluster

2014-06-11 Thread Andrew Beekhof
Chrissy? Can you shed some light here? On 11 Jun 2014, at 11:26 pm, Kostiantyn Ponomarenko wrote: > Hi guys, > > I am trying to deal somehow with split brain situation in 2 node cluster > using votequorum. > Here is a quorum section in my corosync.conf: > > provider: corosync_votequorum > e

Re: [Pacemaker] resources not rebalancing

2014-06-10 Thread Andrew Beekhof
On 11 Jun 2014, at 3:44 pm, Patrick Hemmer wrote: >>> >> Right. But each node still has 4998000+ units with which to accommodate >> something that only requires 1. >> Thats about 0.2% of the remaining capacity, so wherever it starts, its >> hardly making a dint. >> > You're thinking of t

Re: [Pacemaker] resources not rebalancing

2014-06-09 Thread Andrew Beekhof
On 5 Jun 2014, at 10:38 am, Patrick Hemmer wrote: > From: Andrew Beekhof > Sent: 2014-06-04 20:15:22 EDT > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] resources not rebalancing > >> On 5 Jun 2014, at 12:57 am, Patrick Hemmer >> wrot

Re: [Pacemaker] Trouble with "Failed application of an update diff"

2014-06-09 Thread Andrew Beekhof
ter-ClusterIP inf: ClusterIP MySQL_MasterSlave:promote > property cib-bootstrap-options: \ > dc-version=1.1.10-14.el6_5.3-368c726 \ > cluster-infrastructure="classic openais (with plugin)" \ > expected-quorum-votes=2 \ > no-quorum-policy=

Re: [Pacemaker] Process cib loops infinitely with 100% cpu usage and can't be killed

2014-06-09 Thread Andrew Beekhof
On 10 Jun 2014, at 10:07 am, Andrew Beekhof wrote: > > On 10 Jun 2014, at 9:56 am, Gabriel Gomiz > wrote: > >> On 05/30/2014 12:12 AM, Andrew Beekhof wrote: >>> There have been some big steps forward in cib for the next upstream release >>> (its basical

Re: [Pacemaker] Process cib loops infinitely with 100% cpu usage and can't be killed

2014-06-09 Thread Andrew Beekhof
On 10 Jun 2014, at 9:56 am, Gabriel Gomiz wrote: > On 05/30/2014 12:12 AM, Andrew Beekhof wrote: >> There have been some big steps forward in cib for the next upstream release >> (its basically 2 orders of magnitude faster/more efficient). >> Current versions will reg

Re: [Pacemaker] Order of resources in a group and crm_diff

2014-06-09 Thread Andrew Beekhof
On 6 Jun 2014, at 5:37 pm, Gao,Yan wrote: > > > On 06/06/14 13:21, Gao,Yan wrote: >> On 01/29/14 13:44, Andrew Beekhof wrote: >>> >>> On 28 Jan 2014, at 10:11 pm, Vladislav Bogdanov >>> wrote: >>> >>>> Hi all, >

Re: [Pacemaker] [Enhancement] When attrd reboots, the attribute disappears.

2014-06-09 Thread Andrew Beekhof
On 9 Jun 2014, at 12:01 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > I submitted a problem in next bugziila in the past. > * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2501 Please use bugs.clusterlabs.org in future. I'll follow up in bugzilla > > A similar phenomenon is

Re: [Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

2014-06-09 Thread Andrew Beekhof
On 10 Jun 2014, at 4:07 am, Alexis de BRUYN wrote: > Hi Everybody, > > I have an issue with a 2-node Debian Wheezy primary/primary DRBD > Pacemaker/Corosync configuration. > > After a 'crm node standby' then a 'crm node online', the DRBD volume > stays in a 'split brain state' (cs:StandAlone

Re: [Pacemaker] email alerts on resource status

2014-06-09 Thread Andrew Beekhof
On 9 Jun 2014, at 9:03 pm, Francesco De Giorgi wrote: > On Sat, Jun 7, 2014 at 4:23 PM, Andrew Beekhof wrote: >> >> On 7 Jun 2014, at 12:01 am, Francesco De Giorgi >> wrote: >> >>> Hi all, >>> first post here. >>> >>> We

Re: [Pacemaker] email alerts on resource status

2014-06-07 Thread Andrew Beekhof
On 7 Jun 2014, at 12:01 am, Francesco De Giorgi wrote: > Hi all, > first post here. > > We are managing an HA Lustre filesystem with pacemaker 1.1.11 and > corosync 2.3.3 . > I was looking for a way to send email alerts to signal a Lustre target > migration, and I considered: > > - MailTo pri

Re: [Pacemaker] Upstart support in cluster-glue 1.0.11 and pacemaker 1.1.11

2014-06-05 Thread Andrew Beekhof
On 6 Jun 2014, at 12:47 am, Andrew Martin wrote: > Hello, > > I'm working on backporting the latest cluster-glue (1.0.11), pacemaker > (1.1.11), corosync, and related packages to Ubuntu 12.04: > https://launchpad.net/~xespackages/+archive/clustertesting > > I've installed these packages and s

Re: [Pacemaker] Trouble with "Failed application of an update diff"

2014-06-04 Thread Andrew Beekhof
On 30 May 2014, at 6:32 pm, Виталий Туровец wrote: > Hello there, people! > I am new to this list, so please excuse me if i'm posting to the wrong place. > > I've got a pacemaker cluster with such a configuration: > http://pastebin.com/1SbWWh4n. > > Output of "crm status": > > La

Re: [Pacemaker] resources not rebalancing

2014-06-04 Thread Andrew Beekhof
On 5 Jun 2014, at 12:57 am, Patrick Hemmer wrote: > From: Andrew Beekhof > Sent: 2014-06-04 04:15:48 E > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] resources not rebalancing > >> On 4 Jun 2014, at 4:22 pm, Patrick Hemmer >> wrot

Re: [Pacemaker] How to failover when system is overloaded?

2014-06-04 Thread Andrew Beekhof
On 5 Jun 2014, at 5:58 am, Michael Monette wrote: > Hi, > > Lately we have been having issues with our primary server becoming overloaded > and basically unresponsive. I assumed that having a floating ip was enough, > but it's not and the floating_ip resource does not fail to the second syst

Re: [Pacemaker] How to calculate manually the score of the node by yourself using cibadmin command

2014-06-04 Thread Andrew Beekhof
On 5 Jun 2014, at 4:24 am, Jacob Nikom wrote: > Hi, > > I have problem with highest preferred node score calculation. > > I have two exactly the same resources resource1 and resource2 > with the same colocation constraints. > However, during the execution resource1 goes to node1, but another >

Re: [Pacemaker] Pacemaker handling dual primary DRBD to host Xen HVM(windows 7) DOMU doesn't start sometime and if it starts then doesn't migrate

2014-06-04 Thread Andrew Beekhof
On 4 Jun 2014, at 5:20 pm, kamal kishi wrote: > Hi emi, > > Cluster logs?? > Rite now i'm getting all the logs in Syslog itself. Yes, but we aren't. And without them its kinda hard to comment. > > Another thing i found out is that ocfs2 has some issue while a anyone server > is offline or

Re: [Pacemaker] resources not rebalancing

2014-06-04 Thread Andrew Beekhof
On 4 Jun 2014, at 4:22 pm, Patrick Hemmer wrote: > Testing some different scenarios, and after bringing a node back online, none > of the resources move to it unless they are restarted. However > default-resource-stickiness is set to 0, so they should be able to move > around freely. > > # p

Re: [Pacemaker] The problem with which queue between cib and stonith-ng overflows

2014-06-03 Thread Andrew Beekhof
the device > list (10 active devices) > Jun 4 10:47:09 vm02 stonith-ng[2971]: notice: > stonith_device_register: Added 'prmStonith_helper07' to the device > list (11 active devices) > Jun 4 10:47:09 vm02 stonith-ng[2971]: notice: > stonith_device_register: Added '

Re: [Pacemaker] The problem with which queue between cib and stonith-ng overflows

2014-06-03 Thread Andrew Beekhof
On 4 Jun 2014, at 8:11 am, Andrew Beekhof wrote: > > On 3 Jun 2014, at 11:26 am, Yusuke Iida wrote: > >> Hi, Andrew >> >> About 15 seconds are the time taken in the whole device construction. >> I think that it cannot receive the message from cib during dev

Re: [Pacemaker] The problem with which queue between cib and stonith-ng overflows

2014-06-03 Thread Andrew Beekhof
mStonith_libvirt06' to the device > list (10 active devices) > Jun 2 11:34:13 vm04 stonith-ng[4891]: notice: > stonith_device_register: Added 'prmStonith_helper07' to the device > list (11 active devices) > Jun 2 11:34:14 vm04 stonith-ng[4891]: notice: > stonith_devi

[Pacemaker] Announce: Pacemaker 1.1.12 - Release Candidate 2

2014-06-03 Thread Andrew Beekhof
Release candidate 2 is now live! Please see GitHub for full details and let us know if you find any problems: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.12-rc2

Re: [Pacemaker] a question on the `ping` RA

2014-06-02 Thread Andrew Beekhof
On 2 Jun 2014, at 7:05 pm, Riccardo Murri wrote: > Hi Andrew, > > thanks for your explanations. One more question: > > On 30 May 2014 02:38, Andrew Beekhof wrote: >> >> On 29 May 2014, at 9:19 pm, Riccardo Murri wrote: >> >>> - or rather does

Re: [Pacemaker] The problem with which queue between cib and stonith-ng overflows

2014-06-02 Thread Andrew Beekhof
On 2 Jun 2014, at 3:05 pm, Yusuke Iida wrote: > Hi, Andrew > > I use the newest of 1.1 branches and am testing by eight sets of nodes. > > Although the problem was settled once, > Now, the problem with which queue overflows between cib and stonithd > has recurred. > > As an example, I paste t

Re: [Pacemaker] unexpected demote request on master

2014-06-02 Thread Andrew Beekhof
ne > > vsanqa11 > > Error: 'ms-c6933988-9e5c-419e-8fdf-744100d76ad6' is not a resource > > > > > > Can anyone let me correct command for this ? > > > > Which version of pcs are you using (and what distribution)? This has been > > fixed upstream.

Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

2014-06-01 Thread Andrew Beekhof
On 31 May 2014, at 12:43 am, Robert Dahlem wrote: > Hi, > > On 30.05.2014 13:20, Robert Dahlem wrote: > >>> run crm_report for the period covered by these commands and attach the >>> result: >>> >>> # crm node standby korfwf01 ; sleep 10 >>> # crm node standby korfwf02 ; sleep 10 >>> # crm n

Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

2014-05-30 Thread Andrew Beekhof
On 30 May 2014, at 7:15 pm, Robert Dahlem wrote: > Hi, > > On 23.05.2014 02:40, Andrew Beekhof wrote: > >> run crm_report for the period covered by these commands and attach the >> result: >> >> # crm node standby korfwf01 ; sleep 10 >> # crm node

Re: [Pacemaker] Process cib loops infinitely with 100% cpu usage and can't be killed

2014-05-29 Thread Andrew Beekhof
On 29 May 2014, at 10:00 pm, Gabriel Gomiz wrote: > On 05/26/2014 08:56 PM, Andrew Beekhof wrote: >> On 27 May 2014, at 5:48 am, Gabriel Gomiz >> wrote: >> >>> Hello Andrew and cluster folks! >>> >>> In the last month we are experiencing so

Re: [Pacemaker] a question on the `ping` RA

2014-05-29 Thread Andrew Beekhof
On 29 May 2014, at 9:19 pm, Riccardo Murri wrote: > Hello, > > we have setup a cluster of 10 nodes to serve a Lustre filesystem to a > computational cluster, with Pacemaker+Corosync to handle failover > between hosts. Each host is connected to an ethernet network and an > Infiniband, and we se

Re: [Pacemaker] Logging info

2014-05-29 Thread Andrew Beekhof
Pacemaker never sends info logs to syslog unless you've specified 'PCMK_logpriority=info' in /etc/sysconfig/pacemaker. If you want to get rid of notice too, set it to 'warning'. On 30 May 2014, at 10:29 am, Alex Samad - Yieldbroker wrote: > Hi > > How do I stop pacemaker from logging info, noti
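
As a small illustration of the setting named in the reply above, the fragment below shows how the line might look in the file. It is a sketch based only on the variable name, the value, and the path given in that reply; the comments are added here for illustration.

```shell
# /etc/sysconfig/pacemaker (path as given in the reply above)
# Send only warnings and more severe messages to syslog,
# which drops both info and notice:
PCMK_logpriority=warning
```

The file is read when the cluster stack starts, so a restart of Pacemaker is presumably needed before the change shows up in syslog.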

Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-29 Thread Andrew Beekhof
On 29 May 2014, at 8:43 pm, Yusuke Iida wrote: > Hi, Andrew > > 2014-05-29 15:30 GMT+09:00 Andrew Beekhof : >> >> On 29 May 2014, at 3:40 pm, Yusuke Iida wrote: >> >>> Hi, Andrew >>> >>> 2014-05-29 14:00 GMT+09:00 Andrew Beekhof : >

Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof
On 29 May 2014, at 3:40 pm, Yusuke Iida wrote: > Hi, Andrew > > 2014-05-29 14:00 GMT+09:00 Andrew Beekhof : >> >> On 29 May 2014, at 12:28 pm, Yusuke Iida wrote: >> >>> Hi, Andrew >>> >>> I'm sorry. >>> It seems that the

Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof
I_ERROR > cause=C_FSA_INTERNAL origin=config_query_callback ] > May 29 10:43:37 vm02 crmd[25608]: warning: do_recover: Fast-tracking > shutdown in response to errors > May 29 10:43:37 vm02 crmd[25608]: warning: do_election_vote: Not > voting in election, we're in state S_RECOV

Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof
On 28 May 2014, at 6:42 pm, Yusuke Iida wrote: > Hi, Andrew > > I made the cluster load a setup to which 256 resources are started using > crmsh. > At this time, crmd changed into the S_RECOVERY state and rebooted. > > May 28 17:08:00 [14194] vm02 crmd:error: > config_query_callback

Re: [Pacemaker] Logging options: how to change logpriority only for cib

2014-05-28 Thread Andrew Beekhof
On 28 May 2014, at 8:25 pm, Bernardo Cabezas Serra wrote: > Hello, > > On 27/05/14 23:05, Andrew Beekhof wrote: >>> But now, we get this line each 5 seconds on corosync syslog: >>> pacemakerd[23624]: notice: crm_add_logfile: Additional logging available >

Re: [Pacemaker] [Problem] The "dampen" parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread Andrew Beekhof
} >> } >> - >> -if(changed) { >> -if(a->timer) { >> -crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, >> a->id); >> -mainloop_timer_start(a->timer); >> -} else { >> -

Re: [Pacemaker] [Problem] The "dampen" parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread Andrew Beekhof
On 28 May 2014, at 4:10 pm, Andrew Beekhof wrote: > > On 28 May 2014, at 3:04 pm, renayama19661...@ybb.ne.jp wrote: > >> Hi Andrew, >> >>>> I'd expect that block to hit this clause though: >>>> >>>> } else if(mainloop_tim

Re: [Pacemaker] [Problem] The "dampen" parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread Andrew Beekhof
On 28 May 2014, at 3:04 pm, renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > >>> I'd expect that block to hit this clause though: >>> >>> } else if(mainloop_timer_running(a->timer)) { >>> crm_info("Write out of '%s' delayed: timer is running", a->id); >>> return; >> >> Whi

Re: [Pacemaker] [Problem] The "dampen" parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread Andrew Beekhof
On 27 May 2014, at 12:13 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > The attrd_updater command ignores the "dampen" parameter and updates an > attribute. > > Step1) Start one node. > [root@srv01 ~]# crm_mon -1 -Af > Last updated: Tue May 27 19:36:35 2014 > Last change: Tue May 27 19:34

Re: [Pacemaker] Resources move on Pacemaker + Corosync cluster with set stickiness

2014-05-27 Thread Andrew Beekhof
>> > > I am sorry Andrew, I've tried to follow your advice but didn't quite catch > what you want me to look for :( Failing monitor actions that indicate the resource was already started without pacemaker telling it to do so. > > > Il 23/05/2014 2.42, Andrew

Re: [Pacemaker] Logging options: how to change logpriority only for cib

2014-05-27 Thread Andrew Beekhof
On 28 May 2014, at 12:33 am, Bernardo Cabezas Serra wrote: > Hello, > > The fast question: > > Is it possible to have different logpriorities for cib than for lrmd or > pacemakerd? > > The long explanation: > > After upgrading to pacemaker 1.1.12rc1, and enabling pgsql Master/Slave RA, I >

Re: [Pacemaker] no-quorum-policy = demote?

2014-05-27 Thread Andrew Beekhof
On 27 May 2014, at 7:20 pm, Christian Ciach wrote: > > > > 2014-05-27 7:34 GMT+02:00 Andrew Beekhof : > > On 27 May 2014, at 3:12 pm, Gao,Yan wrote: > > > On 05/27/14 08:07, Andrew Beekhof wrote: > >> > >> On 26 May 2014, at 10:47 pm, Christi

Re: [Pacemaker] [Fuel][HA] Notifying clones of offline nodes

2014-05-27 Thread Andrew Beekhof
On 28 May 2014, at 1:20 am, Lars Marowsky-Bree wrote: > On 2014-05-27T10:02:44, Andrew Beekhof wrote: > >>> We are working on HA solutions for OpenStack(-related) services and figured >>> out that sometimes we need clones to be notified if one of the cluster

Re: [Pacemaker] version compatibility between pcs and pacemaker

2014-05-26 Thread Andrew Beekhof
-magic="0:0;11:196:0:79ecdaeb-e637-4fdf-b8e8-ebfc7e2eca39" > call-id="275" rc-code="0" op-status="0" interval="0" last-run="1401169174" > last-rc-change="1401169174" exec-time="196" queue-time="0"

Re: [Pacemaker] no-quorum-policy = demote?

2014-05-26 Thread Andrew Beekhof
On 27 May 2014, at 3:12 pm, Gao,Yan wrote: > On 05/27/14 08:07, Andrew Beekhof wrote: >> >> On 26 May 2014, at 10:47 pm, Christian Ciach wrote: >> >>> I am sorry to get back to this topic, but I'm genuinely curious: >>> >>> Why i

Re: [Pacemaker] version compatibility between pcs and pacemaker

2014-05-26 Thread Andrew Beekhof
Any specific reason to stay on 1.1.8? > > > On Tue, May 27, 2014 at 5:28 AM, Andrew Beekhof wrote: > > On 26 May 2014, at 5:15 pm, K Mehta wrote: > > > pcs versions 0.9.26 and 0.9.90 > > pacemaker versions 1.1.8 and 1.1.10 > > > > Which pcs versi

Re: [Pacemaker] unexpected demote request on master

2014-05-26 Thread Andrew Beekhof
On 27 May 2014, at 2:37 pm, K Mehta wrote: > So is globally-unique=false correct in my case ? yes > > > On Tue, May 27, 2014 at 5:30 AM, Andrew Beekhof wrote: > > On 26 May 2014, at 9:56 pm, K Mehta wrote: > > > What I understand from "globally-uniqu

Re: [Pacemaker] Long time to do promote

2014-05-26 Thread Andrew Beekhof
_64 x86_64 x86_64 GNU/Linux > CentOS release 6.4 (Final) 6.4 should have 1.1.10 available; I would highly recommend you update before pursuing this one much further. > > > > On Mon, May 26, 2014 at 5:19 AM, Andrew Beekhof wrote: > What version of pacemaker/corosync/libqb are
