Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-18 Thread Andrew Beekhof
On 18 Feb 2014, at 9:12 pm, Asgaroth wrote: >> >> The 3rd node should (and needs to be) fenced at this point to allow the >> cluster to continue. >> Is this not happening? > > The fencing operation appears to complete successfully, here is the > sequence: > > [1] All 3 nodes running properly

Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 7:16 am, Vladislav Bogdanov wrote: > 18.02.2014 23:01, David Vossel wrote: >> >> >> >> >> - Original Message - >>> From: "Vladislav Bogdanov" To: >>> pacemaker@oss.clusterlabs.org Sent: Tuesday, February 18, 2014 >>> 1:02:09 PM Subject: Re: [Pacemaker] node1 fenci

Re: [Pacemaker] [Problem] Fail-over is delayed.(State transition is not calculated.)

2014-02-18 Thread Andrew Beekhof
I'll follow up on the bug. On 19 Feb 2014, at 10:55 am, renayama19661...@ybb.ne.jp wrote: > Hi David, > > Thank you for comments. > >> You have resource-stickiness=INFINITY, this is what is preventing the >> failover from occurring. Set resource-stickiness=1 or 0 and the failover >> should oc

Re: [Pacemaker] hangs pending

2014-02-18 Thread Andrew Beekhof
On 18 Feb 2014, at 11:05 pm, Andrey Groshev wrote: > Hi, ALL and Andrew! > > Today is a good day - I killed a lot, and a lot of shooting at me. > In general - I am happy (almost like an elephant) :) > Except resources on the node are important to me eight processes: > corosync,pacemakerd,cib

Re: [Pacemaker] hangs pending

2014-02-18 Thread Andrew Beekhof
hat will fail causing the crmd to fail and the node to be fenced. > Generally don't touch corosync,cib and maybe lrmd,crmd. > > What do you think about this? > The main question of this topic - we decided. > But this varied behavior - another big problem. > > > >

Re: [Pacemaker] About the difference in handling of "sequential".

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 10:48 am, Kristoffer Grönlund wrote: > Hi everyone, > > On Mon, 17 Feb 2014 10:54:29 +0900 (JST) > renayama19661...@ybb.ne.jp wrote: > >> Hi Andrew, >> >> I found your correction. >> >> https://github.com/beekhof/pacemaker/commit/37ff51a0edba208e6240e812936717fffc941a41 >>

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-02-18 Thread Andrew Beekhof
On 18 Feb 2014, at 8:18 pm, Andrew Beekhof wrote: > > On 18 Feb 2014, at 7:40 pm, Vladislav Bogdanov wrote: > >> 18.02.2014 03:49, Andrew Beekhof wrote: >>> >>> On 31 Jan 2014, at 6:20 pm, yusuke iida wrote: >>> >>>> Hi, all

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-02-18 Thread Andrew Beekhof
On 18 Feb 2014, at 7:40 pm, Vladislav Bogdanov wrote: > 18.02.2014 03:49, Andrew Beekhof wrote: >> >> On 31 Jan 2014, at 6:20 pm, yusuke iida wrote: >> >>> Hi, all >>> >>> I measure the performance of Pacemaker in the following combina

Re: [Pacemaker] Order of resources in a group and crm_diff

2014-02-18 Thread Andrew Beekhof
On 18 Feb 2014, at 7:25 pm, Vladislav Bogdanov wrote: > 29.01.2014 08:44, Andrew Beekhof wrote: > ... >> >> That's a known deficiency in the v1 diff format (and why we need costly >> digests to detect ordering changes). >> Happily .12 will have a new and improv

Re: [Pacemaker] [Patch]Information of "Connectivity is lost" is not displayed

2014-02-17 Thread Andrew Beekhof
srv01: > > - > > I uploaded log in the next place.(trac2781.zip) > > * https://skydrive.live.com/?cid=3A14D57622C66876&id=3A14D57622C66876%21117 > > Best Regards, > Hideo Yamauchi. > > > --- On Tue, 2014/2/18, Andrew Beekhof wrote

Re: [Pacemaker] pre_notify_demote is issued twice

2014-02-17 Thread Andrew Beekhof
On 6 Feb 2014, at 7:45 pm, Keisuke MORI wrote: > Hi, > > I observed that pre_notify_demote is issued twice when a master > resource is migrating. > I'm wondering if this is the correct behavior. > > Steps to reproduce: > > - Start up 2 nodes cluster configured for the PostgreSQL streaming > r

Re: [Pacemaker] stopped resource was judged to be active

2014-02-17 Thread Andrew Beekhof
On 10 Feb 2014, at 5:28 pm, Kazunori INOUE wrote: > Hi, > > Pacemaker stopped, but it was judged that a resource was active. > I put crm_report here. > https://drive.google.com/file/d/0B9eNn1AWfKD4S29JWk1ldUJJNGs/edit?usp=sharing > > [Steps to reproduce] > 1) start up the cluster > > Stack: c

Re: [Pacemaker] [Patch]Information of "Connectivity is lost" is not displayed

2014-02-17 Thread Andrew Beekhof
iscuss the correction to put meta data in > rsc-parameters with Mr. Lars? Or Mr. David? > > Best Regards, > Hideo Yamauchi. > > --- On Tue, 2014/2/18, Andrew Beekhof wrote: > >> >> On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote: >> >>

Re: [Pacemaker] [Patch]Information of "Connectivity is lost" is not displayed

2014-02-17 Thread Andrew Beekhof
On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > The next change was accomplished by Mr. Lars. > > https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c I'm confused... that patch seems to be the reverse of yours. Are you saying tha

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-02-17 Thread Andrew Beekhof
On 22 Jan 2014, at 10:54 am, Brian J. Murrell (brian) wrote: > On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote: >> >> What crm_mon are you looking at? >> I see stuff like: >> >> virt-fencing (stonith:fence_xvm):Started rhos4-node3 >> Resou

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-02-17 Thread Andrew Beekhof
On 31 Jan 2014, at 6:20 pm, yusuke iida wrote: > Hi, all > > I measure the performance of Pacemaker in the following combinations. > Pacemaker-1.1.11.rc1 > libqb-0.16.0 > corosync-2.3.2 > > All nodes are KVM virtual machines. > > stopped the node of vm01 compulsorily from the inside, after s

Re: [Pacemaker] Resource Agents in OpenVZ containers

2014-02-17 Thread Andrew Beekhof
On 17 Feb 2014, at 5:38 pm, emmanuel segura wrote: > example: colocation ipwithpgsql inf: virtualip psql:Master Ah, so colocating with the host running the vm with the master inside it. That's not something we can do yet, sorry. > > > 2014-02-17 6:25 GMT+01:00 Tomasz Kontusz :

Re: [Pacemaker] crm_mon --as-html default permissions

2014-02-17 Thread Andrew Beekhof
On 12 Feb 2014, at 9:53 pm, Marko Potocnik wrote: > Hi, > > I've upgraded from pacemaker-1.1.7-6.el6.x86_64 to > pacemaker-1.1.10-14.el6_5.2.x86_64. > I use crm_mon with --as-html option to get the cluster status in html file. > I've noticed that the permissions for file have changed from 644

Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-17 Thread Andrew Beekhof
On 18 Feb 2014, at 5:52 am, Asgaroth wrote: >> -Original Message- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: 17 February 2014 00:55 >> To: li...@blueface.com; The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] node1 fe

Re: [Pacemaker] Get group behaviour with Master slave or clones envolved

2014-02-17 Thread Andrew Beekhof
On 17 Feb 2014, at 10:34 pm, Néstor C. wrote: > > > > 2014-02-17 1:22 GMT+01:00 Andrew Beekhof : > > On 21 Jan 2014, at 10:50 pm, Néstor C. wrote: > > >> Hello. > >> > >> When you need that some primitives switch in block you can use a group

Re: [Pacemaker] resource is too active problem in a 2-node cluster

2014-02-17 Thread Andrew Beekhof
Could this > error be a result of these un-implemented actions? Unlikely. More likely the monitor action is not correctly returning OCF_NOT_RUNNING if run before the resource is running. > On 02/16/2014 09:15 PM, Andrew Beekhof wrote: >> On 12 Feb 2014, at 1:39 am, Ajay Aggarwal
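The point about the monitor action can be illustrated with a minimal sketch. This is not the agent from the thread: the function name `monitor` and the pid-file approach are assumptions made for illustration only. The return codes follow the OCF convention (0 for success, 7 for "not running"). The idea is that a probe run before the resource has ever been started should report "not running" rather than a generic error.

```python
import os

OCF_SUCCESS = 0       # resource is running
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


def monitor(pidfile):
    """Hypothetical monitor: report state based on a pid file."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        # No pid file (or an empty/garbled one): never started, so "not running".
        return OCF_NOT_RUNNING
    if pid <= 0:
        # Non-positive values would address process groups, not one process.
        return OCF_NOT_RUNNING
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        # The process exists but belongs to another user.
        return OCF_SUCCESS
    return OCF_SUCCESS
```

A real resource agent is usually a shell script and would also distinguish failure cases; the sketch only shows the "stopped means 7, not an error" rule.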

Re: [Pacemaker] Resource Agents in OpenVZ containers

2014-02-16 Thread Andrew Beekhof
On 16 Feb 2014, at 6:53 am, emmanuel segura wrote: > i think, if you use pacemaker_remote inside the container, the container will > be a normal node of you cluster, so you can run pgsql + vip in it > > > 2014-02-15 19:40 GMT+01:00 Tomasz Kontusz : > Hi > I'm setting up a cluster which will u

Re: [Pacemaker] resource is too active problem in a 2-node cluster

2014-02-16 Thread Andrew Beekhof
On 12 Feb 2014, at 1:39 am, Ajay Aggarwal wrote: > Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing ( I know > it is not recommended). There is an external monitoring and fencing service > that we use (our own). > > Perhaps subject line "resource is too active problem in

Re: [Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-02-16 Thread Andrew Beekhof
On 17 Feb 2014, at 12:47 pm, renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > Thank you for comments. > >> Is this related to your email about symmetrical not being defaulted >> consistently between colocate_rsc_sets() and unpack_colocation_set()? > > Yes. > I think that a default is not ha

Re: [Pacemaker] Reason for automatic migration after one node rebooted?

2014-02-16 Thread Andrew Beekhof
Attachments beat pastebin as they don't expire :) What does your config look like after you run: crm resource migrate nfs_services nfs01 ? On 7 Feb 2014, at 5:00 am, Andrew J. Caines wrote: > I'm new to Pacemaker and am working on my RTFM+STFW efforts, but have an > urgent need to find the cau

Re: [Pacemaker] pacemaker shutdown issue after OS change

2014-02-16 Thread Andrew Beekhof
On 31 Jan 2014, at 3:18 am, Pascal BERTON wrote: > Hi ! > > I recently changed hosting platform versions for my PCMK clusters, from > RHEL6.0 equivalent towards SL6.4. Also changed from pcmk 1.1.2+corosync 1.3.3 > to pcmk 1.1.10+corosync 1.4.1 that come with SL6. > Until now, I used to mana

Re: [Pacemaker] Restart of resources

2014-02-16 Thread Andrew Beekhof
On 7 Feb 2014, at 8:54 pm, Frank Brendel wrote: > >> Its somewhat inferred in the description of cluster-recheck-interval >> >>cluster-recheck-interval = time [15min] >>Polling interval for time based changes to options, resource >> parameters and constraints. >> >>

Re: [Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-02-16 Thread Andrew Beekhof
Is this related to your email about symmetrical not being defaulted consistently between colocate_rsc_sets() and unpack_colocation_set()? On 22 Jan 2014, at 3:05 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > My test seemed to include a mistake. > It seems to be replaced by two limitation.

Re: [Pacemaker] display order in crm_mon output

2014-02-16 Thread Andrew Beekhof
On 11 Feb 2014, at 8:23 pm, Bauer, Stefan (IZLBW Extern) wrote: > Hi List, > > we’ve recovered a cluster after a failure and used a previously exported > cib.xml. Everything is back to normal state. > The strange thing is, that the order in the output of crm_mon is not like > before. > >

Re: [Pacemaker] Manual resource reload

2014-02-16 Thread Andrew Beekhof
On 11 Feb 2014, at 2:49 am, Vladislav Bogdanov wrote: > Hi, > > cannot find anywhere (am I blind?), is it possible to manually inject > 'reload' op for a given resource? > > Background for this is if some configuration files are edited, and > resource-agent (or LSB script) supports 'reload' op

Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-16 Thread Andrew Beekhof
On 7 Feb 2014, at 10:22 pm, Asgaroth wrote: > On 06/02/2014 05:52, Vladislav Bogdanov wrote: >> Hi, >> >> I bet your problem comes from the LSB clvmd init script. >> Here is what it does do: >> >> === >> ... >> clustered_vgs() { >> ${lvm_vgdisplay} 2>/dev/null | \ >> awk 'B

Re: [Pacemaker] pacemaker on different subnet machines

2014-02-16 Thread Andrew Beekhof
On 25 Jan 2014, at 12:55 am, Parveen Jain wrote: > Hi All, > Can there be any problem if we install pacemaker/corosync on different > subnets(for a simple two machine setup cluster) ? Corosync uses multicast by default which I could easily imagine would not like this kind of setup. Perhaps

Re: [Pacemaker] Stonith logging question

2014-02-16 Thread Andrew Beekhof
On 22 Jan 2014, at 12:18 am, Robert Lindgren wrote: > Hi, > > I'm trying to get rid of some stonith info logging but I fail :( Turn off debug and, for everything else, edit the C source code > > The log-lines are like this in syslog: > Jan 21 13:24:15 wolf1 stonith-ng: [6349]: info: stonith_

Re: [Pacemaker] Get group behaviour with Master slave or clones envolved

2014-02-16 Thread Andrew Beekhof
On 21 Jan 2014, at 10:50 pm, Néstor C. wrote: > Hello. > > When you need that some primitives switch in block you can use a group. Groups are just a syntactic shortcut for ordering and colocation constraints. > > There is a manner to get this when you have a clone or a master/slave > involv
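As a rough illustration of "groups are a shortcut for ordering and colocation constraints", the sketch below expands an ordered member list into pairwise constraints. The function `expand_group` and the tuple layout are hypothetical and are not Pacemaker's internal representation; they only show the shape of the equivalent constraints.

```python
from typing import List, Tuple

# ("order", first, then)            -> start `first` before `then`
# ("colocation", rsc, with_rsc)     -> place `rsc` on the node running `with_rsc`
Constraint = Tuple[str, str, str]


def expand_group(members: List[str]) -> List[Constraint]:
    """Expand an ordered group into pairwise order + colocation constraints."""
    constraints: List[Constraint] = []
    for prev, nxt in zip(members, members[1:]):
        constraints.append(("order", prev, nxt))
        constraints.append(("colocation", nxt, prev))
    return constraints


if __name__ == "__main__":
    for c in expand_group(["vip", "fs", "db"]):
        print(c)
```

Because the expansion is just explicit constraints, the same pattern can in principle be written by hand for resources that cannot be placed in a group.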

Re: [Pacemaker] apache resource status stopped

2014-02-16 Thread Andrew Beekhof
Was there a question here somewhere? On 20 Jan 2014, at 9:59 pm, Praveen wrote: > [root@pcmk-2 ~]# pcs status > Cluster name: mycluster > Last updated: Mon Jan 20 15:56:13 2014 > Last change: Mon Jan 20 15:47:11 2014 via crm_attribute on pcmk-1 > Stack: corosync > Current DC: pcmk-2 (2) - partit

Re: [Pacemaker] restore old configuration

2014-02-16 Thread Andrew Beekhof
On 20 Jan 2014, at 8:05 pm, Axel wrote: > Hi all, > > this is my first post, i hope i meet all required informations and netiquette > on this list. > > I need help to backup and restore resource configuration i made with corosync > / pacemaker an an ubuntu server (precise) > > I configured

Re: [Pacemaker] When the ex-live server comes back online, it tries to failback causing a failure and restart in services

2014-02-16 Thread Andrew Beekhof
On 17 Jan 2014, at 4:33 pm, Michael Monette wrote: > Hi, > > I have 2 servers setup with Postgres and /dev/drbd1 is mounted at > /var/lib/pgsql. I also have pacemaker setup and it's setup to fail back and > forth between the 2 nodes. It works really well for the most part. > > I am having th

Re: [Pacemaker] ocf:lvm2:clvmd resource agent

2014-02-16 Thread Andrew Beekhof
On 13 Feb 2014, at 9:56 am, Andrew Daugherity wrote: > I noticed in recent discussions on this list that this RA is apparently a > SUSE thing and not upstreamed into resource-agents. This was news to me, but > apparently is indeed the case. > > I guess it's SUSE's decision whether to push it

Re: [Pacemaker] hangs pending

2014-02-16 Thread Andrew Beekhof
With no quick follow-up, dare one hope that means the patch worked? :-) On 14 Feb 2014, at 3:37 pm, Andrey Groshev wrote: > Yes, of course. Now beginning build world and test ) > > 14.02.2014, 04:41, "Andrew Beekhof" : >> The previous patch wasn't quite right.

Re: [Pacemaker] suggestions on pacemaker error thanks

2014-02-14 Thread Andrew Beekhof
Not without seeing your configuration. At a guess you've not defined a fencing agent in pacemaker. On 15 Feb 2014, at 5:02 am, Remo Mattei wrote: > Hello Andrew, > could you suggest a fix for this. i am using Red Hat 6.5. > > Thanks I got the same error on bot servers. > > Feb 13 19:01:10 c

Re: [Pacemaker] hangs pending

2014-02-13 Thread Andrew Beekhof
N__, peer, CRM_NODE_LOST, 0); +crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL); +} +crm_update_peer_join(__FUNCTION__, peer, crm_join_none); +crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN); +} On 16 Jan 2014, at 7:24 pm, Andrey Groshev wrote:

Re: [Pacemaker] Announcing Pacemaker v1.1.11

2014-02-13 Thread Andrew Beekhof
Great job David, thanks for taking the lead on this role while I was on baby duty :) On 14 Feb 2014, at 4:26 am, David Vossel wrote: > > I am excited to announce the release of Pacemaker v1.1.11 > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.11 > > There were no cha

Re: [Pacemaker] Manual fence confirmation by stonith_admin doesn't work again.

2014-02-06 Thread Andrew Beekhof
nual server" to successfully provided the needed ack > that stonith has been successful. > > Bob Haxo > > > On Mon, 2014-01-13 at 14:58 +1100, Andrew Beekhof wrote: >> On 10 Jan 2014, at 3:54 pm, Nikita Staroverov wrote: >> >> >> >> >

Re: [Pacemaker] Announce: SNMP agent for pacemaker

2014-02-06 Thread Andrew Beekhof
On 7 Feb 2014, at 6:30 pm, Michael Schwartzkopff wrote: > On Friday, 7 February 2014, 17:09:39, Andrew Beekhof wrote: >> On 23 Jan 2014, at 12:30 am, Lars Marowsky-Bree wrote: >>> On 2014-01-22T09:37:33, Michael Schwartzkopff wrote: >>>> Hi, >>>

Re: [Pacemaker] Announce: SNMP agent for pacemaker

2014-02-06 Thread Andrew Beekhof
On 23 Jan 2014, at 12:30 am, Lars Marowsky-Bree wrote: > On 2014-01-22T09:37:33, Michael Schwartzkopff wrote: > >> Hi, >> >> I am working on a SNMP agent for pacemaker. it is written in perl. At the >> moment it is in an alpha stadium. >> >> Any volunteers for testing? > > I'd be quite cur

Re: [Pacemaker] hangs pending

2014-02-06 Thread Andrew Beekhof
On 7 Feb 2014, at 3:55 pm, Andrey Groshev wrote: > Hi, Andrew and ALL! > > Andrew, We did not bury this topic? No, I've just not had a chance to return to it. I've spent the last week in dbus hell for the .11 release and had vacation before that.

Re: [Pacemaker] Running remote SSH commands on another server?

2014-02-06 Thread Andrew Beekhof
On 31 Jan 2014, at 2:15 am, Michael Monette wrote: > Hello, > > I am coming up short in my searches, but I don't know exactly what I am > searching for, hoping someone could point me in the right direction. > > I have Pacemaker setup in active/passive on my Email server. The systems are > in

Re: [Pacemaker] daemon cpg_join error retrying

2014-02-06 Thread Andrew Beekhof
On 31 Jan 2014, at 9:36 pm, Parveen Jain wrote: > Hi, > Even I had seen that bug. But that bug was force closed because they found > it to some switch error. > For me, it appears on a consistent basis. Sometimes automatically it goes > away and then reappears when I recreate the cluster using

Re: [Pacemaker] Restart of resources

2014-02-06 Thread Andrew Beekhof
On 3 Feb 2014, at 9:40 pm, Frank Brendel wrote: > I've solved the problem. > > When I set cluster-recheck-interval to a value less than failure-timeout > it works. > > Is this an expected behavior? Yes. > > This is not documented anywhere. It's somewhat inferred in the description of clust
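One way to read this exchange: an expired failure is only noticed the next time the cluster re-evaluates its state, and on an otherwise idle cluster that happens at the cluster-recheck-interval. The sketch below is a simplified model under two stated assumptions: rechecks fire at exact multiples of the interval starting from t=0, and nothing else triggers a re-evaluation. The function name is hypothetical.

```python
def failure_cleared_at(failed_at: int, failure_timeout: int, recheck_interval: int) -> int:
    """Return the time (seconds) of the first recheck at or after the failure expires."""
    expiry = failed_at + failure_timeout
    # Integer ceiling division: number of recheck ticks needed to reach the expiry.
    ticks = -(-expiry // recheck_interval)
    return ticks * recheck_interval


if __name__ == "__main__":
    # failure-timeout of 60s with the default 15min recheck: cleared only at 900s.
    print(failure_cleared_at(0, 60, 900))
    # Recheck interval shorter than failure-timeout: cleared at 60s.
    print(failure_cleared_at(0, 60, 30))
```

Under this model, a recheck interval longer than the failure-timeout delays the effective expiry by up to one full interval.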

Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-06 Thread Andrew Beekhof
On 6 Feb 2014, at 3:33 pm, Digimer wrote: > On 05/02/14 11:30 PM, Nikita Staroverov wrote: >> >>> Some archive messages seem to suggest that clvmd should be started >>> outside of the cluster at system boot (cman -> clvmd -> pacemaker), >>> however, my personal preference would be to have these

Re: [Pacemaker] Shutdown of pacemaker service takes 20 minutes

2014-02-06 Thread Andrew Beekhof
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html On 4 Feb 2014, at 2:05 am, elo wrote: > > > > please, how do i add the stop timeout?? > > Thanks. will be really grateful to get a REPLY > > > ___ >

Re: [Pacemaker] LSB openswan script monitor problems

2014-02-05 Thread Andrew Beekhof
On 5 Feb 2014, at 2:57 pm, Dennis Jacobfeuerborn wrote: > On 05.02.2014 04:06, Andrew Beekhof wrote: >> >> On 3 Feb 2014, at 11:34 pm, Frank Brendel wrote: >> >>> As far as I know, Pacemaker does not parse STDOUT but it keeps STDERR >>> for loggin

Re: [Pacemaker] LSB openswan script monitor problems

2014-02-04 Thread Andrew Beekhof
On 3 Feb 2014, at 11:34 pm, Frank Brendel wrote: > As far as I know, Pacemaker does not parse STDOUT but it keeps STDERR > for logging. > Experts correct me if I am wrong. You are correct [snip] >> Does nobody have an idea? I checked this link: >> >> http://clusterlabs.org/doc/en-US/Pacemak

Re: [Pacemaker] Order of resources in a group and crm_diff

2014-01-28 Thread Andrew Beekhof
On 28 Jan 2014, at 10:11 pm, Vladislav Bogdanov wrote: > Hi all, > > Just discovered, that when I add resource to a middle of > (running) group, it is added to the end. > > I mean, if I update following (crmsh syntax) > > group dhcp-server vip-10-5-200-244 dhcpd > > with > > group dhcp-serv

Re: [Pacemaker] Time to get ready for 1.1.11

2014-01-28 Thread Andrew Beekhof
On 24 Jan 2014, at 3:25 pm, Digimer wrote: > On 23/01/14 11:08 PM, David Vossel wrote: >> You may have noticed the release did not happen. I'm investigating a bug in >> the service api involving systemd scripts. I'm postponing the release until >> I understand what is going on. >> >> -- Voss

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-16 Thread Andrew Beekhof
On 17 Jan 2014, at 9:05 am, Lars Marowsky-Bree wrote: > On 2014-01-17T07:40:34, Andrew Beekhof wrote: > >>> Well, unless RHT states that installing crmsh on top of their >>> distribution invalidates support for the pacemaker back-end, you could >>> just ship

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-16 Thread Andrew Beekhof
On 16 Jan 2014, at 10:59 pm, Lars Marowsky-Bree wrote: > On 2014-01-15T20:25:30, Bob Haxo wrote: > >> Unfortunately, it configuration has taken me weeks to develop (what now >> seems to be) a working configuration (including mods to the >> VirtualDomain agent to avoid spurious restarts of the

Re: [Pacemaker] Time to get ready for 1.1.11

2014-01-15 Thread Andrew Beekhof
id Vossel" >>> To: "The Pacemaker cluster resource manager" >>> Sent: Tuesday, January 7, 2014 4:50:11 PM >>> Subject: Re: [Pacemaker] Time to get ready for 1.1.11 >>> >>> - Original Message - >>>> From: "Andrew B

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 3:25 pm, Bob Haxo wrote: > > On Thu, 2014-01-16 at 12:32 +1100, Andrew Beekhof wrote: >> On 16 Jan 2014, at 11:49 am, Bob Haxo wrote: >> >>>> On 01/15/2014 05:02 PM, Bob Haxo wrote: >>>>> Greetings, >>>>>

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 1:13 pm, Brian J. Murrell (brian) wrote: > On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote: >> >> I know, I was giving you another example of when the cib is not completely >> up-to-date with reality. > > Yeah, I understood that. I w

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 11:49 am, Bob Haxo wrote: >> On 01/15/2014 05:02 PM, Bob Haxo wrote: >> > Greetings, >> > >> > The command "crm configure show" dumps the cluster configuration in a >> > format >> > that is suitable for use in configuring a cluster. >> > >> > The command "pcs config" generat

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 6:53 am, Brian J. Murrell (brian) wrote: > On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote: >> >> Consider any long running action, such as starting a database. >> We do not update the CIB until after actions have completed, so there can >

Re: [Pacemaker] hangs pending

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 12:41 am, Andrey Groshev wrote: > > > 15.01.2014, 02:53, "Andrew Beekhof" : >> On 15 Jan 2014, at 12:15 am, Andrey Groshev wrote: >> >>> 14.01.2014, 10:00, "Andrey Groshev" : >>>> 14.01.2014, 07:47, "

Re: [Pacemaker] Question about new migration

2014-01-15 Thread Andrew Beekhof
On 15 Jan 2014, at 7:12 pm, Kazunori INOUE wrote: > Hi David, > > With new migration logic, when VM was migrated by 'node standby', > start was performed in migrate_target. (migrate_from was not performed.) > > Is this the designed behavior? > > > # crm_mon -rf1 > Stack: corosync > Current D

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 11:50 pm, Brian J. Murrell (brian) wrote: > On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote: >> >>> On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: >>>> >>>> The local cib hasn't caught up yet by the looks

Re: [Pacemaker] Consider extra slave node resource when calculating actions for failover

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 11:25 pm, Juraj Fabo wrote: > Hi > > I have master-slave cluster with configuration attached below. > It is based on documented postgresql master-slave cluster configuration. > Colocation constraints should work that way that if some of "master-group" > resources fails, > f

Re: [Pacemaker] [Enhancement] Change of the "globally-unique" attribute of the resource.

2014-01-14 Thread Andrew Beekhof
'd have expected the stop action to be performed with the old attributes. >>> crm_report tarball? >> >> Okay. >> >> I register this topic with Bugzilla. >> I attach the log to Bugzilla. >> >> Best Regards, >> Hideo Yamauchi. >> ---

Re: [Pacemaker] [Enhancement] Change of the "globally-unique" attribute of the resource.

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 7:26 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > When a user changes the "globally-unique" attribute of the resource, a > problem occurs. > > When it manages the resource with PID file, this occurs, but this is because > PID file name changes by "globally-unique" at

Re: [Pacemaker] hangs pending

2014-01-14 Thread Andrew Beekhof
On 15 Jan 2014, at 12:15 am, Andrey Groshev wrote: > > > 14.01.2014, 10:00, "Andrey Groshev" : >> 14.01.2014, 07:47, "Andrew Beekhof" : >> >>> Ok, here's what happens: >>> >>> 1. node2 is lost >>> 2. fencin

Re: [Pacemaker] [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-14 Thread Andrew Beekhof
recognized option '--ban' >> >> >> No other way to move master? >> >> >> >> 2014/1/13 Andrew Beekhof >> >> On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky wrote: >> >> > Hi >> > >> > I have 3 nod

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
>> Only the new code makes (or at least should do) crmd-transition-delay >>>> redundant. >>> >>> It did not seem to work so that new attrd dispensed with >>> crmd-transition-delay to me. >>> I report the details again. >>> # Probably it will be B

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
e new code makes (or at least should do) crmd-transition-delay >> redundant. > > It did not seem to work so that new attrd dispensed with > crmd-transition-delay to me. > I report the details again. > # Probably it will be Bugzilla. . . Sounds good > > Best R

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > I contributed next bugzilla by a problem to occur for the difference of the > timing of the attribute update by attrd before. > * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528 > > We can evade this pro

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:41 pm, Brian J. Murrell (brian) wrote: > On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: >> >> The local cib hasn't caught up yet by the looks of it. > > Should crm_resource actually be [mis-]reporting as if it were > knowledgeable

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:34 pm, Andrey Groshev wrote: > > > 14.01.2014, 06:25, "Andrew Beekhof" : >> Apart from anything else, your timeout needs to be bigger: >> >> Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( >> commands.

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
nce the script makes sure that the victim will rebooted and again available via ssh - it exit with 0." does not seem true. On 14 Jan 2014, at 1:19 pm, Andrew Beekhof wrote: > Apart from anything else, your timeout needs to be bigger: > > Jan 13 12:21:36 [17223] dev-cluste

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 1:19 pm, Andrew Beekhof wrote: > Apart from anything else, your timeout needs to be bigger: > > Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( > commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
' with device 'st1' returned: -62 (Timer expired) On 14 Jan 2014, at 7:18 am, Andrew Beekhof wrote: > > On 13 Jan 2014, at 8:31 pm, Andrey Groshev wrote: > >> >> >> 13.01.2014, 02:51, "Andrew Beekhof" : >>> On 10 Jan 2014, at

Re: [Pacemaker] Location / Colocation constraints issue

2014-01-13 Thread Andrew Beekhof
On 19 Dec 2013, at 1:08 am, Gaëtan Slongo wrote: > Hi ! > > I'm currently building a 2 node cluster for firewalling. > I would like to run a shorewall on both on the master and the "Slave" > node. I tried many things but nothing works as expected. Shorewall > configurations are good. > What I w

Re: [Pacemaker] [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-13 Thread Andrew Beekhof
On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky wrote: > Hi > > I have 3 node postgresql cluster. > It work well. But I have some trouble with change master. > > For now, if I need change master, I must: > 1) Stop PGSQL on each node and cluster service > 2) Start Setup new manual PGSQL replication

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 5:13 am, Brian J. Murrell (brian) wrote: > Hi, > > I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output > of "crm_resource -L" is not trust-able, shortly after a node is booted. > > Here is the output from crm_resource -L on one of the nodes in a two > nod

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 13 Jan 2014, at 8:31 pm, Andrey Groshev wrote: > > > 13.01.2014, 02:51, "Andrew Beekhof" : >> On 10 Jan 2014, at 9:55 pm, Andrey Groshev wrote: >> >>> 10.01.2014, 14:31, "Andrey Groshev" : >>>> 10.01.2014, 14:01, "A

Re: [Pacemaker] Manual fence confirmation by stonith_admin doesn't work again.

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 3:54 pm, Nikita Staroverov wrote: > There is no-one to tell yet. We have to wait for cman to decide something needs fencing before pacemaker can perform the notification. >>> if I get you right i need own fencing agent that doing manual confirmed >>> fe

Re: [Pacemaker] hangs pending

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 9:55 pm, Andrey Groshev wrote: > > > 10.01.2014, 14:31, "Andrey Groshev" : >> 10.01.2014, 14:01, "Andrew Beekhof" : >> >>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev wrote: >>>> 10.01.2014, 05:29, "An

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 6:18 pm, Andrey Groshev wrote: > > > 10.01.2014, 10:15, "Andrew Beekhof" : >> On 10 Jan 2014, at 4:38 pm, Andrey Groshev wrote: >> >>> 10.01.2014, 09:06, "Andrew Beekhof" : >>>> On 10 Jan 2014, at 3:51

Re: [Pacemaker] [PATCH] Downgrade probe log message for promoted ms resources

2014-01-12 Thread Andrew Beekhof
Fair enough. Pull request? On 12 Jan 2014, at 8:29 pm, Vladislav Bogdanov wrote: > Hi, > > This is the only one message I see in logs in otherwise static cluster > (with rechecks enabled), probably it is good idea to downgrade it to info. > > diff --git a/lib/pengine/unpack.c b/lib/pengine/un

Re: [Pacemaker] hangs pending

2014-01-10 Thread Andrew Beekhof
On 10 Jan 2014, at 5:03 pm, Andrey Groshev wrote: > 10.01.2014, 05:29, "Andrew Beekhof" : > >> On 9 Jan 2014, at 11:11 pm, Andrey Groshev wrote: >>> 08.01.2014, 06:22, "Andrew Beekhof" : >>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshe

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 10 Jan 2014, at 4:38 pm, Andrey Groshev wrote: > > > 10.01.2014, 09:06, "Andrew Beekhof" : >> On 10 Jan 2014, at 3:51 pm, Andrey Groshev wrote: >> >>> 10.01.2014, 03:28, "Andrew Beekhof" : >>>> On 9 Jan 2014, at 4:44 pm, And

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 10 Jan 2014, at 3:51 pm, Andrey Groshev wrote: > > > 10.01.2014, 03:28, "Andrew Beekhof" : >> On 9 Jan 2014, at 4:44 pm, Andrey Groshev wrote: >> >>> 09.01.2014, 02:39, "Andrew Beekhof" : >>>> On 18 Dec 2013, a

Re: [Pacemaker] hangs pending

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 11:11 pm, Andrey Groshev wrote: > > > 08.01.2014, 06:22, "Andrew Beekhof" : >> On 29 Nov 2013, at 7:17 pm, Andrey Groshev wrote: >> >>> Hi, ALL. >>> >>> I'm still trying to cope with the fact that after

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 4:44 pm, Andrey Groshev wrote: > 09.01.2014, 02:39, "Andrew Beekhof" : > >> On 18 Dec 2013, at 11:55 pm, Andrey Groshev wrote: >>> Hi, Andrew and ALL. >>> >>> I'm sorry, but I again found an error. :) >>

Re: [Pacemaker] Breaking dependency loop && stonith

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 5:05 pm, Andrey Groshev wrote: > > > 08.01.2014, 06:15, "Andrew Beekhof" : >> On 27 Nov 2013, at 12:26 am, Andrey Groshev wrote: >> >>> Hi, ALL. >>> >>> I want to clarify two more questions. >>> Aft

Re: [Pacemaker] starting resources with failed stonith resource

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 8:29 pm, Frank Van Damme wrote: > 2014/1/8 Andrew Beekhof : >>> I don't understand it: if this means that the stonith devices have >>> failed a million times, >> >> We also set it to 100 when the start action fails. >> >

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-08 Thread Andrew Beekhof
On 18 Dec 2013, at 11:55 pm, Andrey Groshev wrote: > Hi, Andrew and ALL. > > I'm sorry, but I again found an error. :) > Crux of the problem: > > # crm_attribute --type crm_config --attr-name stonith-enabled --query; echo $? > scope=crm_config name=stonith-enabled value=true > 0 > > # crm_at

Re: [Pacemaker] How to permanently delete ghostly nodes?

2014-01-08 Thread Andrew Beekhof
On 7 Dec 2013, at 8:19 pm, Andrey Rogovsky wrote: > I renamed several nodes and restart the cluster > Now I show a old nodes in status offline > I tried to delete them, but every time you change the cluster configuration > they show in offline again It depends a bit on the version of pacemaker

Re: [Pacemaker] pacemaker + cman - node names and bind address

2014-01-08 Thread Andrew Beekhof
On 5 Dec 2013, at 8:51 pm, Nikola Ciprich wrote: > Hello Digimer, > > and thanks for Your reply. I understand your points, but my question > is about something a bit different.. > > example: I have two nodes, node1 (lan address resolves to 192.168.1.1) > and node2 (lan address resolves to 192

Re: [Pacemaker] catch-22: can't fence node A because node A has the fencing resource

2014-01-08 Thread Andrew Beekhof
On 4 Dec 2013, at 11:47 am, Brian J. Murrell wrote: > > On Tue, 2013-12-03 at 18:26 -0500, David Vossel wrote: >> >> We did away with all of the policy engine logic involved with trying to move >> fencing devices off of the target node before executing the fencing action. >> Behind the scen

Re: [Pacemaker] monitoring redis in master-slave mode

2014-01-08 Thread Andrew Beekhof
On 13 Dec 2013, at 11:06 pm, ESWAR RAO wrote: > > Hi All, > > I have a 3 node setup with HB+pacemaker. I wanted to run redis in > master-slave mode using an ocf script. > https://groups.google.com/forum/#!msg/redis-db/eY3zCKnl0G0/lW5fObHrjwQJ > > But with the below configuration , I am able

Re: [Pacemaker] Error: node does not appear to exist in configuration

2014-01-08 Thread Andrew Beekhof
On 6 Jan 2014, at 8:09 pm, Jerald B. Darow wrote: > Where am I going wrong here? Good question... Chris? > > [root@zero mysql]# pcs cluster standby zero.acenet.us > Error: node 'zero.acenet.us' does not appear to exist in configuration > [root@zero mysql]# pcs cluster cib | grep "node id" >

Re: [Pacemaker] starting resources with failed stonith resource

2014-01-08 Thread Andrew Beekhof
On 8 Jan 2014, at 2:41 am, Frank Van Damme wrote: > Hi list, > > I recently had some trouble with a dual-node mysql cluster, which runs > in master-slave mode with Percona resource manager. While analyzing > what happened to the cluster, I found this in syslog (network trouble, > the cluster lo
