Re: [Pacemaker] Java application failover problem

2013-07-08 Thread Andrew Beekhof
Can you include a crm_report for your test scenario? a) I need the pe files, but also b) parsing line-wrapped logs is seriously painful. On 05/07/2013, at 7:09 PM, Martin Gazak wrote: > Hello, > we are facing a problem with a simple (I hope) cluster configuration > with 2 nodes ims0 and ims

Re: [Pacemaker] Pacemaker 1.1.10 rc 5 & rc 6

2013-07-08 Thread Andrew Beekhof
On 06/07/2013, at 1:28 AM, Andrii Moiseiev wrote: > Any clues? Not with only fragments of the logs. Do you have a log file (not syslog) configured? I'll need that complete file. If you could also set PCMK_trace_functions=apply_xml_diff (see http://blog.clusterlabs.org/blog/2013/pacemaker-loggi

Re: [Pacemaker] Using "avoids" location constraint

2013-07-08 Thread Andrew Beekhof
On 08/07/2013, at 11:35 PM, Andrew Morgan wrote: > Thanks Florian. > > The problem I have is that I'd like to define a HA configuration that isn't > dependent on a specific set of fencing hardware (or any fencing hardware at > all for that matter) and as the stack has the quorum capability in

Re: [Pacemaker] Best way to notify stonith action

2013-07-08 Thread Andrew Beekhof
On 09/07/2013, at 12:50 AM, Andreas Mock wrote: > Hi all, > > thank you for your recommendations. > I just hoped that there is something pacemaker internal, > e.g. like sending traps via snmp or something like that. This is something crm_mon can now send traps, emails or call scripts for. >

Re: [Pacemaker] Full API description for Fence Agent

2013-07-08 Thread Andrew Beekhof
the agent and config. I'm glad you've got it working, but it's hard to discuss whether an agent is correct/sane without knowing more about it. > > Additions and corrections welcome for all > fence agent programmers. > > Best regards > Andreas Mock > > >

Re: [Pacemaker] Question concerning pacemaker-1-1-10-rc6

2013-07-08 Thread Andrew Beekhof
On 08/07/2013, at 5:57 PM, Andreas Mock wrote: > Hi Andrew, > > I'm taking the builds from > http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/ > to avoid compiling on my own. > Do these build relate to the release candidates you're announcing? Not at all, those are whatever I happen to be t

Re: [Pacemaker] Another question about fencing/stonithing

2013-07-07 Thread Andrew Beekhof
On 06/07/2013, at 1:22 AM, Digimer wrote: > Andrew might know the trick. In theory, putting your agent into the /usr/sbin > or /sbin directory (wherever the other agents are) Yep. As long as it's there, executable and takes arguments via stdin... > should "just work". You're sure the exit co
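The "takes arguments via stdin" requirement above can be illustrated with a small sketch. It assumes the name=value-per-line convention of the FenceAgentAPI, with blank lines and lines starting with # skipped; the function name parse_fence_args and the sample option values are hypothetical, and a real agent still has to implement the fencing actions and exit codes itself.

```python
import io


def parse_fence_args(stream):
    """Read name=value pairs, one per line, into a dict.

    Assumption: blank lines and lines starting with '#' are ignored,
    and lines without '=' are skipped rather than treated as errors.
    """
    args = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value pair
        args[name.strip()] = value.strip()
    return args


# Hypothetical input, standing in for what would arrive on sys.stdin.
sample = io.StringIO("# sample request\naction=off\nport=1\nipaddr=an-p01\n")
options = parse_fence_args(sample)
print(options["action"], options["port"], options["ipaddr"])
```

In a real agent the same function would be called with sys.stdin instead of the in-memory sample.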

Re: [Pacemaker] Another question about fencing/stonithing

2013-07-07 Thread Andrew Beekhof
On 05/07/2013, at 5:34 PM, Andreas Mock wrote: > Hi all, > > I just wrote a stonith agent which IMHO implements the > API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI. > > But it seems it has a problem when used as pacemaker stonith device. > > What has to be done, to hav

Re: [Pacemaker] Full API description for Fence Agent

2013-07-07 Thread Andrew Beekhof
y message level filtering is done? There is no filtering. > > Best regards > Andreas > > > -Original Message- > From: Andrew Beekhof [mailto:and...@beekhof.net] > Sent: Thursday, 4 July 2013 13:41 > To: The Pacemaker cluster resource manager >

Re: [Pacemaker] Node addition policy

2013-07-04 Thread Andrew Beekhof
On 04/07/2013, at 7:24 PM, Vladislav Bogdanov wrote: > Hi, > > I am thinking about the safest way to expand the cluster, and my observations > show that new nodes are always added in the "online" state > (standby="off"). I would like nodes to appear in standby="on" state > unless they can be fenced im

Re: [Pacemaker] changing cluster-ip

2013-07-04 Thread Andrew Beekhof
On 04/07/2013, at 8:37 PM, Leon Fauster wrote: > On 04.07.2013 at 12:02, andreas graeper wrote: >> >> I tried to change the IPaddr2 parameter ip >> >> 1) crm resource edit >> >> 2) pcs resource update = >> >> in both cases the cib is modified (`crm configure show` shows) >> but old

Re: [Pacemaker] Full API description for Fence Agent

2013-07-04 Thread Andrew Beekhof
On 04/07/2013, at 7:24 PM, Andreas Mock wrote: > Hi digimer, > > I would like to take up your offer and ask the following: > > The API document says nothing about the correct way > of giving messages back to the stonith daemon. > So, what is the right way to write error/warn/info messages? >

Re: [Pacemaker] drbd on passive node not started

2013-07-02 Thread Andrew Beekhof
On 21/06/2013, at 11:36 PM, andreas graeper wrote: > hi, > n1 active node is started and everything works fine, but after reboot n2 > drbd is not started by pacemaker. Are you sure? I see a couple of start operations: Jun 21 15:10:29 [5093] n2 lrmd:debug: operation_finished: dr

Re: [Pacemaker] setup advice

2013-07-02 Thread Andrew Beekhof
I wouldn't be doing anything without corosync2 and its option that requires all nodes to be online before quorum is granted. Otherwise I can imagine ways that the old master might try to promote itself. On 02/07/2013, at 7:18 PM, Michael Schwartzkopff wrote: > On Tuesday, 2 July 2013, 09:47:3

Re: [Pacemaker] Disconnected from CIB?

2013-07-02 Thread Andrew Beekhof
On 02/07/2013, at 8:20 PM, Lars Marowsky-Bree wrote: > On 2013-07-02T20:12:08, Andrew Beekhof wrote: > >>> It seems related to the number of times I poll the CIB, too; I seem to >>> hit a transient window there, maybe. Since I dropped the number of polls >>>

Re: [Pacemaker] Disconnected from CIB?

2013-07-02 Thread Andrew Beekhof
On 02/07/2013, at 7:54 PM, Lars Marowsky-Bree wrote: > On 2013-07-02T08:25:18, Andrew Beekhof wrote: > >>> if (cli_config_update(&cib_copy, NULL, FALSE) == FALSE) { >> Also, change FALSE -> TRUE here so that you see the validation errors. >

Re: [Pacemaker] some pacemaker questions

2013-07-02 Thread Andrew Beekhof
On 02/07/2013, at 7:32 PM, Lars Marowsky-Bree wrote: > On 2013-07-02T10:46:09, Andrew Beekhof wrote: > >>> Our problem is that if I give "crm resource stop vm1" and immediately after >>> "crm resource stop vm2" >>> it happens that p

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
Is "How important is the ability to use redundant PDUs for fencing?" better? On 02/07/2013, at 3:30 PM, Vladislav Bogdanov wrote: > 02.07.2013 03:10, Andrew Beekhof wrote: >> >> On 02/07/2013, at 8:51 AM, Andrew Beekhof wrote: >> >>> >>> On 0

Re: [Pacemaker] Disconnected from CIB?

2013-07-01 Thread Andrew Beekhof
On 02/07/2013, at 12:12 AM, Lars Marowsky-Bree wrote: > On 2013-07-01T14:15:01, Lars Marowsky-Bree wrote: > >> Reproducible on the non-DC node during full start-up of a cluster, yes. > > And it turns out to be a CIB problem afterall. Or I'm doing something > else wrong: > > I'm doing, basica

Re: [Pacemaker] some pacemaker questions

2013-07-01 Thread Andrew Beekhof
Apparently CC'ing the list on my replies was too subtle... can you please sign up to and reply to the mailing list? I don't do private support. On 28/06/2013, at 10:48 PM, Sartoratti Lorenzo wrote: > Our problem is that if I give "crm resource stop vm1" and immediately after > "crm resource st

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 02/07/2013, at 8:51 AM, Andrew Beekhof wrote: > > On 01/07/2013, at 10:19 PM, Vladislav Bogdanov wrote: > >> 01.07.2013 15:10, Andrew Beekhof wrote: >> >>> >>> And if people start using it, then we might look at simplifying it. >>

Re: [Pacemaker] Question to fencing/stonithing

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 10:28 PM, Andreas Mock wrote: > Hi all, > > just want to get clear about startup fencing. > > Scenario: RHEL 6.4, cman, 2-node-cluster, pacemaker, > fence via pcmk-redirect. pacemaker stonith enabled, > no-quorum-policy=ignore, CMAN_QUORUM_TIMEOUT=0 > > > When should a star

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 10:19 PM, Vladislav Bogdanov wrote: > 01.07.2013 15:10, Andrew Beekhof wrote: > >> >> And if people start using it, then we might look at simplifying it. > > Maybe it's worth having an anonymous poll at clusterlabs.org for that?

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 02/07/2013, at 2:58 AM, Digimer wrote: > On 07/01/2013 12:43 PM, Lars Marowsky-Bree wrote: >> On 2013-07-01T11:53:29, Digimer wrote: >> >>> You are right, of course. Imagine though that the IPMI BMC's network >>> port or cable could have silently failed some time before the node >>> failed.

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 02/07/2013, at 2:13 AM, Digimer wrote: >> >> Yes, but people around here also tend to be quite vocal when they think >> something is missing. >> More so if it's something critical. > > I mean more than you, Jake and Vladislav. That's not quite a party yet :-) _

Re: [Pacemaker] Disconnected from CIB?

2013-07-01 Thread Andrew Beekhof
On 02/07/2013, at 12:12 AM, Lars Marowsky-Bree wrote: > On 2013-07-01T14:15:01, Lars Marowsky-Bree wrote: > >> Reproducible on the non-DC node during full start-up of a cluster, yes. > > And it turns out to be a CIB problem afterall. Or I'm doing something > else wrong: > > I'm doing, basica

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 10:06 PM, Vladislav Bogdanov wrote: > 01.07.2013 14:53, Andrew Beekhof wrote: >> >> On 01/07/2013, at 9:45 PM, Vladislav Bogdanov wrote: >> >>> 01.07.2013 14:14, Andrew Beekhof wrote: >>> ... >>>>>> I'm yet t

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 9:53 PM, Lars Marowsky-Bree wrote: > On 2013-07-01T21:37:38, Andrew Beekhof wrote: > >>> And apparently, this is one of the scenarios for which fence topology >>> was created and supports multiple devices per level. I'd venture the >>> o
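The fencing topology behaviour referred to here (several devices per level) can be modelled roughly as below. This is only a sketch of the commonly documented semantics, namely that levels are tried in order and a level succeeds only if every device in it succeeds; it is not Pacemaker's own code, and the function run_topology, the device names, and the fence callback are all hypothetical.

```python
def run_topology(levels, fence):
    """Return the number (counting from 1) of the first level whose
    devices all succeed, or None if every level fails.

    `levels` is a list of device-name lists; `fence` is a callable that
    returns True when the given device fenced the node successfully.
    """
    for number, devices in enumerate(levels, start=1):
        # all() stops at the first failing device in a level.
        if all(fence(device) for device in devices):
            return number
    return None


# Hypothetical layout: IPMI first, then both PDUs together.
levels = [["ipmi_n01"], ["pdu1_n01", "pdu2_n01"]]
working = {"pdu1_n01", "pdu2_n01"}  # pretend the IPMI BMC is unreachable
print(run_topology(levels, lambda device: device in working))  # prints 2
```

With redundant PSUs, the second level only counts as a successful fence when both PDU outlets were switched, which is why the two devices sit in the same level.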

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 9:45 PM, Vladislav Bogdanov wrote: > 01.07.2013 14:14, Andrew Beekhof wrote: > ... >>>> I'm yet to be convinced that having two PDUs is helping those people in >>>> the first place. >>>> If it were actually useful, I suspect mo

Re: [Pacemaker] some pacemaker questions

2013-07-01 Thread Andrew Beekhof
On 28/06/2013, at 10:48 PM, Sartoratti Lorenzo wrote: > Hi > On 06/28/2013 12:30 PM, Andrew Beekhof wrote: >> On 28/06/2013, at 5:19 AM, Lorenzo Sartoratti >> wrote: >> >>> Hi, >>> we have been using pacemaker for two years and we are quite

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 30/06/2013, at 4:48 AM, Lars Marowsky-Bree wrote: > On 2013-06-29T09:22:20, Andrew Beekhof wrote: > >>> This doesn't help people who have dual power rails/PDUs for power >>> redundancy. >> I'm yet to be convinced that having two PDUs is helping those

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 5:17 PM, Florian Crouzat wrote: > On 29/06/2013 01:22, Andrew Beekhof wrote: >> >> On 29/06/2013, at 12:22 AM, Digimer wrote: >> >>> On 06/28/2013 06:21 AM, Andrew Beekhof wrote: >>>> >>>> On 28/06/2013, at 5:22

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Andrew Beekhof
On 01/07/2013, at 5:32 PM, Vladislav Bogdanov wrote: > 29.06.2013 02:22, Andrew Beekhof wrote: >> >> On 29/06/2013, at 12:22 AM, Digimer wrote: >> >>> On 06/28/2013 06:21 AM, Andrew Beekhof wrote: >>>> >>>> On 28/06/2013, at 5:22 PM, L

Re: [Pacemaker] Disconnected from CIB?

2013-07-01 Thread Andrew Beekhof
On 30/06/2013, at 10:09 PM, Lars Marowsky-Bree wrote: > Hi, > > sbd connects to the CIB and watches updates come in to see if pacemaker > considers the node healthy still, and if the cluster partition is > quorate according to the CIB. That's all working fine. > > But I've noticed that during

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Andrew Beekhof
On 29/06/2013, at 12:36 AM, Lars Marowsky-Bree wrote: > On 2013-06-28T10:20:56, Digimer wrote: > primitive fence_n01_psu1_off stonith:fence_apc_snmp \ params ipaddr="an-p01" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca" primitive fence_n01_p

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Andrew Beekhof
On 29/06/2013, at 12:22 AM, Digimer wrote: > On 06/28/2013 06:21 AM, Andrew Beekhof wrote: >> >> On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree wrote: >> >>> On 2013-06-27T12:53:01, Digimer wrote: >>> >>>> primitive fence_n01_psu1_off st

Re: [Pacemaker] Release model

2013-06-28 Thread Andrew Beekhof
On 29/06/2013, at 12:15 AM, Digimer wrote: > On 06/28/2013 08:04 AM, Andrew Beekhof wrote: >> Under this model, not only do I have to find the time to write and test the >> new addition, but I also have to: >> * keep maintaining the old code until... when? >> * pr

Re: [Pacemaker] Release model

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 11:37 PM, Lars Marowsky-Bree wrote: >>> >>> I'm not sure there's a huge downside in it for you? >> Ok, let's take attrd for example - which I've been wanting to rewrite to be >> truly atomic for half a decade or more. > > If it's rewritten in a way that doesn't affect external

Re: [Pacemaker] Node name problems after upgrading to 1.1.9

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 9:16 PM, Andrew Beekhof wrote: > > On 28/06/2013, at 8:10 PM, Andrew Beekhof wrote: > >> >> On 28/06/2013, at 6:42 PM, Bernardo Cabezas Serra wrote: >> >>> Hello Andrew, >>> >>> El 27/06/13 14:44, Andrew Beekhof escr

Re: [Pacemaker] Release model

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 8:59 PM, Lars Marowsky-Bree wrote: > On 2013-06-28T18:41:35, Andrew Beekhof wrote: > >>> There's an exception: dropping commonly used external interfaces (say, >>> "ptest") needs to be announced a few releases in advance before e

Re: [Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-28 Thread Andrew Beekhof
On 27/06/2013, at 10:46 PM, Andrew Beekhof wrote: > > On 25/06/2013, at 9:44 PM, Francesco Namuri wrote: > >>> Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please? >>> >>> I'll be able to see if it's something we've already f

Re: [Pacemaker] Node name problems after upgrading to 1.1.9

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 8:10 PM, Andrew Beekhof wrote: > > On 28/06/2013, at 6:42 PM, Bernardo Cabezas Serra wrote: > >> Hello Andrew, >> >> On 27/06/13 14:44, Andrew Beekhof wrote: >>> You should see additional logs sent to /var/log/pacemaker.log >>

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 8:46 PM, Lars Marowsky-Bree wrote: > On 2013-06-28T20:21:22, Andrew Beekhof wrote: > >>> It looks correct, but not quite sane. ;-) That seems not to be >>> something you can address, though. I'm thinking that fencing topology >>>

Re: [Pacemaker] some pacemaker questions

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 5:19 AM, Lorenzo Sartoratti wrote: > Hi, > we have been using pacemaker for two years and we are quite satisfied: thanks! > We have 30 virtual machines running in the cluster and maintained by > pacemaker. > When we stop the machines with crm, they are not stopped in parallel but

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree wrote: > On 2013-06-27T12:53:01, Digimer wrote: > >> primitive fence_n01_psu1_off stonith:fence_apc_snmp \ >>params ipaddr="an-p01" pcmk_reboot_action="off" port="1" >> pcmk_host_list="an-c03n01.alteeve.ca" >> primitive fence_n01_psu1_on st

Re: [Pacemaker] Node name problems after upgrading to 1.1.9

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 6:42 PM, Bernardo Cabezas Serra wrote: > Hello Andrew, > > On 27/06/13 14:44, Andrew Beekhof wrote: >> You should see additional logs sent to /var/log/pacemaker.log > > Finally, yesterday the issue happened again. This time, node "selavi" was

[Pacemaker] Release model

2013-06-28 Thread Andrew Beekhof
On 28/06/2013, at 5:30 PM, Lars Marowsky-Bree wrote: > On 2013-06-28T11:11:00, Andrew Beekhof wrote: > >>>> Maybe you're right, maybe I should stop fighting it and go with the >>>> firefox approach. >>>> That certainly seemed to piss a lot of

Re: [Pacemaker] weird drbd/cluster behaviour

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 2:21 AM, Саша Александров wrote: > Hi! > > Fencing is disabled for now, the issue is not with fencing: the question is - > why only one out of three DRBD master-slave sets is recognized by pacemaker, Pacemaker knows nothing of drbd or any other kind of service. All that know

Re: [Pacemaker] corosync stop and consequences

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 3:40 AM, andreas graeper wrote: > thanks for your answer. > but one question is still open. > > when I switch off the active node: though this is done reliably for me, the > still passive node wants to know for sure and will kill the (already dead) > former active node. > I have

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-27 Thread Andrew Beekhof
On 28/06/2013, at 12:52 AM, Lars Marowsky-Bree wrote: >> Maybe you're right, maybe I should stop fighting it and go with the >> firefox approach. >> That certainly seemed to piss a lot of people off though... > > If there's one message I've learned in 13 years of work on Linux HA, > then it is

Re: [Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-27 Thread Andrew Beekhof
On 25/06/2013, at 9:44 PM, Francesco Namuri wrote: >> Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please? >> >> I'll be able to see if it's something we've already fixed. Nope, still there. I will attempt to fix this tomorrow. ___

Re: [Pacemaker] Node name problems after upgrading to 1.1.9

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 10:29 PM, Bernardo Cabezas Serra wrote: > Hello, > > Ohhh, sorry, but I have deleted node selavi and restarted, and now it works > OK and I can't reproduce the bug :( That is unfortunate. > > On 27/06/13 12:32, Andrew Beekhof wrote: >> o,

Re: [Pacemaker] [OT] MySQL Replication

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 1:53 AM, Denis Witt wrote: > On Wed, 26 Jun 2013 21:33:30 +1000 > Andrew Beekhof wrote: > >>> When you run ./autogen.sh it tries to start an rpm command, this >>> failed because I didn't have rpm installed. >> >> How did it

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 5:40 PM, Lars Marowsky-Bree wrote: > On 2013-06-27T14:28:19, Andrew Beekhof wrote: > >> I wouldn't say the 6 months between 1.1.7 and 1.1.8 was a particularly >> aggressive release cycle. > > For the amount of changes in there, I think yes. An

Re: [Pacemaker] Node name problems after upgrading to 1.1.9

2013-06-27 Thread Andrew Beekhof
On 27/06/2013, at 8:20 PM, Bernardo Cabezas Serra wrote: > Do you think it's a configuration problem? No, more likely a bug. Which is concerning since I thought I had this particular kind ironed out. Could you set PCMK_trace_functions=crm_get_peer on selavi and repeat the test? The exact

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-26 Thread Andrew Beekhof
On 26/06/2013, at 10:37 PM, Lars Marowsky-Bree wrote: > On 2013-06-26T21:31:14, Andrew Beekhof wrote: > >>> Distributions can take care of them when they integrate them; basically >>> they'll trickle through until the whole stack the distributions ship >>>

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-26 Thread Andrew Beekhof
On 26/06/2013, at 7:30 PM, Lars Marowsky-Bree wrote: > On 2013-06-25T20:28:29, Andrew Beekhof wrote: > >>> Perhaps a numbering scheme like the Linux kernel would fit better than a >>> stable/unstable branch distinction. Changes that deserve the "unstable" >

Re: [Pacemaker] corosync stop and consequences

2013-06-26 Thread Andrew Beekhof
On 26/06/2013, at 12:24 AM, Digimer wrote: > On 06/25/2013 07:29 AM, andreas graeper wrote: >> hi, >> maybe again and again the same question, please excuse. >> >> two nodes (n1 active / n2 passive) and `service corosync stop` on active. >> does the node that is going down tell the other tha

Re: [Pacemaker] [OT] MySQL Replication

2013-06-26 Thread Andrew Beekhof
On 26/06/2013, at 6:51 PM, Denis Witt wrote: > On Wed, 26 Jun 2013 12:35:33 +1000 > Andrew Beekhof wrote: > >>> System is Debian Wheezy which means version 0.11.1-2 for libqb-dev. > >> rpm errors on debian? >> I'm confused. > > When you run .

Re: [Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-26 Thread Andrew Beekhof
Sent from a mobile device On 26/06/2013, at 5:44 PM, Jacek Konieczny wrote: > On Wed, 26 Jun 2013 14:35:03 +1000 > Andrew Beekhof wrote: >> Urgh: >> >> info Jun 25 13:40:10 lrmd_ipc_connect(913):0: Connecting to lrmd >> trace Jun 25 13:40:10 pick_ipc_bu

Re: [Pacemaker] [OT] MySQL Replication

2013-06-25 Thread Andrew Beekhof
On 26/06/2013, at 3:01 AM, Denis Witt wrote: > On Tue, 25 Jun 2013 17:12:15 +0200 > Denis Witt wrote: > >> ./configure runs fine, but make didn't. I don't remember the exact >> error message and before I could run it again I have to solve my >> OCFS2-Problem. But I'll try again and post it he

Re: [Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-25 Thread Andrew Beekhof
On 25/06/2013, at 5:37 PM, Francesco Namuri wrote: > Hi, > after an update to the new debian stable, from pacemaker 1.0.9.1 to > 1.1.7 I'm getting some strange errors on syslog: That's a hell of a jump there. Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please? I'll be able

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-25 Thread Andrew Beekhof
On 25/06/2013, at 6:32 PM, Lars Marowsky-Bree wrote: > On 2013-06-25T10:16:58, Andrey Groshev wrote: > >> Ok, I recently became engaged in the PCMK, so for me it is a surprise. >> The more so in all the major linux distributions version 1.1.x. > > Pacemaker has very strong regression and syst

Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Andrew Beekhof
On 25/06/2013, at 5:56 PM, Jacek Konieczny wrote: > On Tue, 25 Jun 2013 10:50:14 +0300 > Vladislav Bogdanov wrote: >> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which >> affects pacemaker. > > Just tried that. It didn't help. Can you turn on the blockbox please? Details at h

Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-24 Thread Andrew Beekhof
On 25/06/2013, at 4:34 PM, Jacek Konieczny wrote: > On Tue, 25 Jun 2013 10:10:13 +1000 > Andrew Beekhof wrote: >> On 24/06/2013, at 9:31 PM, Jacek Konieczny wrote: >> >>> >>> After I have upgraded Pacemaker from 1.1.8 to 1.1.9 on a node I get >

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-24 Thread Andrew Beekhof
On 25/06/2013, at 2:33 PM, Andrey Groshev wrote: > > > 25.06.2013, 04:46, "Andrew Beekhof" : >> On 24/06/2013, at 3:44 PM, Vladislav Bogdanov wrote: >> >>> 24.06.2013 04:17, Andrew Beekhof wrote: >>>> Either people have given up on te

Re: [Pacemaker] error when build pacemaker 1.1.10-rc5 and corosync-2.3.0

2013-06-24 Thread Andrew Beekhof
On 25/06/2013, at 1:20 PM, Takatoshi MATSUO wrote: > 2013/6/25 Andrew Beekhof : >> >> On 25/06/2013, at 12:12 PM, Takatoshi MATSUO wrote: >> >>> 2013/6/25 Andrew Beekhof : >>>> >>>> On 24/06/2013, at 3:03 PM, Takatoshi MATSUO wrot

Re: [Pacemaker] Additional help with ClusterMon

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 7:45 PM, Michael Furman wrote: > > What kind of status? > > > We want to run one command that will return the status of the node in one > value: online or offline (or standby). > > Please find attached 3 core files. I can't read the core files. They're specific to your m

Re: [Pacemaker] Additional help with ClusterMon

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 7:45 PM, Michael Furman wrote: > > What kind of status? > > > We want to run one command that will return the status of the node in one > value: online or offline (or standby). That could reasonably be added to crm_node. ___ Pa

Re: [Pacemaker] error when build pacemaker 1.1.10-rc5 and corosync-2.3.0

2013-06-24 Thread Andrew Beekhof
On 25/06/2013, at 12:12 PM, Takatoshi MATSUO wrote: > 2013/6/25 Andrew Beekhof : >> >> On 24/06/2013, at 3:03 PM, Takatoshi MATSUO wrote: >> >>> Hi Andrew >>> >>> 2013/6/24 Andrew Beekhof : >>>> >>>> On 24/06/2013, at

Re: [Pacemaker] output crm_mon

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 8:37 PM, andreas graeper wrote: > hi, > crm_mon -rA1 shows : > > ClusterIP (ocf::heartbeat:IPaddr2): Started lisel1 > Master/Slave Set: ms_drbd_r0 [p_drbd_r0] > Masters: [ lisel1 ] > Slaves: [ lisel2 ] > p_lvm_r0 (ocf::heartbeat:LVM): Started lisel1 >

Re: [Pacemaker] Can resource agents know the cause of stop action?

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 8:55 PM, Munehiro SATO wrote: > Hi all, > > Can resource agents know the cause of stop action? Not really, no. > > I want to know following situations in RA for my application(it's > Master/Slave resource). > > * stop by "crm resource stop" > In this case, my application

Re: [Pacemaker] error when build pacemaker 1.1.10-rc5 and corosync-2.3.0

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 3:03 PM, Takatoshi MATSUO wrote: > Hi Andrew > > 2013/6/24 Andrew Beekhof : >> >> On 24/06/2013, at 12:46 PM, Takatoshi MATSUO wrote: >> >>> Hi Andrew >>> >>> I received similar error using 6ea4b7e(HEAD) under

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 3:44 PM, Vladislav Bogdanov wrote: > 24.06.2013 04:17, Andrew Beekhof wrote: >> Either people have given up on testing, or rc5[1] is looking good for the >> final release. > > Is it going to be 1.1.10 or 1.2.0 (2.0.0)? First it's going to be 1.1.10 and,

Re: [Pacemaker] [OT] MySQL Replication

2013-06-24 Thread Andrew Beekhof
On 22/06/2013, at 5:31 AM, Denis Witt wrote: > Hi List, > > might be offtopic but I'm sure there are many people on this list who have > answered this question for themselves. > > I have a MySQL Master/Master/Slave setup which is rather unreliable, so I'm > asking myself if it might be better

Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 9:31 PM, Jacek Konieczny wrote: > > After I have upgraded Pacemaker from 1.1.8 to 1.1.9 on a node I get the > following errors > in my syslog and Pacemaker doesn't seem to be able to start services on this > node. What else did you upgrade? libqb too? > > Jun 24 13:19:44

Re: [Pacemaker] Additional help with ClusterMon

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 6:29 PM, Michael Furman wrote: > Andrew, > I sent the core files attached to the mail dated Thu, 20 Jun 2013 > 13:18:04 +0300. Unfortunately the mail server did not send it. > Can you look on the mail server? 100s of spam messages but nothing from you. Please send them

Re: [Pacemaker] error when build pacemaker 1.1.10-rc5 and corosync-2.3.0

2013-06-23 Thread Andrew Beekhof
rtbeat/attrd > /usr/lib64/heartbeat/cib > /usr/lib64/heartbeat/crmd > /usr/lib64/heartbeat/pengine > /usr/lib64/heartbeat/stonithd > /usr/sbin/crm_uuid > make: *** [rpm] Error 1 > - > > Regards, > Takatoshi MATSUO > > 2013/6/21 Andrew Beekhof :

[Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-23 Thread Andrew Beekhof
Either people have given up on testing, or rc5[1] is looking good for the final release. So just a reminder, we're particularly looking for feedback in the following areas: | plugin-based clusters, ACLs, the new --ban and --clear commands, and admin actions | (such as moving and stopping resour

Re: [Pacemaker] Pacemaker fails to switch on or off PDU sockets with fence_wti

2013-06-23 Thread Andrew Beekhof
On 21/06/2013, at 5:38 PM, Thibaut Pouzet wrote: > On 20/06/2013 12:23, Andrew Beekhof wrote: >> On 20/06/2013, at 6:51 PM, Thibaut Pouzet >> wrote: >> >>> On 19/06/2013 23:57, Andrew Beekhof wrote: >>>> On 20/06/2013, at 1:57 AM, T

Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?

2013-06-23 Thread Andrew Beekhof
On 22/06/2013, at 5:13 AM, Andreas Mock wrote: > Hi Andreas, > > my two cents to your questions: > > a) If you want to learn most, take any distro and compile the components from > source and afterwards use them. => Most learned. Well, yes, but not always about clustering and not always thi

Re: [Pacemaker] Starting Pacemaker Cluster Manager: [FAILED]

2013-06-20 Thread Andrew Beekhof
h? > > Thx, > CB > > > -Original Message- > From: Andrew Beekhof [mailto:and...@beekhof.net] > Sent: Tuesday, June 18, 2013 8:03 PM > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] Starting Pacemaker Cluster Manager: [FAILED] >

Re: [Pacemaker] Additional help with ClusterMon

2013-06-20 Thread Andrew Beekhof
On 20/06/2013, at 8:49 PM, Michael Furman wrote: > Sending the second mail without attachments - the first one was with > attachments :) I don't see the one with attachments. Did you paste the output from gdb in them? > > From: michael_fur...@hotmail.com > To: pacemaker@oss.clusterlabs.org

Re: [Pacemaker] small warning when build packages

2013-06-20 Thread Andrew Beekhof
Just this is also sufficient:

index 930f8b5..483f94d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1010,7 +1010,7 @@ LIBQB_LOG=1
 PCMK_FEATURES="$PCMK_FEATURES libqb-logging libqb-ipc"
 if
- !pkg-config --atleast-version 0.13 libqb
+ ! pkg-config --atleast-version 0.13 libqb
 then
 AC_

Re: [Pacemaker] error when build pacemaker 1.1.10-rc5 and corosync-2.3.0

2013-06-20 Thread Andrew Beekhof
On 20/06/2013, at 11:22 PM, Andrey Groshev wrote: > Hi, again. > One week ago the package was still rebuilding normally. > Today, I receive an error: Just remove all references to clusterlib-devel from pacemaker.spec.in. I'll see what I can do. > > # make rpm-dep > if [ x != x`which yum-builddep 2>/dev/null

Re: [Pacemaker] STONITH without mandatory success possible?

2013-06-20 Thread Andrew Beekhof
On 20/06/2013, at 11:50 PM, Doug Clow wrote: > I'll do some experiments to see if I can get Corosync more reliable. I'm > using Corosync v1 as part of cman-corosync-pacemaker. RRP with one port on a > switch and the other port on a crossover cable between the two hosts > (although technical

Re: [Pacemaker] Pacemaker fails to switch on or off PDU sockets with fence_wti

2013-06-20 Thread Andrew Beekhof
On 20/06/2013, at 6:51 PM, Thibaut Pouzet wrote: > On 19/06/2013 23:57, Andrew Beekhof wrote: >> On 20/06/2013, at 1:57 AM, Thibaut Pouzet >> wrote: >> >>> Hi, >>> >>> I am trying to configure fencing on a test platform with two nodes

Re: [Pacemaker] STONITH without mandatory success possible?

2013-06-20 Thread Andrew Beekhof
On 20/06/2013, at 5:02 PM, Vladislav Bogdanov wrote: > 20.06.2013 09:00, Andrew Beekhof wrote: >> >> On 20/06/2013, at 2:52 PM, Vladislav Bogdanov wrote: >> >>> 20.06.2013 00:36, Andrew Beekhof wrote: >>>> >>>> On 20/06/201

Re: [Pacemaker] STONITH without mandatory success possible?

2013-06-19 Thread Andrew Beekhof
On 20/06/2013, at 2:52 PM, Vladislav Bogdanov wrote: > 20.06.2013 00:36, Andrew Beekhof wrote: >> >> On 20/06/2013, at 6:33 AM, Doug Clow wrote: >> >>> Hello All, >>> >>> I have some 2-node active-passive clusters that occasionally lose >

Re: [Pacemaker] Pacemaker fails to switch on or off PDU sockets with fence_wti

2013-06-19 Thread Andrew Beekhof
On 20/06/2013, at 1:57 AM, Thibaut Pouzet wrote: > Hi, > > I am trying to configure fencing on a test platform with two nodes under > corosync+cman+pacemaker on CentOS 6.4. Both nodes have a double power supply > from a WTI NPS-8HD16-3. IPMI fencing works like a charm, however I cannot get

Re: [Pacemaker] STONITH without mandatory success possible?

2013-06-19 Thread Andrew Beekhof
On 20/06/2013, at 6:33 AM, Doug Clow wrote: > Hello All, > > I have some 2-node active-passive clusters that occasionally lose Corosync > connectivity. The connectivity is fixed with a reboot. They don't have > shared storage so stonith doesn't have to happen for another node to take > con

Re: [Pacemaker] fence_xvm / fence_virtd problem

2013-06-18 Thread Andrew Beekhof
Try the default multicast address perhaps? address = "225.0.0.12"; You checked that /etc/cluster/fence_xvm.key matches on both machines, I assume? On 17/06/2013, at 4:09 AM, Digimer wrote: > Guest's firewall is off entirely, as is selinux. > > On 06/16/2013 02:25 AM, Vladislav Bogdanov wrote: >> 16.0
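One possible way to do the key check mentioned above is a byte-for-byte comparison. This is only a sketch: `keys_match` is a hypothetical helper, and the host name and temporary path in the usage comment are placeholders, not values from the thread.

```shell
#!/bin/sh
# keys_match FILE_A FILE_B
# Succeeds (exit 0) only if both files exist and have identical contents.
# cmp -s compares byte by byte and prints nothing.
keys_match() {
    cmp -s "$1" "$2"
}

# Example usage (placeholder host and paths, adjust to your setup):
#   scp root@guest1:/etc/cluster/fence_xvm.key /tmp/guest1_fence_xvm.key
#   if keys_match /etc/cluster/fence_xvm.key /tmp/guest1_fence_xvm.key; then
#       echo "keys match"
#   else
#       echo "keys differ"
#   fi
```

Comparing the output of `sha256sum /etc/cluster/fence_xvm.key` on each machine would work as well and avoids copying the key around.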

Re: [Pacemaker] Additional help with ClusterMon

2013-06-18 Thread Andrew Beekhof
On 18/06/2013, at 11:58 PM, Michael Furman wrote: > I was able to run the ClusterMon agent. > > Couple of comments: > 1. When I use ocf:pacemaker:ClusterMon it crashes a lot with the > following core file: > > file core.22992 > core.22992: ELF 64-bit LSB core file x86-64, version 1 (SY

Re: [Pacemaker] add / rmv resources

2013-06-18 Thread Andrew Beekhof
On 19/06/2013, at 2:23 AM, andreas graeper wrote: > hi, > i use s_xxx.sh and k_xxx.sh scripts to create / remove resources xxx > > when after removing a resource, i call crm_mon, i can see lots of resources > are stopped. > little later they are started again. > > does a > pcs resource s

Re: [Pacemaker] Starting Pacemaker Cluster Manager: [FAILED]

2013-06-18 Thread Andrew Beekhof
rvice cman start" or "service corosync start"? > > > Can you provide a link to a newer pacemaker package compatible with UBUNTU > 11.10 Server? No. The debian/ubuntu people like to do their own thing. > > R, > CB > > -Original Message- >

Re: [Pacemaker] resource removed (stop + delete) but still in cib-status

2013-06-18 Thread Andrew Beekhof
On 18/06/2013, at 8:30 PM, andreas graeper wrote: > hi, > pacemaker is started as plugin from corosync > but when i `service corosync stop` there are still > pacemaker/lrmd > pacemaker/pengine > > is this a problem / error ? quite likely, yes > > thanks > andreas > > > 2013/6/18 andrea

Re: [Pacemaker] Errors compiling PM 1.1.10 RC3

2013-06-18 Thread Andrew Beekhof
On 18/06/2013, at 9:47 PM, Nikita Michalko wrote: > Hi all, > > I tried to build/compile the latest version of pacemaker from sources > (http://blog.clusterlabs.org/blog/2013/release-candidate-1-dot-1-10-rc3/) > on SLES11/SP2 (kernel 3.0.58-0.6.2-default) with libqb-0.14.4 as follows: > ./configure

Re: [Pacemaker] Pacemaker issues on Amazon EC2

2013-06-17 Thread Andrew Beekhof
On 18/06/2013, at 2:23 PM, Jon Eisenstein wrote: > > On Jun 18, 2013, at 12:12 AM, Andrew Beekhof wrote: > >> >> On 18/06/2013, at 1:46 PM, Jon Eisenstein wrote: >> >>> >>> On Jun 17, 2013, at 11:31 PM, Andrew Beekhof wrote: >>>

Re: [Pacemaker] Pacemaker issues on Amazon EC2

2013-06-17 Thread Andrew Beekhof
On 18/06/2013, at 1:46 PM, Jon Eisenstein wrote: > > On Jun 17, 2013, at 11:31 PM, Andrew Beekhof wrote: > >> >> On 18/06/2013, at 7:19 AM, Jon Eisenstein wrote: >> >>> tl;dr summary: On EC2, we can't reuse IP addresses, and we need a reliable,

Re: [Pacemaker] Weired resource-stickiness behavior

2013-06-17 Thread Andrew Beekhof
"ext3" \ > > meta target-role="Started" > > As I assume this resource can only be started on 1 node, I think it should > > be stopped automatically when pacemaker detects it's not in a HA cluster. > > Is this incorrect assumption? >
