Re: [Pacemaker] Stonith question

2013-11-10 Thread Andrew Beekhof
On 9 Nov 2013, at 1:55 am, s.oreilly wrote:
> Hi Chrissie, thanks I did try that and it didn't work, but then, neither has
> adding the location constraints so maybe (and this is very possible) I am
> doing something else wrong!!
Quite probably. But we can't say for sure without logs.
> S

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-10 Thread Andrew Beekhof
On 9 Nov 2013, at 8:56 am, emmanuel segura wrote:
> Hello Andrew,
>
> You can find the file in the attachment.
It would be very useful to know what is NULL at:
    1196    node_name = g_strdup_printf("%s:%s", node->details->uname, node->details->remote_rsc->container->id);
i.e. p *node

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: > > On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: > >> >> On 8 Nov 2013, at 4:45 am, Sean Lutner wrote: >> >>> I have a confusing situation that I'm hoping to get help with. Last night >>> after configuring STONITH on my two node cluster,

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-10 Thread Andrew Beekhof
On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov wrote: > Hi Andrew, David, all, > > Just found interesting fact, don't know is it a bug or not. > > When doing service pacemaker stop on a node which has drbd resource > promoted, that resource does not promote on another node, and promote > operat

Re: [Pacemaker] pacemaker - ClusterMon

2013-11-10 Thread Andrew Beekhof
On 18 Oct 2013, at 7:49 am, Denise Cosso wrote: > Hello, > > >I configured ClusterMon in Pacemaker but do not receive email. I already tested > the email from the command line and it is working. > >I think it's the version of crm_mon > >Could anyone help me?? You have an MTA configured on

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 7:49 am, Andrey Groshev wrote: > Hi, PPL! > I need help. I do not understand... Why has stopped working. > This configuration work on other cluster, but on corosync1. > > So... cluster postgres with master/slave. > Classic config as in wiki. > I build cluster, start, he is wor

Re: [Pacemaker] What value should be in the $OCF_RESKEY_CRM_meta_notify_slave_uname when a quorum is lost?

2013-11-10 Thread Andrew Beekhof
On 6 Nov 2013, at 12:42 am, Andrey Groshev wrote: > Hi All! > I am interested in this subject, because happens is the following situation. > I build cluster on four nodes with postgres master/slave configuration. > Set quorum-policy=stop > Run the cluster and conducted an experiment - turned off

Re: [Pacemaker] Pacemaker-corosync update attribute issue

2013-11-10 Thread Andrew Beekhof
On 22 Oct 2013, at 3:43 am, A66A wrote: > Hello, > I have a problem with my 2-node cluster. For some reason one of my nodes > can't update attributes due to an error - warning: attrd_cib_callback: > Update PostgreSQL-status=HS:async failed: Transport endpoint is not > connected. Where can be

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Sean Lutner
On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: > > On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: > >> >> On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: >> >>> >>> On 8 Nov 2013, at 4:45 am, Sean Lutner wrote: >>> I have a confusing situation that I'm hoping to get help with

Re: [Pacemaker] Monitoring on master node not running after standby is connected

2013-11-10 Thread Andrew Beekhof
On 16 Oct 2013, at 12:21 am, Juraj Fabo wrote: > Juraj Fabo writes: >> >> Hello Andrew >> >> >> thank you for the response. >> >> I've patched crmd, cleaned the cluster, done the scenario steps and > created crm_report which is attached. >> >> After loading the cluster configuration both

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: > > On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: > >> >> On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: >> >>> >>> On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: >>> On 8 Nov 2013, at 4:45 am, Sean Lutner wrote:

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Sean Lutner
On Nov 10, 2013, at 7:54 PM, Andrew Beekhof wrote: > > On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: > >> >> On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: >> >>> >>> On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: >>> On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: >

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 12:03 pm, Sean Lutner wrote: > > On Nov 10, 2013, at 7:54 PM, Andrew Beekhof wrote: > >> >> On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: >> >>> >>> On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: >>> On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: >>>

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-10 Thread Vladislav Bogdanov
11.11.2013 02:30, Andrew Beekhof wrote: > > On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov wrote: > >> Hi Andrew, David, all, >> >> Just found interesting fact, don't know is it a bug or not. >> >> When doing service pacemaker stop on a node which has drbd resource >> promoted, that resource does

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-10 Thread Vladislav Bogdanov
11.11.2013 06:32, Vladislav Bogdanov wrote: > 11.11.2013 02:30, Andrew Beekhof wrote: >> >> On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov wrote: >> >>> Hi Andrew, David, all, >>> >>> Just found interesting fact, don't know is it a bug or not. >>> >>> When doing service pacemaker stop on a node whi

Re: [Pacemaker] The larger cluster is tested.

2013-11-10 Thread yusuke iida
Hi, Andrew I tested by the following versions. https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc However, the problem has not been solved yet. I do not think that this problem can cope with it by batch-limit. Execution of a job is interrupted by batch-limit tem

Re: [Pacemaker] Simple installation Pacemaker + CMAN + fence-agents

2013-11-10 Thread Jan Friesse
Andrew Beekhof wrote:
> Something seems very wrong with this at the corosync level.
> Even fenced and the dlm are having issues.
>
> Jan: Could this be firewall related?
Yes. This can be either a firewall or an mcast issue. I would recommend turning off the firewall completely (for testing). If this do