Re: [Linux-HA] resource agent iSCSILogicalUnit failing unexpectedly on occasion

2013-08-05 Thread Mark Nipper
On 05 Aug 2013, Sven Arnold wrote: > > SCSI SN: (stdin)= > > which is caused by a version mismatch between your iSCSILogicalUnit > resource agent and openssl. The upstream fix is here: Interesting. Thanks for the pointer. > To your actual problem I can not offer too much he

Re: [Linux-HA] resource agent iSCSILogicalUnit failing unexpectedly on occasion

2013-08-05 Thread Sven Arnold
Hi Mark, I don't think this is the source of your problem, nor do I know if it causes any problem to you at all, but... I noticed that your SCSI Serial Number is always: > SCSI SN: (stdin)= which is caused by a version mismatch between your iSCSILogicalUnit resource agent and
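One plausible reading of the `(stdin)=` symptom, assuming the resource agent builds its default SCSI serial number by hashing the resource instance name with `openssl md5` and keeping the first eight characters of the output (an assumption; check the agent version you actually run): older openssl printed only the hex digest for piped input, while newer versions prefix it with `(stdin)= `, and `(stdin)=` is exactly eight characters. The sketch below simulates both output formats with Python's `hashlib`; all function names and the instance name are hypothetical, and the robust variant shows one way to parse either format by taking the last field.

```python
import hashlib


def openssl_md5_output(data: bytes, new_format: bool) -> str:
    """Simulate `openssl md5` stdout for piped input (formats assumed, see lead-in)."""
    digest = hashlib.md5(data).hexdigest()
    # Newer openssl prefixes the digest with "(stdin)= "; older versions print it bare.
    return f"(stdin)= {digest}" if new_format else digest


def naive_serial(output: str) -> str:
    """Hypothetical: take the first 8 characters, as the older agent logic is assumed to do."""
    return output[:8]


def robust_serial(output: str) -> str:
    """Hypothetical: the hex digest is the last whitespace-separated field in both formats."""
    return output.split()[-1][:8]


if __name__ == "__main__":
    instance = b"lun1"  # hypothetical resource instance name
    for new_format in (False, True):
        out = openssl_md5_output(instance, new_format)
        print(f"new_format={new_format}: naive={naive_serial(out)!r} robust={robust_serial(out)!r}")
```

Running it shows the naive slice returning a hex prefix for the old format but `(stdin)=` for the new one, matching the `SCSI SN: (stdin)=` line reported above.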

[Linux-HA] resource agent iSCSILogicalUnit failing unexpectedly on occasion

2013-08-05 Thread Mark Nipper
One of our DRBD clusters has 47 LUNs being published. We're using RHEL 6.4. Here are the various package versions being used: --- pacemaker-1.1.7-6.el6.x86_64 corosync-1.4.1-7.el6.x86_64 resource-agents-3.9.2-12.el6.x86_64 scsi-target-utils-1.0.24-2.el6.x86_64 Somewhere after 40

Re: [Linux-HA] Antw: Re: pacemaker with heartbeat on Debian Wheezy reboots the node reproducibly when putting into maintenance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Thomas Glanzmann
Hello Ulrich, > Did it happen when you put the cluster into maintenance-mode, or did > it happen after someone fiddled with the resources manually? Or did it > happen when you turned maintenance-mode off again? I did not remember, but checked the log files, and yes I did a config change (I remove

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducibly when putting into maintenance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, > You will need to run crm_report and email us the resulting tarball. > This will include the version of the software you're running and log > files (both system and cluster) - without which we can't do anything. Find the files here: I manually packaged it because crm_report output

[Linux-HA] MailTo resources, 'message too long' errors

2013-08-05 Thread Marcus Bointon
I have two nodes running heartbeat 3.0.5 and pacemaker 1.1.6 (both from the linux-ha lucid ppa). They are running 11 groups each comprising an ocf:heartbeat:IPaddr2, an ocf:heartbeat:SendArp and an ocf:heartbeat:MailTo. There is also a mailto resource configured for the overall cluster. Despite

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, > did they ensure everything was flushed to disk first? (apache-03) [/var] cat /proc/sys/vm/dirty_expire_centisecs 3000 So dirty data should be flushed within 30 seconds. But I lost at least 24 hours, maybe even more. So it seems that pacemaker / heartbeat does not do persistent cha
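The value in `/proc/sys/vm/dirty_expire_centisecs` is expressed in hundredths of a second, so 3000 corresponds to 30 seconds. Per the kernel documentation it defines when dirty page-cache data is old enough to be eligible for writeout by the kernel flusher threads; it does not by itself guarantee that an application's writes reached disk. A minimal sketch of reading and converting the value follows; the helper names are hypothetical and not part of pacemaker or heartbeat.

```python
from pathlib import Path
from typing import Optional

CENTISECS_PER_SECOND = 100  # the vm.dirty_* sysctls use 1/100 s units


def centisecs_to_seconds(value: int) -> float:
    """Convert a centisecond sysctl value to seconds."""
    return value / CENTISECS_PER_SECOND


def read_dirty_expire_seconds(path: str = "/proc/sys/vm/dirty_expire_centisecs") -> Optional[float]:
    """Hypothetical helper: return the dirty-expire age in seconds, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None  # e.g. not running on Linux
    return centisecs_to_seconds(int(p.read_text().strip()))


if __name__ == "__main__":
    print(centisecs_to_seconds(3000))  # 30.0
    print(read_dirty_expire_seconds())
```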

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Andrew Beekhof
On 05/08/2013, at 5:20 PM, Thomas Glanzmann wrote: > Hello Andrew, > >> Any change to the configuration section is automatically written to >> disk. The cluster only stops doing this if writing to disk fails at >> some point - but there would have been an error in your logs if that >> were the

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, > Any change to the configuration section is automatically written to > disk. The cluster only stops doing this if writing to disk fails at > some point - but there would have been an error in your logs if that > were the case. Then I do not get it. Yesterday, when the nodes sucide

[Linux-HA] Antw: Re: pacemaker with heartbeat on Debian Wheezy reboots the node reproducibly when putting into maintenance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Ulrich Windl
>>> Thomas Glanzmann wrote on 04.08.2013 at 19:11 in message <20130804171121.ga17...@glanzmann.de>: > Hello Andrew, > I just got another crash when putting a node into unmanaged node, this > time it hit me hard: Hi! Did it happen when you put the cluster into maintenance-mode, or did it happ