Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-08 Thread Andrew Beekhof
On 08/08/2013, at 3:49 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Andrew, It really helps to read the output of the commands you're running: Did you not see these messages the first time? apache-03: WARN: Unknown cluster type: any apache-03: ERROR: Could not determine the

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Thomas Glanzmann
Hello Andrew, I can try and fix that if you re-run with -x and paste the output. (apache-03) [~] crm_report -l /var/adm/syslog/2013/08/05 -f "2013-08-04 18:30:00" -t "2013-08-04 19:15" -x + shift + true + [ ! -z ] + break + [ x != x ] + [ x1375633800 != x ] + masterlog= + [ -z ] + log WARNING:

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Andrew Beekhof
On 07/08/2013, at 5:42 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Andrew, I can try and fix that if you re-run with -x and paste the output. (apache-03) [~] crm_report -l /var/adm/syslog/2013/08/05 -f "2013-08-04 18:30:00" -t "2013-08-04 19:15" -x + shift + true + [ ! -z ] +

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Thomas Glanzmann
Hello Andrew, It really helps to read the output of the commands you're running: Did you not see these messages the first time? apache-03: WARN: Unknown cluster type: any apache-03: ERROR: Could not determine the location of your cluster logs, try specifying --logfile /some/path

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-06 Thread Andrew Beekhof
On 06/08/2013, at 2:29 AM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Andrew, You will need to run crm_report and email us the resulting tarball. This will include the version of the software you're running and log files (both system and cluster) - without which we can't do

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, You will need to run crm_report and email us the resulting tarball. This will include the version of the software you're running and log files (both system and cluster) - without which we can't do anything. Find the files here: I manually packaged it because crm_report output

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-04 Thread Thomas Glanzmann
Hello Andrew, I just got another crash when putting a node into unmanaged mode, this time it hit me hard: - Both nodes suicided or STONITHed each other - One out of four md devices was detected on both nodes after reset. - Half of the config was gone. Could you

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-04 Thread Andrew Beekhof
On 05/08/2013, at 3:11 AM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Andrew, I just got another crash when putting a node into unmanaged mode, this time it hit me hard: - Both nodes suicided or STONITHed each other - One out of four md devices was detected on both

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-07 Thread Thomas Glanzmann
Hello Andrew, Jun 6 10:17:37 astorage1 crmd: [2947]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL This is the cause of the coredump. What version of pacemaker is this? 1.1.7-1 Installing pacemaker's debug symbols would also

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-07 Thread Thomas Glanzmann
Hello Andrew, Installing pacemaker's debug symbols would also make the stack trace more useful. We tried to install heartbeat-dev to see more, but there are no debugging symbols available. I also tried to reproduce the issue with 64-bit Debian Wheezy, as I used 32-bit before; I was not able

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-07 Thread Ferenc Wagner
Thomas Glanzmann tho...@glanzmann.de writes: Installing pacemaker's debug symbols would also make the stack trace more useful. We tried to install heartbeat-dev to see more, but there are no debugging symbols available. You'd probably need the pacemaker-dbg package, which is not present for

[Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-06 Thread Thomas Glanzmann
Hello, over the last couple of days, I set up an active/passive NFS server and iSCSI storage using DRBD, Pacemaker, Heartbeat, LIO and the NFS kernel server. While testing the cluster I often set it to unmanaged using: crm configure property maintenance-mode=true Sometimes when I did that, both

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-06 Thread Andrew Beekhof
On 06/06/2013, at 7:11 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Jun 6 10:17:37 astorage1 crmd: [2947]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL This is the cause of the coredump. What version of pacemaker is this?