Thank you for taking the time to report this bug. In an effort to keep an
up-to-date and valid list of bugs to work on, I have reviewed this report
to verify that it still requires effort and still occurs on an Ubuntu
release in standard support; it does not.
Judging by the existing comments and the linked upstream discussion thread, it
appears that the wrong lrmd was installed after the upgrade, which confused the
existing cluster when it connected to the local resource manager daemon.
It is unfortunate that we were unable to resolve this defect; however,
there appears to be no further action possible at this time. I am
therefore moving the bug to 'Incomplete'. If you disagree or have
new information, we would be grateful if you could please add a comment
stating why and then change the status of the bug to 'New'.
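For anyone still hitting this, a quick way to check which lrmd binary
heartbeat would find is to resolve both paths. This is a minimal sketch
using the paths from the report below; they may differ on newer releases,
and the loop simply reports "missing" where a path does not exist:

```shell
# Check which lrmd exists and what it resolves to (paths from this report)
for p in /usr/lib/heartbeat/lrmd /usr/lib/pacemaker/lrmd; do
    if [ -e "$p" ]; then
        printf '%s -> %s\n' "$p" "$(readlink -f "$p")"
    else
        printf '%s missing\n' "$p"
    fi
done
```

On an affected node, /usr/lib/heartbeat/lrmd resolving to the cluster-glue
binary rather than pacemaker's lrmd matches the failure described below.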
** Changed in: cluster-glue (Ubuntu)
Status: Confirmed => Incomplete
** Also affects: cluster-glue (Ubuntu Trusty)
Importance: Undecided
Status: New
** Changed in: cluster-glue (Ubuntu Trusty)
Status: New => Incomplete
** Changed in: cluster-glue (Ubuntu)
Status: Incomplete => Fix Released
** Changed in: cluster-glue (Ubuntu)
Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to cluster-glue in Ubuntu.
https://bugs.launchpad.net/bugs/1251298
Title:
Failed to sign on to LRMd with Heartbeat/Pacemaker
Status in cluster-glue package in Ubuntu:
Fix Released
Status in cluster-glue source package in Trusty:
Incomplete
Bug description:
I'm running a 2-node heartbeat/pacemaker cluster, which was working fine
with Ubuntu 13.04.
After upgrading from Ubuntu 13.04 to Ubuntu 13.10, Heartbeat/Pacemaker keeps
restarting the system due to lrmd sign-on errors while heartbeat tries to
recover.
As one system is already on Ubuntu 13.10 and the other is still running
13.04, I've tried it without the second node, which leads to the same
behavior; it occurs before any cluster communication happens.
Syslog:
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 1 (30 max) times
Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback:
Status update: Client wolverine.domain.tld/crmd now has status [join] (DC=false)
Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback:
Status update: Client wolverine.domain.tld/crmd now has status [online]
(DC=false)
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 2 (30 max) times
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 3 (30 max) times
Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: setup_cib: Watching for
stonith topology changes
Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Nov 14 15:53:08 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 4 (30 max) times
Nov 14 15:53:10 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 5 (30 max) times
Nov 14 15:53:12 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 6 (30 max) times
Nov 14 15:53:14 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 7 (30 max) times
Nov 14 15:53:16 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 8 (30 max) times
Nov 14 15:53:18 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 9 (30 max) times
Nov 14 15:53:20 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 10 (30 max) times
Nov 14 15:53:22 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 11 (30 max) times
Nov 14 15:53:24 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 12 (30 max) times
Nov 14 15:53:26 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 13 (30 max) times
Nov 14 15:53:28 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 14 (30 max) times
Nov 14 15:53:30 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 15 (30 max) times
Nov 14 15:53:32 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 16 (30 max) times
Nov 14 15:53:34 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 17 (30 max) times
Nov 14 15:53:36 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 18 (30 max) times
Nov 14 15:53:38 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 19 (30 max) times
Nov 14 15:53:40 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 20 (30 max) times
Nov 14 15:53:42 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 21 (30 max) times
Nov 14 15:53:44 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 22 (30 max) times
Nov 14 15:53:46 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 23 (30 max) times
Nov 14 15:53:48 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 24 (30 max) times
Nov 14 15:53:50 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 25 (30 max) times
Nov 14 15:53:52 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 26 (30 max) times
Nov 14 15:53:54 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 27 (30 max) times
Nov 14 15:53:56 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 28 (30 max) times
Nov 14 15:53:58 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 29 (30 max) times
Nov 14 15:54:00 wolverine crmd[2464]: error: do_lrm_control: Failed to
sign on to the LRM 30 (max) times
Nov 14 15:54:00 wolverine crmd[2464]: error: do_log: FSA: Input I_ERROR
from do_lrm_control() received in state S_STARTING
Nov 14 15:54:00 wolverine crmd[2464]: notice: do_state_transition: State
transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
origin=do_lrm_control ]
Nov 14 15:54:00 wolverine crmd[2464]: warning: do_recover: Fast-tracking
shutdown in response to errors
Symlinking lrmd from the pacemaker package partly solved this problem:
root@wolverine ~ # mv /usr/lib/heartbeat/lrmd{,.cluster-glue}
root@wolverine ~ # cd /usr/lib/heartbeat/
root@wolverine /usr/lib/heartbeat # ln -s ../pacemaker/lrmd
root@wolverine /usr/lib/heartbeat # ls -la lrmd
lrwxrwxrwx 1 root root 17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
root@wolverine /usr/lib/heartbeat # ls -la lrmd*
lrwxrwxrwx 1 root root 17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
-rwxr-xr-x 1 root root 92816 Jul 18 17:55 lrmd.cluster-glue
root@wolverine /usr/lib/heartbeat #
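To make the effect of the workaround concrete, here is a sandboxed replay
in a throwaway directory. The files are stand-ins, not real daemons, so it
is safe to run anywhere; it only demonstrates that after the mv and ln -s,
the heartbeat path resolves to pacemaker's lrmd:

```shell
# Sandboxed replay of the mv + ln -s workaround above (stand-in files)
tmp=$(mktemp -d)
mkdir -p "$tmp/heartbeat" "$tmp/pacemaker"
echo cluster-glue-lrmd > "$tmp/heartbeat/lrmd"   # stand-in: old cluster-glue binary
echo pacemaker-lrmd > "$tmp/pacemaker/lrmd"      # stand-in: pacemaker's lrmd
mv "$tmp/heartbeat/lrmd" "$tmp/heartbeat/lrmd.cluster-glue"
ln -s ../pacemaker/lrmd "$tmp/heartbeat/lrmd"
cat "$tmp/heartbeat/lrmd"    # prints: pacemaker-lrmd
```

As a suggestion not tested in this report, a dpkg-divert on
/usr/lib/heartbeat/lrmd would survive package upgrades better than a bare
mv, since dpkg would then leave the diverted path alone.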
Stopping heartbeat will still result in an unexpected reboot:
Nov 14 16:37:27 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_notify_0 (call=45, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_notify_0 (call=48, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:37:28 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2673
failed authorization [no default client auth]
Nov 14 16:37:28 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-backup:1 (10000)
Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 10: master-drbd-backup:1=10000
Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_monitor_31000 (call=50, rc=0, cib-update=17,
confirmed=false) ok
Nov 14 16:37:29 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2700
failed authorization [no default client auth]
Nov 14 16:37:29 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-rsyslog:1 (10000)
Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 13: master-drbd-rsyslog:1=10000
Nov 14 16:37:29 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_monitor_31000 (call=54, rc=0, cib-update=18,
confirmed=false) ok
Nov 14 16:37:59 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2812
failed authorization [no default client auth]
Nov 14 16:37:59 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:00 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2839
failed authorization [no default client auth]
Nov 14 16:38:00 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:05 wolverine heartbeat: [2238]: info: killing
/usr/lib/heartbeat/crmd process group 2259 with signal 15
Nov 14 16:38:05 wolverine crmd[2259]: notice: crm_shutdown: Requesting
shutdown, upper limit is 1200000ms
Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: shutdown (1384443485)
Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 16: shutdown=1384443485
Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_notify_0 (call=57, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_notify_0 (call=59, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:38:07 wolverine kernel: [ 255.385984] d-con backup: Requested
state change failed by peer: Refusing to be Primary while peer is not outdated
(-7)
Nov 14 16:38:07 wolverine kernel: [ 255.386415] d-con backup: peer( Primary
-> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated )
pdsk( UpToDate -> DUnknown )
Nov 14 16:38:07 wolverine kernel: [ 255.386428] d-con backup: asender
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.386438] d-con backup: Terminating
drbd_a_backup
Nov 14 16:38:07 wolverine kernel: [ 255.386693] d-con backup: Connection
closed
Nov 14 16:38:07 wolverine kernel: [ 255.386716] d-con backup: conn(
Disconnecting -> StandAlone )
Nov 14 16:38:07 wolverine kernel: [ 255.386718] d-con backup: receiver
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.386722] d-con backup: Terminating
drbd_r_backup
Nov 14 16:38:07 wolverine kernel: [ 255.386750] block drbd0: disk( Outdated
-> Failed )
Nov 14 16:38:07 wolverine kernel: [ 255.409861] block drbd0: bitmap WRITE of
0 pages took 0 jiffies
Nov 14 16:38:07 wolverine kernel: [ 255.409930] block drbd0: 0 KB (0 bits)
marked out-of-sync by on disk bit-map.
Nov 14 16:38:07 wolverine kernel: [ 255.409943] block drbd0: disk( Failed ->
Diskless )
Nov 14 16:38:07 wolverine kernel: [ 255.410041] block drbd0: drbd_bm_resize
called with capacity == 0
Nov 14 16:38:07 wolverine kernel: [ 255.411773] d-con backup: Terminating
drbd_w_backup
Nov 14 16:38:07 wolverine kernel: [ 255.466428] d-con rsyslog: Requested
state change failed by peer: Refusing to be Primary while peer is not outdated
(-7)
Nov 14 16:38:07 wolverine kernel: [ 255.466796] d-con rsyslog: peer( Primary
-> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated )
pdsk( UpToDate -> DUnknown )
Nov 14 16:38:07 wolverine kernel: [ 255.466814] d-con rsyslog: asender
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.466832] d-con rsyslog: Terminating
drbd_a_rsyslog
Nov 14 16:38:07 wolverine kernel: [ 255.467098] d-con rsyslog: Connection
closed
Nov 14 16:38:07 wolverine kernel: [ 255.467121] d-con rsyslog: conn(
Disconnecting -> StandAlone )
Nov 14 16:38:07 wolverine kernel: [ 255.467123] d-con rsyslog: receiver
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.467128] d-con rsyslog: Terminating
drbd_r_rsyslog
Nov 14 16:38:07 wolverine kernel: [ 255.467169] block drbd1: disk( Outdated
-> Failed )
Nov 14 16:38:07 wolverine kernel: [ 255.481716] block drbd1: bitmap WRITE of
0 pages took 0 jiffies
Nov 14 16:38:07 wolverine kernel: [ 255.481778] block drbd1: 0 KB (0 bits)
marked out-of-sync by on disk bit-map.
Nov 14 16:38:07 wolverine kernel: [ 255.481791] block drbd1: disk( Failed ->
Diskless )
Nov 14 16:38:07 wolverine kernel: [ 255.481881] block drbd1: drbd_bm_resize
called with capacity == 0
Nov 14 16:38:07 wolverine kernel: [ 255.482011] d-con rsyslog: Terminating
drbd_w_rsyslog
Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2986
failed authorization [no default client auth]
Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2989
failed authorization [no default client auth]
Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-backup:1 (<null>)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 18: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_stop_0 (call=64, rc=0, cib-update=19, confirmed=true) ok
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-rsyslog:1 (<null>)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 20: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_stop_0 (call=67, rc=0, cib-update=20, confirmed=true)
ok
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 22: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 24: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:08 wolverine crmd[2259]: notice: do_state_transition: State
transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE
origin=route_message ]
Nov 14 16:38:08 wolverine crmd[2259]: notice: lrm_state_verify_stopped:
Stopped 0 recurring operations at (null) (1274719202 ops remaining)
Nov 14 16:38:08 wolverine crmd[2259]: notice: do_lrm_control: Disconnected
from the LRM
Nov 14 16:38:08 wolverine ccm: [2254]: info: client (pid=2259) removed from
ccm
Nov 14 16:38:08 wolverine heartbeat: [2238]: EMERG: Rebooting system.
Reason: /usr/lib/heartbeat/crmd
root@wolverine ~ # lsb_release -rd
Description: Ubuntu 13.10
Release: 13.10
root@wolverine ~ # apt-cache policy cluster-glue
cluster-glue:
Installed: 1.0.11+hg2754-1.1
Candidate: 1.0.11+hg2754-1.1
Version table:
*** 1.0.11+hg2754-1.1 0
500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
100 /var/lib/dpkg/status
root@wolverine ~ #
root@wolverine ~ # apt-cache policy heartbeat
heartbeat:
Installed: 1:3.0.5-3.1ubuntu1
Candidate: 1:3.0.5-3.1ubuntu1
Version table:
*** 1:3.0.5-3.1ubuntu1 0
500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
100 /var/lib/dpkg/status
root@wolverine ~ #
root@wolverine ~ # apt-cache policy pacemaker
pacemaker:
Installed: 1.1.10+git20130802-1ubuntu1
Candidate: 1.1.10+git20130802-1ubuntu1
Version table:
*** 1.1.10+git20130802-1ubuntu1 0
500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
100 /var/lib/dpkg/status
root@wolverine ~ #
Expected:
- Working heartbeat/pacemaker setup after the Ubuntu upgrade
What happened:
- The system reboots after about one minute due to heartbeat recovery attempts
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cluster-glue/+bug/1251298/+subscriptions