** Also affects: pacemaker (Ubuntu Xenial)
Importance: Undecided
Status: New
** Also affects: pacemaker (Ubuntu Trusty)
Importance: Undecided
Status: New
** Changed in: pacemaker (Ubuntu)
Status: Confirmed => Fix Released
** Also affects: pacemaker (Ubuntu Bionic)
Importance: Undecided
Status: New
** Changed in: pacemaker (Ubuntu Bionic)
Status: New => Fix Released
** Changed in: pacemaker (Ubuntu Xenial)
Status: New => Confirmed
** Changed in: pacemaker (Ubuntu Trusty)
Status: New => Confirmed
** Changed in: pacemaker (Ubuntu Xenial)
Importance: Undecided => High
** Changed in: pacemaker (Ubuntu Trusty)
Importance: Undecided => Medium
** Changed in: pacemaker (Ubuntu Xenial)
Importance: High => Medium
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1439649
Title:
Pacemaker unable to communicate with corosync on restart under lxc
Status in pacemaker package in Ubuntu:
Fix Released
Status in pacemaker source package in Trusty:
Won't Fix
Status in pacemaker source package in Xenial:
Confirmed
Status in pacemaker source package in Bionic:
Fix Released
Bug description:
We've seen this a few times with three node clusters, all running in
LXC containers; pacemaker fails to restart correctly as it can't
communicate with corosync, resulting in a down cluster. Rebooting the
containers resolves the issue, so suspect some sort of bad state
either in corosync or pacemaker.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
mcp_read_config: Configured corosync to accept connections from group 115:
Library error (2)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: main:
Starting Pacemaker 1.1.10 (Build: 42f2063): generated-manpages agent-manpages
ncurses libqb-logging libqb-ipc lha-fencing upstart nagios heartbeat
corosync-native snmp libesmtp
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
cluster_connect_quorum: Quorum acquired
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
corosync_node_name: Unable to get node name for nodeid 1000
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node
juju-machine-4-lxc-4[1001] - state is now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is
now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: main: CRM Git
Version: 42f2063
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice:
corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice:
get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied
connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC
credentials (1033732-1033746).
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error:
cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: HA
Signon failed
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: Aborting
startup
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error:
pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning:
pcmk_child_exit: Pacemaker child process attrd no longer wishes to be
respawned. Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
pcmk_shutdown_worker: Shuting down Pacemaker
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
stop_child: Stopping crmd: Sent -15 to process 1033748
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_cib_control:
Couldn't complete CIB registration 1 times... pause and retry
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: crm_shutdown:
Requesting shutdown, upper limit is 1200000ms
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_log: FSA:
Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice:
do_state_transition: State transition S_STARTING -> S_STOPPING [
input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice:
terminate_cs_connection: Disconnecting from Corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied
connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC
credentials (1033732-1033743).
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: error:
cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: crit: cib_init: Cannot
sign in to the cluster... terminating
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
stop_child: Stopping pengine: Sent -15 to process 1033747
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error:
pcmk_child_exit: Child process cib (1033743) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning:
pcmk_child_exit: Pacemaker child process cib no longer wishes to be respawned.
Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
stop_child: Stopping lrmd: Sent -15 to process 1033745
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
stop_child: Stopping stonith-ng: Sent -15 to process 1033744
Apr 2 11:41:34 juju-machine-4-lxc-4 corosync[1033732]: [TOTEM ] A new
membership (10.245.160.62:284) was formed. Members joined: 1000
Apr 2 11:41:41 juju-machine-4-lxc-4 stonith-ng[1033744]: error:
setup_cib: Could not connect to the CIB service: Transport endpoint is not
connected (-107)
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
pcmk_shutdown_worker: Shutdown complete
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice:
pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: pacemaker 1.1.10+git20130802-1ubuntu2.3
ProcVersionSignature: User Name 3.16.0-33.44~14.04.1-generic 3.16.7-ckt7
Uname: Linux 3.16.0-33-generic x86_64
NonfreeKernelModules: vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT
ip6table_filter ip6_tables ebtable_nat ebtables veth 8021q garp xt_CHECKSUM mrp
iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables nbd
ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi openvswitch gre vxlan dm_crypt bridge
dm_multipath intel_rapl stp scsi_dh x86_pkg_temp_thermal llc intel_powerclamp
coretemp ioatdma kvm_intel ipmi_si joydev sb_edac kvm hpwdt hpilo dca
ipmi_msghandler acpi_power_meter edac_core lpc_ich shpchp serio_raw mac_hid xfs
libcrc32c btrfs xor raid6_pq hid_generic usbhid hid crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd psmouse tg3 ptp pata_acpi hpsa pps_core
ApportVersion: 2.14.1-0ubuntu3.7
Architecture: amd64
Date: Thu Apr 2 11:42:18 2015
SourcePackage: pacemaker
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1439649/+subscriptions
_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help : https://help.launchpad.net/ListHelp