Hi all, I just upgraded my Lucid box to the latest set of packages, and now my nodes fail to join back together. Most probably a bug got introduced somewhere.
Here is my package list:

ii  cluster-glue           1.0.8-2ubuntu0~ppa1    The reusable cluster components for Linux HA
ii  corosync               1.4.2-1ubuntu0~ppa1    Standards-based cluster framework (daemon an
ii  dlm-pcmk               3.0.12-2ubuntu6~ppa4   Red Hat cluster suite - DLM pacemaker module
ii  gfs-pcmk               3.0.12-2ubuntu6~ppa4   Red Hat cluster suite - GFS pacemaker module
ii  gfs2-tools             3.0.12-2ubuntu6~ppa4   Red Hat cluster suite - global file system 2
ii  libcib1                1.1.6-2ubuntu0~ppa2    Pacemaker libraries - CIB
ii  libcluster-glue        1.0.8-2ubuntu0~ppa1    Reusable cluster libraries (transitional pac
ii  libcorosync4           1.4.2-1ubuntu0~ppa1    Standards-based cluster framework (libraries
ii  libcrmcluster1         1.1.6-2ubuntu0~ppa2    Pacemaker libraries - CRM
ii  libcrmcommon2          1.1.6-2ubuntu0~ppa2    Pacemaker libraries - common CRM
ii  libopenais3            1.1.4-3ubuntu0~ppa1    Standards-based cluster framework (transitio
ii  libpe-rules2           1.1.6-2ubuntu0~ppa2    Pacemaker libraries - rules for P-Engine
ii  libpe-status3          1.1.6-2ubuntu0~ppa2    Pacemaker libraries - status for P-Engine
ii  libpengine3            1.1.6-2ubuntu0~ppa2    Pacemaker libraries - P-Engine
ii  libstonithd1           1.1.6-2ubuntu0~ppa2    Pacemaker libraries - stonith
ii  libtransitioner1       1.1.6-2ubuntu0~ppa2    Pacemaker libraries - transitioner
ii  ocfs2-tools            1.6.3-2ubuntu3~ppa3    tools for managing OCFS2 cluster filesystems
ii  ocfs2-tools-pacemaker  1.6.3-2ubuntu3~ppa3    tools for managing OCFS2 cluster filesystems
ii  ocfs2console           1.6.3-2ubuntu3~ppa3    tools for managing OCFS2 cluster filesystems
ii  openais                1.1.4-3ubuntu0~ppa1    Standards-based cluster framework (daemon an
ii  pacemaker              1.1.6-2ubuntu0~ppa2    HA cluster resource manager
ii  resource-agents        1:3.9.2-4ubuntu0~ppa2  Cluster Resource Agents

Mainly I get these error/warning messages:

Oct 25 19:44:40 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:40 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:40 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 8 (30 max) times
Oct 25 19:44:42 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:42 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:42 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 9 (30 max) times
Oct 25 19:44:44 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:44 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:44 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 10 (30 max) times
Oct 25 19:44:46 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:46 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:46 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 11 (30 max) times
Oct 25 19:44:48 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:48 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:48 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 12 (30 max) times
Oct 25 19:44:50 logan crmd: [6699]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Oct 25 19:44:50 logan crmd: [6699]: WARN: lrm_signon: can not initiate connection
Oct 25 19:44:50 logan crmd: [6699]: WARN: do_lrm_control: Failed to sign on to the LRM 13 (30 max) times

Here is more info:

Oct 25 19:45:35 logan crmd: [7034]: info: ais_status_callback: status: logan is now member (was unknown)
Oct 25 19:45:35 logan crmd: [7034]: info: crm_update_peer: Node logan: id=22063296 state=member (new) addr=r(0) ip(192.168.80.1) (new) votes=1 (new) born=0 seen=1032 proc=00000000000000000000000000111312 (new)
Oct 25 19:45:35 logan crmd: [7041]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:45:35 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:45:35 logan 
crmd: [7034]: info: do_started: The local CRM is operational Oct 25 19:45:35 logan crmd: [7034]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ] Oct 25 19:45:35 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:51622->[127.0.0.1] Oct 25 19:45:35 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:51622->[127.0.0.1] Oct 25 19:45:35 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:48028->[127.0.0.1] Oct 25 19:45:35 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:37822->[127.0.0.1] Oct 25 19:45:36 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:36 logan pengine: [7051]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:36 logan crmd: [7034]: info: te_connect_stonith: Attempting connection to fencing daemon... Oct 25 19:45:37 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7051, rc=0) Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:37 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7055 for process pengine Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:37 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:37 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 
19:45:37 logan pengine: [7055]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:37 logan crmd: [7034]: info: te_connect_stonith: Connected Oct 25 19:45:37 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:38 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:38 logan pengine: [7055]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:39 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7055, rc=0) Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:39 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7057 for process pengine Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:39 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:39 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:39 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:39 logan pengine: [7057]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:40 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:40 logan pengine: [7057]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:40 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: 
Child process pengine exited (pid=7057, rc=0) Oct 25 19:45:40 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:40 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:40 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7058 for process pengine Oct 25 19:45:40 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:40 logan pengine: [7058]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:42 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:42 logan pengine: [7058]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:42 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7058, rc=0) Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:42 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7067 for process pengine Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:42 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:42 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:42 logan cib: [7037]: info: 
ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:42 logan pengine: [7067]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:43 logan snmpd[1007]: error on subcontainer 'ia_addr' insert (-1) Oct 25 19:45:43 logan snmpd[1007]: error on subcontainer 'ia_addr' insert (-1) Oct 25 19:45:44 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:44 logan pengine: [7067]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:44 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7067, rc=0) Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:44 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7068 for process pengine Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:44 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:44 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:44 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:44 logan pengine: [7068]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:46 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:46 logan pengine: [7068]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:46 logan 
corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7068, rc=0) Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:46 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7070 for process pengine Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:46 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:46 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:46 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:46 logan pengine: [7070]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:48 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:48 logan pengine: [7070]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:48 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7070, rc=0) Oct 25 19:45:48 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:48 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:48 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7071 for process pengine Oct 25 19:45:48 logan corosync[7022]: [pcmk ] 
info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:48 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:48 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:48 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:48 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:48 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:48 logan pengine: [7071]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:50 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:50 logan pengine: [7071]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:50 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7071, rc=0) Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: update_member: 0x88ea158 Node 22063296 ((null)) born on: 1032 Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:50 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7072 for process pengine Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:50 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 
19:45:50 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:50 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:50 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:50 logan pengine: [7072]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:50 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:46265->[127.0.0.1] Oct 25 19:45:50 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:46265->[127.0.0.1] Oct 25 19:45:50 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:41940->[127.0.0.1] Oct 25 19:45:50 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:44758->[127.0.0.1] Oct 25 19:45:52 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:52 logan pengine: [7072]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:52 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7072, rc=0) Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:52 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7074 for process pengine Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:52 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:52 logan crmd: [7034]: 
info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:52 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:52 logan pengine: [7074]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:54 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:54 logan pengine: [7074]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:54 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7074, rc=0) Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:54 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7076 for process pengine Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:54 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 2 children Oct 25 19:45:54 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:54 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:54 logan pengine: [7076]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:56 logan crmd: [7034]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms) Oct 25 19:45:56 logan crmd: [7034]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING Oct 25 19:45:56 logan 
crmd: [7034]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ] Oct 25 19:45:56 logan crmd: [7034]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Oct 25 19:45:56 logan crmd: [7034]: info: do_te_control: Registering TE UUID: 8c5f87f1-1f7c-47ba-8a85-0f54ea47c024 Oct 25 19:45:56 logan crmd: [7034]: info: set_graph_functions: Setting custom graph functions Oct 25 19:45:56 logan crmd: [7034]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses Oct 25 19:45:56 logan crmd: [7034]: info: do_dc_takeover: Taking over DC status for this partition Oct 25 19:45:56 logan cib: [7030]: info: cib_process_readwrite: We are now in R/W mode Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/5, version=0.3002.1): ok (rc=0) Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/6, version=0.3002.2): ok (rc=0) Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8, version=0.3002.3): ok (rc=0) Oct 25 19:45:56 logan crmd: [7034]: info: join_make_offer: Making join offers based on membership 1032 Oct 25 19:45:56 logan crmd: [7034]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks Oct 25 19:45:56 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/10, version=0.3002.4): ok (rc=0) Oct 25 19:45:56 logan crmd: [7034]: info: crmd_ais_dispatch: Setting expected votes to 2 Oct 25 19:45:56 logan crmd: [7034]: info: update_dc: Set DC to logan (3.0.5) Oct 25 19:45:56 logan crmd: 
[7034]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms Oct 25 19:45:56 logan crmd: [7034]: info: config_query_callback: Checking for expired actions every 900000ms Oct 25 19:45:56 logan crmd: [7034]: info: config_query_callback: Sending expected-votes=2 to corosync Oct 25 19:45:56 logan crmd: [7034]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/13, version=0.3002.5): ok (rc=0) Oct 25 19:45:56 logan crmd: [7034]: info: crmd_ais_dispatch: Setting expected votes to 2 Oct 25 19:45:56 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:56 logan pengine: [7076]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/16, version=0.3002.6): ok (rc=0) Oct 25 19:45:56 logan crmd: [7034]: info: pe_msg_dispatch: Received HUP from pengine:[7076] Oct 25 19:45:56 logan crmd: [7034]: CRIT: pe_connection_destroy: Connection to the Policy Engine failed (pid=7076, uuid=5f8ab55b-67d8-4cf4-8845-55759e76b427) Oct 25 19:45:56 logan crmd: [7034]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ] Oct 25 19:45:56 logan crmd: [7034]: info: do_state_transition: All 1 cluster nodes responded to the join offer. 
Oct 25 19:45:56 logan crmd: [7034]: info: do_dc_join_finalize: join-1: Syncing the CIB from logan to the rest of the cluster
Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/18, version=0.3002.6): ok (rc=0)
Oct 25 19:45:56 logan crmd: [7034]: notice: save_cib_contents: Saved CIB contents after PE crash to /var/lib/pengine/pe-core-5f8ab55b-67d8-4cf4-8845-55759e76b427.bz2
Oct 25 19:45:56 logan crmd: [7034]: ERROR: do_log: FSA: Input I_ERROR from save_cib_contents() received in state S_FINALIZE_JOIN
Oct 25 19:45:56 logan crmd: [7034]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=save_cib_contents ]
Oct 25 19:45:56 logan crmd: [7034]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
Oct 25 19:45:56 logan crmd: [7034]: WARN: do_election_vote: Not voting in election, we're in state S_RECOVERY
Oct 25 19:45:56 logan crmd: [7034]: info: do_dc_release: DC role released
Oct 25 19:45:56 logan crmd: [7034]: info: do_te_control: Transitioner is now inactive
Oct 25 19:45:56 logan crmd: [7034]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Oct 25 19:45:56 logan crmd: [7034]: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Oct 25 19:45:56 logan crmd: [7034]: info: do_shutdown: Disconnecting STONITH...
Oct 25 19:45:56 logan crmd: [7034]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected Oct 25 19:45:56 logan crmd: [7034]: info: do_lrm_control: Disconnected from the LRM Oct 25 19:45:56 logan crmd: [7034]: notice: terminate_ais_connection: Disconnecting from AIS Oct 25 19:45:56 logan crmd: [7034]: info: do_ha_control: Disconnected from OpenAIS Oct 25 19:45:56 logan crmd: [7034]: info: do_cib_control: Disconnecting CIB Oct 25 19:45:56 logan crmd: [7034]: info: crmd_cib_connection_destroy: Connection to the CIB terminated... Oct 25 19:45:56 logan crmd: [7034]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd Oct 25 19:45:56 logan crmd: [7034]: ERROR: do_exit: Could not recover from internal error Oct 25 19:45:56 logan crmd: [7034]: info: free_mem: Dropping I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ] Oct 25 19:45:56 logan crmd: [7034]: info: free_mem: Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ] Oct 25 19:45:56 logan crmd: [7034]: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ] Oct 25 19:45:56 logan crmd: [7034]: info: crm_xml_cleanup: Cleaning up memory from libxml2 Oct 25 19:45:56 logan crmd: [7034]: info: do_exit: [crmd] stopped (2) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0xb55005e0, async-conn=0xb55005e0) left Oct 25 19:45:56 logan cib: [7030]: WARN: send_ipc_message: IPC Channel to 7034 is not connected Oct 25 19:45:56 logan cib: [7030]: WARN: cib_notify_client: Notification of client 7034/ba0feaf7-d64b-4dd6-901d-e9ad794f9eda failed Oct 25 19:45:56 logan cib: [7030]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/19, version=0.3002.7): ok (rc=0) Oct 25 19:45:56 logan cib: [7030]: WARN: send_ipc_message: IPC Channel to 7034 is not connected Oct 25 19:45:56 logan cib: [7030]: WARN: send_via_callback_channel: 
Delivery of reply to client 7034/ba0feaf7-d64b-4dd6-901d-e9ad794f9eda failed Oct 25 19:45:56 logan cib: [7030]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed Oct 25 19:45:56 logan cib: [7030]: info: cib_process_readwrite: We are now in R/O mode Oct 25 19:45:56 logan cib: [7030]: WARN: send_ipc_message: IPC Channel to 7034 is not connected Oct 25 19:45:56 logan cib: [7030]: WARN: send_via_callback_channel: Delivery of reply to client 7034/ba0feaf7-d64b-4dd6-901d-e9ad794f9eda failed Oct 25 19:45:56 logan cib: [7030]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed Oct 25 19:45:56 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7076, rc=0) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7077 for process pengine Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:56 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children Oct 25 19:45:56 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:56 logan pengine: [7077]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:45:58 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:45:58 logan pengine: [7077]: WARN: process_pe_message: Received quit message, 
terminating Oct 25 19:45:58 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7077, rc=0) Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:58 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7078 for process pengine Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:45:58 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children Oct 25 19:45:58 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:45:58 logan pengine: [7078]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:46:00 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:46:00 logan pengine: [7078]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:46:00 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7078, rc=0) Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:46:00 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7080 for process pengine Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 
00000000000000000000000000111312 (1118994) Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:46:00 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children Oct 25 19:46:00 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:46:00 logan pengine: [7080]: info: Invoked: /usr/lib/heartbeat/pengine Oct 25 19:46:02 logan pengine: [7033]: WARN: main: Terminating previous PE instance Oct 25 19:46:02 logan pengine: [7080]: WARN: process_pe_message: Received quit message, terminating Oct 25 19:46:02 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7080, rc=0) Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:46:02 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7081 for process pengine Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458) Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994) Oct 25 19:46:02 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children Oct 25 19:46:02 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost Oct 25 19:46:02 logan pengine: [7081]: info: Invoked: 
/usr/lib/heartbeat/pengine
Oct 25 19:46:04 logan pengine: [7033]: WARN: main: Terminating previous PE instance
Oct 25 19:46:04 logan pengine: [7081]: WARN: process_pe_message: Received quit message, terminating
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7081, rc=0)
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7082 for process pengine
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:04 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children
Oct 25 19:46:04 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:46:04 logan pengine: [7082]: info: Invoked: /usr/lib/heartbeat/pengine
Oct 25 19:46:05 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:53236->[127.0.0.1]
Oct 25 19:46:05 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:53236->[127.0.0.1]
Oct 25 19:46:05 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:52528->[127.0.0.1]
Oct 25 19:46:05 logan snmpd[1007]: Connection from UDP: [127.0.0.1]:60208->[127.0.0.1]
Oct 25 19:46:06 logan pengine: [7033]: WARN: main: Terminating previous PE instance
Oct 25 19:46:06 logan pengine: [7082]: WARN: process_pe_message: Received quit message, terminating
Oct 25 19:46:06 logan corosync[7022]: [pcmk
] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7082, rc=0)
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7083 for process pengine
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:06 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children
Oct 25 19:46:06 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:46:06 logan pengine: [7083]: info: Invoked: /usr/lib/heartbeat/pengine
Oct 25 19:46:08 logan pengine: [7033]: WARN: main: Terminating previous PE instance
Oct 25 19:46:08 logan pengine: [7083]: WARN: process_pe_message: Received quit message, terminating
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7083, rc=0)
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7084 for process pengine
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:08 logan
corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:08 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children
Oct 25 19:46:08 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:46:08 logan pengine: [7084]: info: Invoked: /usr/lib/heartbeat/pengine
Oct 25 19:46:10 logan pengine: [7033]: WARN: main: Terminating previous PE instance
Oct 25 19:46:10 logan pengine: [7084]: WARN: process_pe_message: Received quit message, terminating
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7084, rc=0)
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: spawn_child: Forked child 7086 for process pengine
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000111312 (1118994)
Oct 25 19:46:10 logan corosync[7022]: [pcmk ] info: send_member_notification: Sending membership update 1032 to 1 children
Oct 25 19:46:10 logan cib: [7037]: info: ais_dispatch_message: Membership 1032: quorum still lost
Oct 25 19:46:10 logan pengine: [7086]: info: Invoked: /usr/lib/heartbeat/pengine
Oct 25 19:46:12 logan pengine:
[7033]: WARN: main: Terminating previous PE instance
Oct 25 19:46:12 logan pengine: [7086]: WARN: process_pe_message: Received quit message, terminating
Oct 25 19:46:12 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Child process pengine exited (pid=7086, rc=0)
Oct 25 19:46:12 logan corosync[7022]: [pcmk ] info: update_member: Node logan now has process list: 00000000000000000000000000101312 (1053458)
Oct 25 19:46:12 logan corosync[7022]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
.....

Vincent Fortier

