cannot restart failed cluster node

stoyan Wed, 18 May 2011 03:57:43 -0700

hello,
i seem to have had a network failure in my cluster of two nodes: the main
node A lived on, while on node B qpid quit.
there are two queues (Q1 and Q2, with the same routing key), and after this
incident broker A kept receiving messages on both queues.
after some time i tried to restart node B and couldn't: first i tried with
its data-dir untouched, then i removed the data-dir contents altogether.
judging by the qpid logs, broker B joined the cluster and started
receiving state updates. it read all the messages for queue Q1 and then died
while reading the first message for Q2. the last log message is
'qpid.cluster-update: recv cmd 28: content (267 bytes) <?xml version="1.0"
encoding="ut...'


i managed to start B only after i 'drain'ed the contents of Q2.

any hints on what i might be doing wrong when starting up the failed node?

thanks!


stoyan


btw: on node A corosync-cpgtool wrongly reported that A and B were still in a
cluster the whole time, while on B it correctly showed A as the lone node in
the cluster, but that's a different matter

c++ qpid 0.8
corosync 1.3.1
rhel5

the initial network error indicator in corosync.log was
corosync[8458]:   [TOTEM ] A processor failed, forming new configuration
later followed by
qpidd[8474]: 2011-05-17 21:44:32 critical Multicast error: Cannot mcast to CPG group QpidCluster: not exist (12)
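
for anyone checking their own logs for the same symptoms, here is a minimal sketch that scans log text for the two messages quoted above. the function name and the labels are made up for illustration and are not part of corosync or qpid; the patterns are taken only from the two lines in this post and assume each message sits on a single log line.

```python
import re

# Patterns copied from the two log lines quoted in this post.
# The labels ("totem_reconfig", "cpg_mcast_error") are hypothetical names.
SIGNATURES = {
    "totem_reconfig": re.compile(
        r"\[TOTEM\s*\] A processor failed, forming new configuration"
    ),
    "cpg_mcast_error": re.compile(
        r"Multicast error: Cannot mcast to CPG group \S+: not exist \(12\)"
    ),
}


def find_partition_signs(log_text):
    """Return (label, line_number) pairs for every line matching a signature."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((label, lineno))
    return hits
```

calling find_partition_signs(open(path).read()) on a combined syslog file would list where each message shows up, which makes it easier to see whether the TOTEM reconfiguration preceded the qpidd multicast error.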

