Hello,

I seem to have had a network failure in my cluster of two nodes: the main node A lived on, while on node B qpid quit. There are two queues (Q1 and Q2, with the same routing key), and after this incident broker A kept receiving messages to these queues.

After some time I tried to restart node B and couldn't. First I tried with its data-dir untouched, then I removed the data-dir contents altogether. Judging by the qpid logs, broker B joined the cluster and started receiving state updates; it read all the messages for queue Q1 and then died when reading the first message for Q2. The last log message is:

'qpid.cluster-update: recv cmd 28: content (267 bytes) <?xml version="1.0" encoding="ut...'
I managed to start B only after I 'drain'ed the contents of Q2.

Any hints as to what I might be doing wrong when starting up the failed node?

Thanks!
Stoyan

BTW: on node A, corosync-cpgtool wrongly reported the whole time that A and B were still in a cluster, while on B it properly showed A as the lone node in the cluster, but that's a different matter.

Setup: C++ qpid 0.8, corosync 1.3.1, RHEL5

The initial network error indicator in corosync.log was

corosync[8458]: [TOTEM ] A processor failed, forming new configuration

later followed by

qpidd[8474]: 2011-05-17 21:44:32 critical Multicast error: Cannot mcast to CPG group QpidCluster: not exist (12)