Hello,

I have the following scenario, which causes a failure when I try to restart a
failed cluster node:

1) start a cluster on two nodes N1 and N2 (172.16.133.123/172.16.133.120)
2) start consumer C for queue Q
3) start producer P for queue Q sending one text message / sec
4) confirm with tcpdump that N1 is receiving all of the traffic
5) shut down node N1 with 'qpidd --quit'
6) confirm with tcpdump that N2 is receiving all of the traffic (successful
failover)
7) restart node N1 with 'qpidd'
8) check qpidd.log on N1, which shows the error "catch-up connection closed
prematurely"
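
For step 8, the relevant lines can be pulled out of the log with a small
script. This is only a sketch: it assumes each log line starts with a
"YYYY-MM-DD HH:MM:SS" timestamp followed by the level, as in the excerpt
below, and the names find_problems and scan_file are made up for
illustration.

```python
import re

# Assumed line layout, based on the log excerpt in this post:
#   "2011-09-19 14:00:43 critical cluster(...) catch-up connection ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def find_problems(lines, levels=("error", "critical")):
    """Return (timestamp, level, message) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group(2) in levels:
            hits.append(m.groups())
    return hits


def scan_file(path):
    """Print every error/critical line found in the log file at `path`."""
    with open(path) as f:
        for ts, level, msg in find_problems(f):
            print(ts, level, msg)
```

Calling scan_file("/home/qpid/qpid.log") (the path from log-to-file in the
qpidd.conf below) on N1 after step 7 should print only the error and critical
entries.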

Any ideas what's going on?

qpidd.conf:
cluster-mechanism=ANONYMOUS
cluster-name=MYCLUSTER
log-to-file=/home/qpid/qpid.log
daemon=yes
no-data-dir=yes
auth=no


qpidd.log (N1):
2011-09-19 13:58:35 notice Initializing CPG
2011-09-19 13:58:35 notice cluster(172.16.133.123:18918 PRE_INIT)
configuration change: 172.16.133.123:18918 
2011-09-19 13:58:35 notice cluster(172.16.133.123:18918 PRE_INIT) Members
joined: 172.16.133.123:18918 
2011-09-19 13:58:35 notice SASL disabled: No Authentication Performed
2011-09-19 13:58:35 notice Listening on TCP port 5672
2011-09-19 13:58:35 notice cluster(172.16.133.123:18918 INIT) cluster-uuid =
7ab02e1b-67dd-4fed-b176-79b567ab699f
2011-09-19 13:58:35 notice cluster(172.16.133.123:18918 READY) joined
cluster EMS_CLUSTER
2011-09-19 13:58:35 notice Broker running
2011-09-19 13:58:41 notice cluster(172.16.133.123:18918 READY) configuration
change: 172.16.133.120:29504 172.16.133.123:18918 
2011-09-19 13:58:41 notice cluster(172.16.133.123:18918 READY) Members
joined: 172.16.133.120:29504 
2011-09-19 13:58:41 notice cluster(172.16.133.123:18918 UPDATER) sending
update to 172.16.133.120:29504 at amqp:tcp:172.16.133.120:5672
2011-09-19 13:58:41 warning Broker closed connection: 200, OK
2011-09-19 13:58:41 notice cluster(172.16.133.123:18918 UPDATER) update sent
2011-09-19 14:00:40 notice Shut down
2011-09-19 14:00:43 notice Initializing CPG
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 PRE_INIT)
configuration change: 172.16.133.120:29504 172.16.133.123:19037 
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 PRE_INIT) Members
joined: 172.16.133.123:19037 
2011-09-19 14:00:43 notice SASL disabled: No Authentication Performed
2011-09-19 14:00:43 notice Listening on TCP port 5672
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 INIT) cluster-uuid =
7ab02e1b-67dd-4fed-b176-79b567ab699f
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 JOINER) joining
cluster MYCLUSTER
2011-09-19 14:00:43 notice Broker running
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 UPDATEE) receiving
update from 172.16.133.120:29504
2011-09-19 14:00:43 error deliveryRecord no update message
(qpid/cluster/Connection.cpp:537)
2011-09-19 14:00:43 critical cluster(172.16.133.123:19037 UPDATEE) catch-up
connection closed prematurely
172.16.133.120:5672-172.16.136.143:53170(172.16.133.123:19037-4
local,catchup)
2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 LEFT) leaving
cluster MYCLUSTER
2011-09-19 14:00:43 notice Shut down



