I have to restart my 2-broker cluster daily because of the following sequence of events:

-----------------------------------------------------------------------------------------------
master
04:51:14,501 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
04:51:14,510 AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed.]
04:51:24,517 AMQ212041: Timed out waiting for netty channel to close
04:51:24,517 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58738 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
-----------------------------------------------------------------------------------------------
slave
04:51:42,306 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local= /10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
04:51:42,316 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local= /10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
04:51:46,955 AMQ221037: ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to become 'live'
04:51:59,360 AMQ221014: 40% loaded
04:52:01,854 AMQ221014: 81% loaded
04:52:03,037 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
04:52:03,051 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
04:52:03,208 AMQ221003: Deploying queue jms.queue.DLQ
04:52:03,281 AMQ221003: Deploying queue jms.queue.ExpiryQueue
04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
-----------------------------------------------------------------------------------------------
master
04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
-----------------------------------------------------------------------------------------------
slave
04:52:03,910 AMQ221007: Server is now live
04:52:04,003 AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
04:52:11,949 AMQ212034: There are more than one servers on the network broadcasting the same node id.
-----------------------------------------------------------------------------------------------

As I understand it, at some point the master (live at that time) stops receiving data from the slave and closes the connection to it. The slave (the backup at that time) in turn detects that the master is not present and becomes live. Now both brokers are live, and they never return to normal until I restart them.

How can I avoid this? I would appreciate any help.

Thank you.