I have a clustered J2EE application that starts up a broker on each node, using the failover protocol and a shared data directory.
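Some background on the shared data directory part of this setup. ActiveMQ's shared file system master/slave mode elects the master through an exclusive lock on a file in the shared data directory. The first broker to obtain the lock becomes master, and the other waits until that lock is released. The sketch below is a simplified, stdlib-only Java model of that election. It is not ActiveMQ code. The lock file name `lock` and the `Node`, `kill` and `newSharedLockFile` names are hypothetical. Both "nodes" run in one JVM here, so a held lock shows up as an `OverlappingFileLockException` instead of a `null` return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified model of shared-storage master election, NOT ActiveMQ code:
// the node holding an exclusive lock on a file in the shared data directory
// is master; the other keeps polling until that lock is released.
public class SharedDirElection {

    static final class Node {
        private final FileChannel channel; // one channel per node, kept open for its lifetime
        private FileLock lock;

        Node(Path lockFile) {
            try {
                channel = FileChannel.open(lockFile,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** One polling attempt; returns true once this node holds the exclusive lock. */
        boolean tryBecomeMaster() {
            if (lock != null) return true;
            try {
                lock = channel.tryLock(); // null if another process holds the lock
            } catch (OverlappingFileLockException e) {
                lock = null; // lock is held by another channel in this same JVM
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return lock != null;
        }

        /** Simulates the node being killed: closing its channel releases its lock. */
        void kill() {
            try {
                channel.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Hypothetical helper: a lock file inside a fresh temporary "shared" directory. */
    static Path newSharedLockFile() {
        try {
            return Files.createTempDirectory("shared-data").resolve("lock");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path lockFile = newSharedLockFile();
        Node node1 = new Node(lockFile);
        Node node2 = new Node(lockFile);
        System.out.println(node1.tryBecomeMaster()); // true: node 1 started first
        System.out.println(node2.tryBecomeMaster()); // false: node 2 waits as slave
        node1.kill();
        System.out.println(node2.tryBecomeMaster()); // true: node 2 takes over
        node2.kill();
    }
}
```

Each node keeps a single channel open and retries on it rather than reopening the file. On some platforms, closing any channel to a file can release every lock the JVM holds on that file, so reopening on every attempt could silently drop the master's lock.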
Node 1: Starts up a broker and creates a transport connector at tcp://<host>:61616. The consumer and producer connect via the broker URL failover:(tcp://<host>:61616,tcp://<host>:61617)
Node 2: Starts up a broker and creates a transport connector at tcp://<host>:61617. The consumer and producer connect via the broker URL failover:(tcp://<host>:61616,tcp://<host>:61617)

The simple use case that is failing for me is the following:
1) Start node 1 first so that it is the master, then start node 2.
2) Send 4 messages on node 1, with a delay so that the node can be killed before the messages finish processing. 2 messages are processed on node 1 and 2 on node 2.
3) Forcefully kill node 1 while the messages are being processed.

The two threads on node 2 that were consuming the messages both hang after calling TransactionContext#end. Each goes into ResponseCorrelator#request and sends a TransactionInfo command. The TransactionInfo command is consumed, and a response command is created and sent correctly. The problem seems to be that this response command is never read in TcpTransport#doRun, so ResponseCorrelator#request blocks when trying to return response.getResult(). As a result, the transactions for the 2 messages being processed on node 2 are never committed.

If I modify my test to send only 2 messages, so that each node processes 1 message, everything runs without any problems. The second node ends the transaction successfully by going through exactly the same code path, except that the response command is consumed. After that, it also correctly processes the message that node 1 had been consuming. Once I send 4 or more messages, the issue occurs.

Does anyone have any insight into what might be happening? I haven't been able to figure out why the response command doesn't get consumed in the unsuccessful case. There are no exceptions, and the response command appears to be sent successfully.
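To make the symptom concrete: any request/response correlator hangs like this when the reply is written but its reader never consumes it. The sketch below is a simplified, stdlib-only Java model of that pattern. It is not ActiveMQ's ResponseCorrelator or TcpTransport. `CorrelatorModel`, `incoming`, `startReader` and `await` are hypothetical names, and the in-memory queue stands in for the socket. With the reader loop running, the caller gets its response. Without it, an untimed wait would never return, so the sketch uses a bounded `get` to make the difference visible.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of request/response correlation, NOT ActiveMQ's ResponseCorrelator:
// the caller registers a future under a command id, and only the transport's
// reader thread completes it when it reads the matching response.
public class CorrelatorModel {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Stand-in for the socket: ids of responses the peer has already sent back.
    final BlockingQueue<Integer> incoming = new LinkedBlockingQueue<>();

    /** Registers a request; a real transport would also write the command to the wire here. */
    CompletableFuture<String> request() {
        int id = nextId.getAndIncrement();
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(id, f);
        incoming.add(id); // assume the peer answers immediately
        return f;
    }

    /** Starts the reader loop (loosely, the role TcpTransport#doRun plays). */
    void startReader() {
        Thread reader = new Thread(() -> {
            try {
                while (true) {
                    int id = incoming.take();
                    CompletableFuture<String> f = pending.remove(id);
                    if (f != null) f.complete("response-" + id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    /** Bounded wait; an untimed get() would block forever if the response is never read. */
    static String await(CompletableFuture<String> f, long millis) {
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMED OUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        CorrelatorModel healthy = new CorrelatorModel();
        healthy.startReader();
        System.out.println(await(healthy.request(), 1000)); // prints response-1

        CorrelatorModel stuck = new CorrelatorModel(); // response sent, but no reader consumes it
        System.out.println(await(stuck.request(), 200));   // prints TIMED OUT
    }
}
```

In this model, the caller's side of the code path is identical in both cases. Only whether the reader consumes the response differs, which matches the difference between the 2-message and 4-message runs described above.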
-- View this message in context: http://activemq.2283324.n4.nabble.com/Calling-end-on-TransactionContext-hangs-during-failover-when-using-master-slave-tp4720859.html Sent from the ActiveMQ - User mailing list archive at Nabble.com.