I have a clustered J2EE application that starts up a broker on each node
using the failover protocol and a shared data directory.

Node 1: Starts up a broker and creates a transport connector at
tcp://<host>:61616. Consumer and producer connect via a broker URL of
failover:(tcp://<host>:61616,tcp://<host>:61617)

Node 2: Starts up a broker and creates a transport connector at
tcp://<host>:61617. Consumer and producer connect via a broker URL of
failover:(tcp://<host>:61616,tcp://<host>:61617)
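To make the setup concrete, here is a minimal sketch of how the shared broker URL is put together. It uses plain string handling only and does not touch the ActiveMQ API. The FailoverUrl class and its build method are hypothetical names for illustration, and "localhost" in main is just a stand-in for <host>.

```java
import java.util.StringJoiner;

public class FailoverUrl {
    // Builds failover:(tcp://host:port1,tcp://host:port2,...) for the given ports.
    // Hypothetical helper; both nodes would pass the same host and port list.
    public static String build(String host, int... ports) {
        StringJoiner uris = new StringJoiner(",", "failover:(", ")");
        for (int port : ports) {
            uris.add("tcp://" + host + ":" + port);
        }
        return uris.toString();
    }

    public static void main(String[] args) {
        // "localhost" is an assumption standing in for the real host name.
        System.out.println(build("localhost", 61616, 61617));
    }
}
```

Both nodes end up with the identical URL, so either client can reach whichever broker currently holds the lock on the shared data directory.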

The simple use case that is failing for me is the following:
1) Start up node 1 first so it is the master. Start up node 2.
2) Send 4 messages on node 1 with a delay so that the node can be killed
before the messages finish processing. 2 messages are being processed on
node 1 and 2 on node 2.
3) Forcefully kill node 1 while messages are being processed.

The two threads on node 2 that were consuming the messages both hang
after calling TransactionContext#end. They go into
ResponseCorrelator#request and send a TransactionInfo command. The
TransactionInfo command is consumed and creates a response command, which is
sent correctly. The problem seems to be that this response command is never
read in TcpTransport#doRun. Because of this, ResponseCorrelator#request
blocks when trying to return response.getResult().
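
To illustrate where the threads appear to be stuck, here is a simplified model of request/response correlation. It is not ActiveMQ's actual ResponseCorrelator code. CorrelatorSketch, send, onResponse and await are made-up names, and the bounded wait is an assumption added only so the sketch terminates instead of hanging.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CorrelatorSketch {
    // Pending requests keyed by command id (simplified stand-in, not the real class).
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Sending side: register the request and hand back something to wait on.
    public CompletableFuture<String> send(int commandId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(commandId, future);
        return future;
    }

    // Reader side: called when the transport thread reads a response off the wire.
    public void onResponse(int commandId, String result) {
        CompletableFuture<String> future = pending.remove(commandId);
        if (future != null) {
            future.complete(result);
        }
    }

    // Bounded wait: returns null if no response was read within the timeout.
    public static String await(CompletableFuture<String> future, long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CorrelatorSketch correlator = new CorrelatorSketch();

        // Successful case: the reader delivers the response, so the wait returns.
        CompletableFuture<String> answered = correlator.send(1);
        correlator.onResponse(1, "done");
        System.out.println(await(answered, 100));

        // Failing case: the response is never read, so an unbounded wait would block forever.
        CompletableFuture<String> unanswered = correlator.send(2);
        System.out.println(await(unanswered, 100));
    }
}
```

In this model the second request matches what I see: the response exists on the sending side, but since nothing ever calls the reader-side callback, the waiting thread never wakes up.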

The transactions for the 2 messages being processed on node 2 block, so they
are never committed.

If I modify my test to only send 2 messages so that each node is processing
1 message, everything runs without any problems. The second node is able to
end the transaction successfully by going through the exact same code path
except that the response command is consumed. After that, it also correctly
processes the message that was being consumed by node 1. As soon as I send 4
or more messages, the issue occurs.

Does anyone have any insight as to what might be happening? I haven't been
able to figure out why the response command doesn't get consumed in the
unsuccessful case. There are no exceptions either, and the response command
seems to be sent successfully.



--
View this message in context: 
http://activemq.2283324.n4.nabble.com/Calling-end-on-TransactionContext-hangs-during-failover-when-using-master-slave-tp4720859.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.