It looks like the broker may be suffering from exactly the same problem we encountered when implementing client-side failover. When the master broker went down, a subsequent read on the socket by the client would hang (or, more precisely, take a very long time to fail or time out). In that case our TCP connection stayed ESTABLISHED. On the broker I see the same thing after the client host goes away: the connection is still ESTABLISHED. We fixed this in our client by setting the socket option SO_RCVTIMEO on the connection to the broker.

I noticed that the broker appears to offer the same thing through the TCP transport option soTimeout. When this option is set, it seems to end up as a call to setSoTimeout when the socket is initialized. I have not done any socket programming in Java, but my assumption is that SO_TIMEOUT maps to both SO_RCVTIMEO and SO_SNDTIMEO in the C world.
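Below is a minimal Java sketch of the read-timeout behaviour discussed above. `ReadTimeoutDemo` is a hypothetical name, and the local server socket is only a stand-in for a peer that has gone silent. Per the `java.net.Socket` documentation, `setSoTimeout` bounds blocking reads only, so its closest C analogue is SO_RCVTIMEO. The standard API has no counterpart to SO_SNDTIMEO, which means a blocked write is not covered by this option.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /**
     * Opens a loopback connection whose server side never writes, then performs a
     * blocking read bounded by timeoutMillis. Returns true if the read timed out.
     */
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) {
            // Java's counterpart to SO_RCVTIMEO: limits how long read() may block.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // silentPeer never sends or closes, so this blocks until the timeout
                return false;
            } catch (SocketTimeoutException e) {
                // The socket is still valid after a timeout; the caller decides what to do.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Without the `setSoTimeout` call, the same `read()` would block indefinitely, which matches the hang described for the client.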

I was hopeful about this option, but when I set it in my transport connector:

<transportConnector name="stomp" uri="stomp://mmq1:61613?soTimeout=60000"/>

the timeout does not occur. I ran my test case about 15 hours ago, and the broker still has an ESTABLISHED connection to the dead client with a message dispatched to it.

Am I misunderstanding what soTimeout is for? I can see that org.apache.activemq.transport.tcp.TcpTransport.initialiseSocket calls setSoTimeout unconditionally. So I'm wondering whether it is actually being called with a value of 0 despite the way I set up my transport connector. A value of 0 would explain why the connection apparently never times out, whereas in our client it did eventually time out (because before the fix we were not setting the option at all).
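The suspicion above can be illustrated with a small hypothetical class. `SoTimeoutSketch` and its methods are invented for illustration and are not ActiveMQ's code. If a transport calls `setSoTimeout` unconditionally with a field that defaults to 0, and the URI option never gets bound to that field, the socket ends up with a timeout of 0. `java.net.Socket` interprets 0 as an infinite timeout.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical stand-in for a TCP transport; this is NOT ActiveMQ's TcpTransport source.
public class SoTimeoutSketch {
    // An int field defaults to 0, which java.net.Socket treats as "no timeout" (block forever).
    private int soTimeout;

    // Would be called only if the "soTimeout" URI option actually reaches this object.
    public void setSoTimeout(int soTimeout) {
        this.soTimeout = soTimeout;
    }

    // Applies the field unconditionally, so an option that was never set silently means "infinite".
    public void initialiseSocket(Socket sock) throws SocketException {
        sock.setSoTimeout(soTimeout);
    }

    // Returns the read timeout a fresh socket ends up with after initialisation.
    public static int effectiveTimeout(SoTimeoutSketch transport) throws IOException {
        try (Socket sock = new Socket()) { // unconnected socket; enough to inspect its options
            transport.initialiseSocket(sock);
            return sock.getSoTimeout();
        }
    }

    public static void main(String[] args) throws IOException {
        SoTimeoutSketch unset = new SoTimeoutSketch();
        SoTimeoutSketch configured = new SoTimeoutSketch();
        configured.setSoTimeout(60000);
        System.out.println("option never applied: " + effectiveTimeout(unset));      // 0
        System.out.println("option applied:       " + effectiveTimeout(configured)); // 60000
    }
}
```

If the broker behaves like this sketch, inspecting the live socket's `getSoTimeout()` value would show whether 60000 ever arrives from the connector URI.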

The re-dispatch is triggered by the TCP connection dying, and netstat can help with the diagnosis here. Check the state of the connection on the broker port after the client host is rebooted. If the connection is still active, possibly in a TIME_WAIT state, you may need to configure some additional timeout options on the broker side.
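This also suggests why a killed process and a crashed host behave differently. When a process dies, even from SIGKILL, the operating system closes its sockets and sends a FIN (or a RST). The peer's pending read therefore returns end-of-stream or fails promptly. When the whole host disappears, nothing is sent, and only a timeout can notice. The hypothetical `PeerCloseDemo` below simulates the first case by closing the client socket explicitly.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerCloseDemo {
    /** Returns the result of a server-side read() after the client closes its end. */
    public static int readAfterPeerCloses() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // Stand-in for the OS tearing down a killed process's socket: a FIN is sent.
            client.close(); // closing again in try-with-resources is a no-op
            accepted.setSoTimeout(5000); // safety net so the demo cannot hang
            return accepted.getInputStream().read(); // -1 means end of stream
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("server read returned: " + readAfterPeerCloses());
    }
}
```

A crashed host never performs that `close()`. The server-side read then simply keeps waiting, which matches the ESTABLISHED connection seen on the broker.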

    I am using client acknowledgements with a prefetch size of 1 and
    no message expiration policy. When a consumer subscribes to a
    queue I can see that the message gets dispatched correctly. If the
    process gets killed before retrieving and acknowledging the
    message I see the message getting re-dispatched (correctly). I
    expected this same behaviour if the host running the process gets
    rebooted or crashes. However, after reboot I can see that the
    message is stuck in the dispatched state to the consumer that is
    long gone. Is there a way that I can get messages re-dispatched
    when a host running consumer processes gets rebooted? How does it
    detect the case when a process dies (even with SIGKILL)?

    I did notice that if I increase my prefetch size and enqueue
    another message after the reboot, ActiveMQ will re-dispatch
    the original message. However, with a prefetch size of one the
    message never seems to get re-dispatched.


