Hi all,

I have run into some issues, currently show-stoppers, with the use of
durable subscriptions on ActiveMQ topics in our large-scale integration
project.

I've been writing the subscriber in C#, but the issue also remains for the
Java implementation. The standard approach for establishing a durable
subscriber is to perform all the standard steps for setting up a subscriber,
along with setting a ClientID (to provide the unique ID for the subscriber
application), and calling CreateDurableConsumer on the session.

On the first attempt, the subscription is established, and the messages are
correctly received.  If the subscriber is then shut down in a controlled
manner, and the connection is correctly stopped and closed, then
subsequent restarts of the subscriber will perform as expected. Great, all
works fine.

However, under real failure scenarios (the machine goes pop, or goes offline
for some reason, resulting in a subscriber restart) the client doesn't get
the chance to correctly terminate its connection with the broker -
essentially an "uncontrolled shutdown".  This is where the problem arises.
If the
subscriber now attempts to establish a durable subscription with the same
ClientID and name as before, the broker returns a 'Client XXX already
connected' error, and prevents the connection from being made - even though
the previous client/subscriber is not actually connected, or even running. 
This doesn't seem to be time-bound either: even waiting several minutes and
retrying produces the same result, so it's not the socket sitting in
TIME_WAIT state which is causing it.

After further investigation, I've discovered the following:

Using JConsole to look into the state of the broker, it seems that,
following an uncontrolled client disconnect (as described above), the
previously created Connection instance is still classed by the broker as
being both live and connected, although it clearly is neither, because the
client is no longer there.  The broker holds on to this state and never
seems to clear it (until a broker restart, which is unacceptable in an
enterprise-scale environment just to recover from a single subscriber
failure).

The broker should detect the socket disconnect from the failed subscriber,
and clean up the connection status in the broker.
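
To make the behaviour I'm expecting concrete, here is a minimal sketch in
plain Java.  It is NOT ActiveMQ code and says nothing about how the broker
is really implemented: the ClientRegistry class, its method names, and the
timeout value are all hypothetical, made up purely for illustration.  It
models a table of connected client IDs that rejects duplicates (as the
broker does today) plus a periodic sweep that drops clients which have been
silent for too long.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a broker-side client-ID table; not ActiveMQ code.
class ClientRegistry {
    // Client ID -> timestamp (ms) of the last activity seen from that client.
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    ClientRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Registers a client ID, rejecting a duplicate while the old entry exists.
    void connect(String clientId, long nowMillis) {
        Long previous = lastSeenMillis.putIfAbsent(clientId, nowMillis);
        if (previous != null) {
            throw new IllegalStateException(
                "Client " + clientId + " already connected");
        }
    }

    // Records activity (for example a keep-alive) from a connected client.
    void heartbeat(String clientId, long nowMillis) {
        lastSeenMillis.computeIfPresent(clientId, (id, old) -> nowMillis);
    }

    // Orderly shutdown: the client removes itself.
    void disconnect(String clientId) {
        lastSeenMillis.remove(clientId);
    }

    // Drops every client silent for longer than the timeout.
    // Returns how many entries were removed.
    int sweep(long nowMillis) {
        int before = lastSeenMillis.size();
        lastSeenMillis.values()
            .removeIf(seen -> nowMillis - seen > timeoutMillis);
        return before - lastSeenMillis.size();
    }

    boolean isConnected(String clientId) {
        return lastSeenMillis.containsKey(clientId);
    }
}
```

In this model, a subscriber that dies without calling disconnect() leaves
its entry behind, and a restart with the same ID is rejected.  Once sweep()
has run after the timeout, the same ID can connect again, which is the
outcome I would like to see without having to restart the broker.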

Another observation is that if the connection is manually cleared using
JConsole (using the relevant operation on the connection instance), the
subscriber can indeed reconnect using the durable subscription.

Another observation is that this only happens if NO messages are published
to the topic during the subscriber downtime.  If however a message is
published to the topic during the subscriber downtime, the broker will
detect that the subscriber is no longer live, and clear up the connection. 
This results in the subscriber being able to reconnect successfully. 
However, in production environments, we cannot guarantee that a message will
be published to a topic during the subscriber downtime.  Most topics will
have high utilisation, but some have low throughput, so this cannot be
relied upon, and the failure of a single durable subscription will result in
the failure of the complete subscribing application.

It seems that all the ActiveMQ unit tests (or the ones I've looked at) that
test the durability of the connection perform an orderly shutdown of the
connection during the test.  This results in the broker correctly cleaning
up the connection status, and the remaining tests being successful.

Under other JMS implementations (namely Tibco EMS, but I've done similar
in the past with JBossMQ), this doesn't happen.  Many JMS resources specify
that if a durable subscription is attempted and one is already established,
then the existing subscription is overwritten, and the new one is
established.  This doesn't seem to be the case with ActiveMQ - instead it
throws an exception.

My main questions to the ActiveMQ forum are:
1) Is there a workaround for this to allow subsequent durable subscriptions
to work following an "uncontrolled" subscriber shutdown?
2) Does ActiveMQ have a configuration parameter to allow subsequent durable
subscriptions to overwrite existing ones (even if the existing ones are
actually dead connections)?
3) Is there anything within ActiveMQ which can periodically test the
connections in the broker to see if they are still live and, if not, clean
them up to overcome this problem?
4) Has anybody else experienced this issue in a production-quality
environment or otherwise?  I've seen many posts to do with 'Client XXX
already connected' but nothing which resolves the issue other than 'fixed in
the 4.1…. Release'.  We are using 4.1.1 so we should see the fix - this
sounds like another issue which has slipped through the net.

Any feedback on this would be much appreciated.

Kind regards

Simon Vicary
Integration and Technical Delivery Lead.
-- 
View this message in context: 
http://www.nabble.com/ActiveMQ-and-Durable-Topic-subscriptions-after-subscriber-is-uncleanly-terminated-tf4102045s2354.html#a11665143
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
