Hi ActiveMQ Team,

 

In the Eclipse open source project SMILA we use ActiveMQ (version 5.3.2)
to implement a producer/consumer pattern with JMS. The basic setup is as
follows:

-          the software runs in a cluster of machines (usually between 4
and 16)

-          we use the Pure Master/Slave configuration for Queue failover

-          a producer creates a large data chunk in a data repository
and sends a JMS message containing the ID of the created chunk

-          a consumer receives a JMS message and processes the data
chunk with the given ID. Some consumers also function as producers, as
they create a new data chunk and send another JMS message (see the
consumer sketch after this list)

-          all machines in the cluster work as producers and consumers
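
To make the flow concrete, here is a stripped-down sketch of how a
consumer receives and commits in a transacted session (the class name,
queue name and processing call are placeholders, not our actual code):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ChunkConsumer {

    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(
            "failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false");
        Connection connection = factory.createConnection();
        connection.start();

        // transacted session: the message is only acknowledged when commit() is called
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageConsumer consumer =
            session.createConsumer(session.createQueue("smila.chunks"));

        Message message = consumer.receive();
        String chunkId = ((TextMessage) message).getText();
        try {
            processChunk(chunkId);   // long-running work on the data chunk
            session.commit();        // only now is the message removed from the queue
        } catch (Exception e) {
            session.rollback();      // the message becomes eligible for redelivery
        }
    }

    private static void processChunk(String chunkId) {
        // application logic, omitted here
    }
}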

 

 

In general this works fine, but we have problems on a machine failure.
For simplicity, assume that one machine (other than the Master or Slave)
has a hardware failure and crashes, and that this machine was processing
a received JMS message at the time. The session from which the message
was received had not been committed yet, as the session is only
committed if the processing of the data was successful; otherwise it is
rolled back.

When the machine crashes, the session is neither committed nor rolled
back. How can we ensure that any messages that were delivered but
neither committed nor rolled back are redelivered or put into the DLQ?

 

 

Our first assumption was that if the connection of a session drops, all
uncommitted messages of that session are automatically redelivered.
Unfortunately this was not the case. Does this only work in certain
scenarios with specific settings?
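
In case it matters, this is roughly how we would configure client-side
redelivery (the values are only illustrative, and we are not sure
whether this policy applies at all when the consumer process dies
instead of rolling back):

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.RedeliveryPolicy;

ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
    "failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false");

RedeliveryPolicy policy = factory.getRedeliveryPolicy();
policy.setMaximumRedeliveries(5);        // after this the message should end up in the DLQ
policy.setInitialRedeliveryDelay(1000);  // in milliseconds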

 

 

The second idea was to set a TTL for each message, so that when the TTL
is reached the message goes into the DLQ and can be consumed there (e.g.
by another consumer that creates a copy of the message in the actual
queue). This would automatically cover the machine crash described
above, as never sending a commit or rollback eventually leads to the
message's TTL being reached (a sketch of how we set the TTL follows
after the list below). However, during tests we saw strange behavior for
messages that were being processed by the crashing machine:

-          some messages were handled correctly (they were moved to the
DLQ)

-          other messages simply disappeared; in the JMX console these
messages were shown as dequeued, which should only be the case if the
session was committed. There were no exceptions in the log files.
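
For completeness, this is roughly how we set the TTL on the producer
side (the queue name and the 30 minute value are just examples):

import javax.jms.*;

// called from the producer, using the same transacted session as above
void sendChunkMessage(Session session, String chunkId) throws JMSException {
    MessageProducer producer = session.createProducer(session.createQueue("smila.chunks"));
    producer.setTimeToLive(30 * 60 * 1000L);   // message expires after 30 minutes
    producer.send(session.createTextMessage(chunkId));
    session.commit();
}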

 

Is there anything that has to be addressed, either in the configuration
or in our code, for this to work correctly?

 

Besides this, TTL has a drawback, as it is set when the message is
created. The processing of our data takes quite a while, and we also
have to ensure processing within a certain time frame. Producers are
generally faster than consumers, so the number of enqueued messages
increases. So by setting a TTL we cannot ensure that a message is
consumed within a certain time frame, only that it is available for the
set time. Are there any mechanisms that would allow us to set a
"processing timeout" or "commit timeout" by which a message must be
committed or it is sent to the DLQ?
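
One workaround that comes to mind is an application-level timeout around
the processing, roughly like the sketch below. It is not what we run
today, and it obviously only helps while the consumer process is still
alive, not when the machine crashes:

import java.util.concurrent.*;
import javax.jms.JMSException;
import javax.jms.Session;

// processes the chunk with a hard time limit; rolls back the session if the limit is exceeded
void processWithTimeout(Session session, final String chunkId) throws JMSException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> work = executor.submit(new Runnable() {
        public void run() {
            processChunk(chunkId);
        }
    });
    try {
        work.get(30, TimeUnit.MINUTES);   // our "processing timeout"
        session.commit();
    } catch (Exception e) {               // timeout, interruption or processing failure
        work.cancel(true);
        session.rollback();               // message becomes eligible for redelivery / DLQ
    } finally {
        executor.shutdown();
    }
}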

 

BTW, what about the parameter maxInactivityDuration? Does it have any
effect on open sessions/transactions? We also set this, but it did not
seem to have any effect.
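
For reference, our understanding is that it has to be set as a
wireFormat option on the transport URLs inside the failover URL, roughly
like this (30000 ms is just an example value):

failover://(tcp://masterhost:61616?wireFormat.maxInactivityDuration=30000,tcp://slavehost:61616?wireFormat.maxInactivityDuration=30000)?randomize=false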

 

 

Some information on our environment:

-          ActiveMQ 5.3.2

-          JDK 1.6.0_20

-          Equinox OSGi container (eclipse 3.5)

-          Linux openSUSE 11.1

-          Connection URL:
failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false

 

 

It would be great if you could share your thoughts on this issue.

 

Bye,

Daniel

 

 
