Re: Failover Question

Gary Tully Fri, 28 May 2010 08:25:34 -0700

I just had a cursory look at the code and I think the receiveNoWait() call
may be part of the problem.


receiveNoWait() does not work well with ActiveMQ just after a consumer has
been created. It can take some time for the consumer to register and for
dispatch to occur, and this happens asynchronously to the receiveNoWait() call.
Either use a receive with a small timeout, e.g. receive(1000), or loop while
receivedMessage == null for a few iterations.
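
A minimal sketch of the retry-loop option, assuming nothing about the
original code: the class PollRetry and the method pollWithRetry are
hypothetical names, and the poll is passed in as a plain
java.util.concurrent.Callable so the sketch has no JMS dependency.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of ActiveMQ or JMS.
final class PollRetry {

    private PollRetry() {
    }

    // Calls poll up to maxAttempts times and returns the first non-null
    // result. Sleeps pauseMillis between attempts to give the broker time
    // to register the consumer and dispatch. Returns null if every
    // attempt came back empty.
    static <T> T pollWithRetry(Callable<T> poll, int maxAttempts, long pauseMillis)
            throws Exception {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            T result = poll.call();
            if (result != null) {
                return result;
            }
            if (attempt < maxAttempts - 1) {
                Thread.sleep(pauseMillis);
            }
        }
        return null;
    }
}
```

With a JMS MessageConsumer named consumer (Java 8 or later for the method
reference), this could be called as
PollRetry.pollWithRetry(consumer::receiveNoWait, 10, 100). The simpler
alternative is a single consumer.receive(1000).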

On 27 May 2010 17:15, <daniel.stu...@attensity.com> wrote:

> Hi ActiveMQ Team,
>
>
>
> in the eclipse open source project SMILA we use ActiveMQ (version 5.3.2)
> to implement a producer/consumer pattern with JMS. The basic setup is as
> follows:
>
> -          the software runs in a cluster of machines (usually between 4
> and 16)
>
> -          we use the Pure Master/Slave configuration for Queue failover
>
> -          a producer creates a large data chunk in a data repository
> and creates a JMS message containing the Id of the created chunk of data
>
> -          a consumer receives a JMS message and processes the data
> chunk with the given Id. Some consumers also function as producers, as
> they create a new data chunk and another JMS message
>
> -          all machines in the cluster work as producers and consumers
>
>
>
>
>
> In general this works fine, but we have problems on a machine failure.
> For simplicity assume that one machine (except for the Master or Slave)
> has a hardware failure and crashes. Also assume that this machine was
> currently processing a received JMS message. The Session from which the
> message was received was not committed yet, as the session is only
> committed if the processing of the data was successful. Otherwise it is
> rolled back.
>
> Now, as the machine crashes, the session is neither committed nor rolled
> back. How can we ensure that any messages that were delivered but not
> committed or rolled back are redelivered or put into the DLQ?
>
>
>
>
>
> Our first assumption was that if the connection of a session drops, all
> uncommitted messages of that session are automatically redelivered.
> Unfortunately this was not the case. Does this only work in certain
> scenarios with specific settings?
>
>
>
>
>
> The second  idea was to set TTL for each message, so that when TTL is
> reached the message goes into the DLQ and can be consumed there (e.g. by
> another consumer that creates a copy of the message in the actual
> queue). This would automatically cover the machine crash described
> above, as sending no commit or rollback eventually leads to reaching the
> set TTL of the message. However during tests we had strange behavior for
> messages that were processed by the crashing machine:
>
> -          some messages were handled  correctly (they were moved to the
> DLQ)
>
> -          other messages simply disappeared; in the JMX console these
> messages were shown as dequeued, which should only be the case if the
> session was committed. There were no exceptions in the log files.
>
>
>
> Is there anything that has to be addressed, either in the configuration
> or our code for this to work correctly?
>
>
>
> Besides this, TTL has a drawback, as it is set when the message is
> created. The processing of our data takes quite a while and we also have
> to ensure that processing completes within a certain time frame. Producers
> are generally faster than consumers, so the number of enqueued messages
> increases. So by setting TTL we cannot ensure that a message is consumed
> within a certain time frame, only that it is available for the set time.
> Are there any mechanisms that would allow us to set a "processing
> timeout" or "commit timeout" by which a message must be committed or else
> be sent to the DLQ?
>
>
>
> BTW, what about the parameter maxInactivityDuration? Does it have any
> effect on open sessions/transactions? We also set this, but it did not
> seem to have any effect.
>
>
>
>
>
> Some information on our environment:
>
> -          ActiveMQ 5.3.2
>
> -          JDK 1.6.0_20
>
> -          Equinox OSGi container (eclipse 3.5)
>
> -          Linux Open Suse  11.1
>
> -          Connection-URL:
> failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
>
>
>
>
>
> It would be great if you could share your thoughts on this issue.
>
>
>
> Bye,
>
> Daniel
>
>
>
>
>
>


-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com
