Hi,

I changed the implementation from receiveNoWait() to receive(10000), but the behavior did not change.
After the client crashes I can still see the delivered message in the queue (using browse), but no receive() call gets this message again; it seems to be stuck in the queue.

I set breakpoints (over 20) on all onException() methods of the TransportListener implementations in ActiveMQ. No breakpoint is triggered when the client crashes. However, if I set up a TransportListener in my JUnit test (in method testSendReceiveOnCrash()), its onException() is triggered with the following exception:

java.io.EOFException
Client transport error:
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
    at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
    at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
    at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
    at java.lang.Thread.run(Thread.java:619)

Shouldn't this exception end up somewhere in the ActiveMQ server code?

Bye,
Daniel

-----Original Message-----
From: Gary Tully [mailto:gary.tu...@gmail.com]
Sent: Friday, 28 May 2010 17:25
To: users@activemq.apache.org
Subject: Re: Failover Question

I just had a cursory look at the code and I think the receiveNoWait() call may be part of the problem. receiveNoWait() does not work well with ActiveMQ just after a consumer has been created. It can take some time for the consumer to register and for dispatch to occur, and this happens asynchronously to the receiveNoWait() call. Either use a small timeout, e.g. receive(1000), or loop while receivedMessage == null for a few iterations.

On 27 May 2010 17:15, <daniel.stu...@attensity.com> wrote:

> Hi ActiveMQ Team,
>
> In the Eclipse open source project SMILA we use ActiveMQ (version 5.3.2)
> to implement a producer/consumer pattern with JMS.
> The basic setup is as follows:
>
> - the software runs in a cluster of machines (usually between 4 and 16)
> - we use the Pure Master/Slave configuration for queue failover
> - a producer creates a large data chunk in a data repository and
>   creates a JMS message containing the Id of the created chunk of data
> - a consumer receives a JMS message and processes the data chunk with
>   the given Id. Some consumers also function as producers, as they
>   create a new data chunk and another JMS message
> - all machines in the cluster work as producers and consumers
>
> In general this works fine, but we have problems on a machine failure.
> For simplicity, assume that one machine (other than the Master or Slave)
> has a hardware failure and crashes. Also assume that this machine was
> processing a received JMS message at that moment. The session from which
> the message was received had not been committed yet, as the session is
> only committed if the processing of the data was successful. Otherwise it
> is rolled back.
>
> Now, as the machine crashes, the session is neither committed nor rolled
> back. How can we ensure that any messages that were delivered but not
> committed or rolled back are redelivered or put into the DLQ?
>
> Our first assumption was that if the connection of a session drops, all
> uncommitted messages of that session are automatically redelivered.
> Unfortunately this was not the case. Does this only work in certain
> scenarios with specific settings?
>
> The second idea was to set a TTL for each message, so that when the TTL
> is reached the message goes into the DLQ and can be consumed there (e.g.
> by another consumer that creates a copy of the message in the actual
> queue). This would automatically cover the machine crash described
> above, as sending no commit or rollback eventually leads to reaching the
> set TTL of the message.
> However, during tests we saw strange behavior for messages that were
> processed by the crashing machine:
>
> - some messages were handled correctly (they were moved to the DLQ)
> - other messages simply disappeared; in the JMX console these messages
>   were shown as dequeued, which should only be the case if the session
>   was committed. There were no exceptions in the log files.
>
> Is there anything that has to be addressed, either in the configuration
> or in our code, for this to work correctly?
>
> Besides this, TTL has a drawback, as it is set when the message is
> created. The processing of our data takes quite a while, and we also
> have to ensure that processing happens within a certain time frame.
> Producers are generally faster than consumers, so the number of enqueued
> messages increases. So by setting a TTL we cannot ensure that a message
> is consumed within a certain time frame, only that it is available for
> the set time. Are there any mechanisms that would allow us to set a
> "processing timeout" or "commit timeout" by which a message must be
> committed or else is sent to the DLQ?
>
> BTW, what about the parameter maxInactivityDuration? Does it have any
> effect on open sessions/transactions? We also set this, but it did not
> seem to have any effect.
>
> Some information on our environment:
>
> - ActiveMQ 5.3.2
> - JDK 1.6.0_20
> - Equinox OSGi container (eclipse 3.5)
> - Linux Open Suse 11.1
> - Connection-URL:
>   failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
>
> It would be great if you could share your thoughts on this issue.
>
> Bye,
> Daniel

--
http://blog.garytully.com

Open Source Integration
http://fusesource.com
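
A minimal sketch of the retry advice quoted above (use a small receive timeout and loop a few times while the result is null). It is plain Java with no JMS dependency: the receiver parameter is a hypothetical stand-in for a call like consumer.receive(timeoutMillis), which in JMS returns null when the timeout expires. The names ReceiveRetry and receiveWithRetry are illustrative only and are not part of ActiveMQ or the test code discussed in this thread.

```java
import java.util.function.LongFunction;

/**
 * Sketch of "retry a timed receive a few times".
 * The receiver argument is an assumption: it stands in for a call such as
 * javax.jms.MessageConsumer#receive(long), which returns null on timeout.
 */
final class ReceiveRetry {

    private ReceiveRetry() {}

    /**
     * Calls receiver.apply(timeoutMillis) up to maxAttempts times and returns
     * the first non-null result, or null if every attempt came back empty.
     */
    static <T> T receiveWithRetry(LongFunction<T> receiver, long timeoutMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            T message = receiver.apply(timeoutMillis);
            if (message != null) {
                return message;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulated consumer: nothing has been dispatched on the first two polls.
        int[] polls = {0};
        String result = receiveWithRetry(timeout -> ++polls[0] < 3 ? null : "chunk-42", 1000L, 5);
        System.out.println(result + " after " + polls[0] + " polls");
    }
}
```

With a real MessageConsumer, the lambda would wrap consumer.receive(timeout) and handle the checked JMSException that call can throw.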