Re: Failover Question

Gary Tully Mon, 31 May 2010 02:55:03 -0700

Odd. Yes, you would expect the breakpoint to be hit in
org.apache.activemq.broker.TransportConnection.TransportConnection(...).new
DefaultTransportListener() {...}.onException(IOException)


Does the connection still appear active in netstat, or does the consumer
still appear in the console?

On 31 May 2010 10:14, <daniel.stu...@attensity.com> wrote:

> Hi,
>
> I changed the implementation from receiveNoWait() to receive(10000) but it
> did not change anything in the behavior.
>
> After the client crashes I can still see the delivered message in the Queue
> (using browse) but no receive() call can get this message again, it seems to
> be stuck in the queue.
>
>
> I set breakpoints (over 20) to all onException() methods of implementations
> of TransportListener in the ActiveMQ. No breakpoint is triggered when the
> client crashes. However, if I set up a TransportListener in my JUnit test
> (in method testSendReceiveOnCrash()) then there onException() is triggered
> with the following exception:
> java.io.EOFException Client transport error:
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
>         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
>         at java.lang.Thread.run(Thread.java:619)
>
>
> Shouldn't this exception end up somewhere in the ActiveMQ server code ?
>
>
> Bye,
> Daniel
>
> -----Original Message-----
> From: Gary Tully [mailto:gary.tu...@gmail.com]
> Sent: Friday, 28 May 2010 17:25
> To: users@activemq.apache.org
> Subject: Re: Failover Question
>
> I just had a cursory look at the code and I think the receiveNoWait() call
> may be part of the problem.
>
> receiveNoWait does not work well with activemq just after a consumer has
> been created. It can take some time for the consumer to register and
> dispatch to occur, and it occurs asynchronously to the receiveNoWait call.
> Either use a small timeout receive(1000) or loop while receivedMessage ==
> null for a few iterations.
>
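The advice above (a short receive timeout, retried a few times while the
result is null) could be sketched roughly as follows. This is only an
illustration, not code from the thread: ReceiveRetry and receiveWithRetry are
hypothetical names, and the Callable stands in for a call such as
consumer.receive(1000) so that the sketch compiles with the JDK alone.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a short blocking receive a few times. */
final class ReceiveRetry {
    private ReceiveRetry() {}

    /**
     * Calls receiveOnce up to maxAttempts times and returns the first
     * non-null result, or null if every attempt came back empty.
     * receiveOnce is assumed to block for a short timeout by itself,
     * for example () -> consumer.receive(1000).
     */
    static <T> T receiveWithRetry(Callable<T> receiveOnce, int maxAttempts)
            throws Exception {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            T message = receiveOnce.call();
            if (message != null) {
                return message;
            }
        }
        return null; // nothing was dispatched within the allowed attempts
    }
}
```

With a JMS MessageConsumer the call might look like
ReceiveRetry.receiveWithRetry(() -> consumer.receive(1000), 5), which gives
the broker a few seconds to register the consumer and dispatch to it.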
> On 27 May 2010 17:15, <daniel.stu...@attensity.com> wrote:
>
> > Hi ActiveMQ Team,
> >
> >
> >
> > in the eclipse open source project SMILA we use ActiveMQ (version 5.3.2)
> > to implement a producer/consumer pattern with JMS. The basic setup is as
> > follows:
> >
> > -          the software runs in a cluster of machines (usually between 4
> > and 16)
> >
> > -          we use the Pure Master/Slave configuration for Queue failover
> >
> > -          a producer creates a large data chunk in a data repository
> > and creates a JMS message containing the Id of the created chunk of data
> >
> > -          a consumer receives a JMS message and processes the data
> > chunk with the given Id. Some consumers also function as producers as
> > they create a new data chunk and another JMS message
> >
> > -          all machines in the cluster work as producers and consumers
> >
> >
> >
> >
> >
> > In general this works fine, but we have problems on a machine failure.
> > For simplicity assume that one machine (except for the Master or Slave)
> > has a hardware failure and crashes. Also assume that this machine was
> > currently processing a received JMS message. The Session from which the
> > message was received was not committed yet, as the session is only
> > committed if the processing of the data was successful. Otherwise it is
> > rolled back.
> >
> > Now as the machine crashes the session is neither committed nor rolled
> > back. How can we assure that any messages that were delivered but not
> > committed or rolled back are redelivered or put into the DLQ?
> >
> >
> >
> >
> >
> > Our first assumption was that if the connection of a session drops all
> > not committed messages of that session are automatically redelivered.
> > Unfortunately this was not the case. Does this only work in certain
> > scenarios with specific settings ?
> >
> >
> >
> >
> >
> > The second  idea was to set TTL for each message, so that when TTL is
> > reached the message goes into the DLQ and can be consumed there (e.g. by
> > another consumer that creates a copy of the message in the actual
> > queue). This would automatically cover the machine crash described
> > above, as sending no commit or rollback eventually leads to reaching the
> > set TTL of the message. However during tests we had strange behavior for
> > messages that were processed by the crashing machine:
> >
> > -          some messages were handled  correctly (they were moved to the
> > DLQ)
> >
> > -          other messages simply disappeared, in JMX console these
> > messages were shown as dequeued which should only be the case if the
> > session was committed. There were no exceptions in the log files.
> >
> >
> >
> > Is there anything that has to be addressed, either in the configuration
> > or our code for this to work correctly?
> >
> >
> >
> > Besides this TTL has a drawback, as it is set when the message is
> > created. The processing of our data takes quite a while and we also have
> > to assure the processing in a certain time frame. Producers are
> > generally faster than Consumers, so the number of enqueued messages
> > increases. So by setting TTL we cannot assure that a message is consumed
> > in a certain time frame but only that it is available for the set time.
> > Are there any mechanisms that would allow us to set a "processing
> > timeout" or "commit timeout" by which a message must be committed, or
> > else it is sent to the DLQ?
> >
> >
> >
> > BTW, what about the parameter maxInactivityDuration ? Does it have any
> > effect on opened sessions/transactions ? We also set this but it did not
> > seem to have any effect.
> >
> >
> >
> >
> >
> > Some information on our environment:
> >
> > -          ActiveMQ 5.3.2
> >
> > -          JDK 1.6.0_20
> >
> > -          Equinox OSGi container (eclipse 3.5)
> >
> > -          Linux Open Suse  11.1
> >
> > -          Connection-URL:
> > failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
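For reference, maxInactivityDuration is usually given as a wire format option
on each nested tcp URI (wireFormat.maxInactivityDuration, in milliseconds). A
minimal sketch of building such a URL is below. FailoverUrl and build are
hypothetical names, the 30000 ms value is only an example, and whether the
inactivity monitor resolves the stuck-message problem described in this
thread is not established here.

```java
/** Hypothetical helper that assembles a failover connection URL. */
final class FailoverUrl {
    private FailoverUrl() {}

    /**
     * Builds a failover URL in which every nested tcp URI carries
     * wireFormat.maxInactivityDuration, keeping randomize=false as in
     * the connection URL quoted above.
     */
    static String build(long maxInactivityMillis, String... hostPorts) {
        StringBuilder url = new StringBuilder("failover://(");
        for (int i = 0; i < hostPorts.length; i++) {
            if (i > 0) {
                url.append(',');
            }
            url.append("tcp://").append(hostPorts[i])
               .append("?wireFormat.maxInactivityDuration=")
               .append(maxInactivityMillis);
        }
        return url.append(")?randomize=false").toString();
    }
}
```

For example, FailoverUrl.build(30000, "masterhost:61616", "slavehost:61616")
yields a URL with the option applied to both the master and the slave.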
> >
> >
> >
> >
> >
> > It would be great if you could share your thoughts on this issue.
> >
> >
> >
> > Bye,
> >
> > Daniel
> >
> >
> >
> >
> >
> >
>
>
> --
> http://blog.garytully.com
>
> Open Source Integration
> http://fusesource.com
>



-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com
