Hi Norbert,

this sounds like a different problem from the one discussed in this thread. Take a look at http://issues.apache.org/activemq/browse/AMQ-2149, which is being worked on, and give 5.3-SNAPSHOT a try.
Cheers
--
Dejan Bosanac

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net


On Mon, Apr 6, 2009 at 9:06 AM, Norbert Pfistner <norbert.pfist...@picturesafe.de> wrote:

> Hello Murty,
>
> We also experience the same problems when using failover: sometimes
> clients stop working after a slave has become the master and has
> processed a bunch of messages as the new master.
> And yes, we use 5.1. We also did some testing with 5.2, unfortunately
> with the same result. So it looks like 5.2 is suffering from the same
> bug.
> We actually do not use failover in our production environment because
> this feature is unreliable.
>
> It would be good if this bug were fixed.
>
> Greetings,
> Norbert
>
>
> Murty Dasari wrote:
>
>> Thanks Dejan for the reply.
>>
>> I've not tried 5.2 yet, but I wanted to get confirmation of the issue
>> before I try pushing the new version to our servers (that is a somewhat
>> lengthy process). I looked at the 5.2 source code and I suspect the
>> problem is still there.
>>
>> I'm surprised to see that others are not running into any issues with
>> it; maybe there is something wrong with my topology and setup. Does the
>> following setup look right?
>>
>> 1. We have a bunch of applications posting messages to a local
>>    (localhost) AMQ. (We have several boxes like this.)
>> 2. We set up a camel route to deliver the messages to a central AMQ
>>    host with a durable subscription. (There is only one box like this.)
>>
>> ----------------------------------------------------------------
>> <camelContext>
>>   <route>
>>     <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
>>     <to uri="CENTRALMQ:topic:Topic1"/>
>>   </route>
>>   ...... Few other routes
>> </camelContext>
>>
>> <bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>   <property name="connectionFactory">
>>     <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>       <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false" />
>>     </bean>
>>   </property>
>> </bean>
>> <bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>   <property name="connectionFactory">
>>     <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>       <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100" />
>>     </bean>
>>   </property>
>> </bean>
>> -----------------------------------------
>>
>> The main difference from other configurations I have seen is that we
>> are using failover with two endpoints that are the same. With this
>> model we were able to achieve retries between LOCALMQ and CENTRALMQ if
>> there were any connection problems. We need retries but not really
>> failover (i.e., sending to a secondary if the primary is down), as
>> messages would still be in LOCALMQ if there were connectivity problems.
>>
>> Is there any other way to achieve retries without using the "failover
>> transport"?
>>
>> Thanks for your time.
>>
>> cheers
>> - mdasari
>>
>> On Fri, Apr 3, 2009 at 12:36 AM, Dejan Bosanac <de...@nighttale.net> wrote:
>>
>>> Hi,
>>>
>>> did you try version 5.2.0? Some of those issues have probably been
>>> addressed already.
>>>
>>> Cheers
>>> --
>>> Dejan Bosanac
>>>
>>> Open Source Integration - http://fusesource.com/
>>> ActiveMQ in Action - http://www.manning.com/snyder/
>>> Blog - http://www.nighttale.net
>>>
>>>
>>> On Thu, Apr 2, 2009 at 6:47 PM, mdasari <mdas...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are using AMQ 5.1.0 on some of our servers. We noticed that (on a
>>>> few servers) after a while the AMQ failover transport stops working,
>>>> so messages are no longer delivered (from a producer AMQ server box
>>>> to a central consumer AMQ server box through camel).
>>>>
>>>> --------------------------------------------------------------
>>>> The following is the data from our log files:
>>>> --------------------------------------------------------------
>>>> INFO | jvm 1 | 2009/03/16 21:25:42 | DEBUG FailoverTransport - Connection established
>>>> INFO | jvm 1 | 2009/03/16 21:25:42 | INFO FailoverTransport - Successfully connected to tcp://10.87.129.196:61616
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2 - Executing callback on JMS Session: ActiveMQSession {id=ID:LOCALMQ-3675-1236961500048-2:218:1,started=false}
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG JmsProducer - Endpoint[centralMQ:topic:Topic1] sending JMS message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2 - Sending created message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG ActiveMQSession - ID:LOCALMQ-3675-1236961500048-2:218:1 sending message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG FailoverTransport - Stopped.
>>>> INFO | jvm 1 | 2009/03/16 21:25:43 | DEBUG TcpTransport - Stopping transport tcp:///10.87.129.196:61616
>>>> INFO | jvm 1 | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter - Checkpoint started.
>>>> INFO | jvm 1 | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter - Checkpoint done.
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG ActiveMQMessageConsumer - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message: MessageDispatch {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG EndpointMessageListener - Endpoint[localMQ:topic:Topic1?clientId=...&subscriptionDurable=true] receiving JMS message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Waking up reconnect task
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Started.
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Waking up reconnect task
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Attempting connect to: tcp://10.87.129.196:61616
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator - Sending: WireFormatInfo { version=3, properties={CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false, MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true, MaxInactivityDuration=30000, TightEncodingEnabled=true, StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator - Received WireFormat: WireFormatInfo { version=3, properties={CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false, MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true, MaxInactivityDuration=30000, TightEncodingEnabled=true, StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator - tcp:///10.87.129.196:61616 before negotiation: OpenWireFormat{version=3, cacheEnabled=false, stackTraceEnabled=false, tightEncodingEnabled=false, sizePrefixDisabled=false}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator - tcp:///10.87.129.196:61616 after negotiation: OpenWireFormat{version=3, cacheEnabled=true, stackTraceEnabled=true, tightEncodingEnabled=true, sizePrefixDisabled=false}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Connection established
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | INFO FailoverTransport - Successfully connected to tcp://10.87.129.196:61616
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2 - Executing callback on JMS Session: ActiveMQSession {id=ID:LOCALMQ-3675-1236961500048-2:219:1,started=false}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG JmsProducer - Endpoint[centralMQ:topic:Topic1] sending JMS message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2 - Sending created message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG ActiveMQSession - ID:LOCALMQ-3675-1236961500048-2:219:1 sending message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG FailoverTransport - Stopped.
>>>> INFO | jvm 1 | 2009/03/16 21:26:13 | DEBUG TcpTransport - Stopping transport tcp:///10.87.129.196:61616
>>>> INFO | jvm 1 | 2009/03/16 21:26:14 | DEBUG ActiveMQMessageConsumer - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message: MessageDispatch {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:14 | DEBUG EndpointMessageListener - Endpoint[localmq:topic:Topic1?clientId=...&subscriptionDurable=true] receiving JMS message: ActiveMQTextMessage {...}
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 | DEBUG FailoverTransport - Waiting 10 ms before attempting connection.
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 | Exception in thread "ActiveMQ Failover Worker: 1889455" java.lang.NullPointerException
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 |     at org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:124)
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 |     at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 |     at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 | DEBUG FailoverTransport - Waking up reconnect task
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 | DEBUG FailoverTransport - Started.
>>>> INFO | jvm 1 | 2009/03/16 21:26:15 | DEBUG FailoverTransport - Waking up reconnect task
>>>> INFO | jvm 1 | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter - Checkpoint started.
>>>> INFO | jvm 1 | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter - Checkpoint done.
>>>> INFO | jvm 1 | 2009/03/16 21:28:00 | DEBUG AMQPersistenceAdapter - Checkpoint started.
>>>> ---------------------------------------------
>>>>
>>>> Basically, it was able to deliver a message (and a few more before
>>>> that), but for another message sent very shortly after the previous
>>>> one it runs into a NullPointerException. After that it stops
>>>> functioning entirely.
>>>>
>>>> I took a brief look at the FailoverTransport.java code. I'm not an
>>>> expert on the AMQ code, but I suspect that the reconnectTask member
>>>> variable in FailoverTransport.java is used by the task-runner thread
>>>> before it has been completely initialized (basically a race condition
>>>> without proper synchronization).
>>>>
>>>> I can provide more details on our network topology if required. I
>>>> searched around but didn't find any related issues or bugs. Does
>>>> anyone know if this is a known issue, and in which version it is
>>>> going to be addressed? If not, I'll open a JIRA.
>>>>
>>>> I appreciate your help.
>>>>
>>>> cheers
>>>> - mdasari
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/FailoverTransport-stops-working-after-a-while-tp22851122p22851122.html
>>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
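
For readers following the thread, the kind of race mdasari suspects can be illustrated in isolation. The sketch below is not the ActiveMQ source and makes no claim about what FailoverTransport.java actually does at line 124. Runner, UnsafeTransport, SafeTransport and InitRaceSketch are hypothetical names; only the field name reconnectTask is borrowed from the message above. It assumes a worker thread whose task dereferences a field that the starting thread assigns only after the worker is already running, and it forces that unlucky ordering with a join so the outcome is repeatable.

```java
import java.util.concurrent.atomic.AtomicReference;

public class InitRaceSketch {

    /** Hypothetical stand-in for a task runner: runs one task on its own thread. */
    static final class Runner {
        private final Thread thread;

        Runner(Runnable task) {
            this.thread = new Thread(task, "sketch-worker");
        }

        void start() {
            thread.start();
        }

        /** Blocks until the worker has finished running the task. */
        void awaitFirstIteration() {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        void wakeup() {
            // A real runner would schedule another iteration; nothing to do here.
        }
    }

    /** Assigns the field only after the worker has already run. */
    static final class UnsafeTransport {
        private Runner reconnectTask;
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        void start() {
            Runner runner = new Runner(() -> {
                try {
                    reconnectTask.wakeup(); // field is still null at this point
                } catch (RuntimeException e) {
                    failure.set(e);
                }
            });
            runner.start();
            runner.awaitFirstIteration(); // forces the unlucky interleaving
            reconnectTask = runner;       // too late: the worker already used it
        }
    }

    /** Publishes the field before the worker thread is started. */
    static final class SafeTransport {
        private Runner reconnectTask;
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        void start() {
            Runner runner = new Runner(() -> {
                try {
                    reconnectTask.wakeup();
                } catch (RuntimeException e) {
                    failure.set(e);
                }
            });
            reconnectTask = runner; // writes made before Thread.start() are visible to the worker
            runner.start();
            runner.awaitFirstIteration();
        }
    }

    static boolean unsafeFails() {
        UnsafeTransport transport = new UnsafeTransport();
        transport.start();
        return transport.failure.get() instanceof NullPointerException;
    }

    static boolean safeSucceeds() {
        SafeTransport transport = new SafeTransport();
        transport.start();
        return transport.failure.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("unsafe variant hit a NullPointerException: " + unsafeFails());
        System.out.println("safe variant ran without failure: " + safeSucceeds());
    }
}
```

Without the forced join, the unsafe variant would fail only when the worker happens to win the race, which would be consistent with a transport that works for a while and then stops. Whether this is what happens in AMQ 5.1.0 is an assumption that would have to be checked against the actual source.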