Bryan,

Thanks for your troubleshooting report. It looks like you tried everything. I logged a JIRA issue (AMQNET-114) to enable the KeepAlive feature in NMS. Even though you were able to solve your issue with a workaround, you shouldn't have to, and the next person may not be able to do what you did.
-Jim

Bryan Murphy-4 wrote:
>
> We basically run a server here in our local office behind a firewall, and
> the rest of our stuff out on Amazon's EC2 cloud. We suspect there were
> issues with NAT timeouts and half-dead TCP connections.
>
> The specific behaviors we saw using NMS manifested themselves in the
> following ways:
>
> 1. Client blocked on TCP connection waiting for messages, server does not
> think client is connected anymore.
>
> 2. Client blocked on TCP connection, server reports *multiple* listeners
> for a queue that should only have one listener (the number changed over
> time, tended to tick upwards and then downwards, probably after the server
> timed out a dead TCP connection; we sometimes saw a listener count upwards
> of 9 or 10 when there should only be 1).
>
> 3. Clients do not appear to always re-establish connection to server once
> connection is dead. Frequently had to restart clients, occasionally had
> to restart server.
>
> 4. Message queues that were idle for long periods at a time exhibited
> problematic behavior. Message queues that were active remained available
> (a huge indicator of what was going on after fixing #5).
>
> 5. Hitting ^C to kill our application and not handling break to properly
> close connections caused behaviors very similar to what we were eventually
> seeing with our TCP connections. This, of course, made the issue that
> much more confusing and difficult to debug, since not all communication
> problems were rooted at the network layer and the results were, at least
> initially, maddeningly inconsistent.
>
> We experimented with more aggressive request timeouts on the transport
> layer/session/connection (even modified the driver to ensure these were
> getting set), setting up static routes, opening up firewall ports and
> playing with the TCP timeouts (at least on our end; we have no control on
> the Amazon side).
> We tried a prefetch size of one and tried to enable keep-alive, but never
> figured out how to do it. The only solution that worked was the ActiveMQ
> to ActiveMQ bridge, and I suspect some of that has to do with the fact
> that we were never able to get keep-alives working and have no control
> over fine-grained NAT settings on the Amazon side.
>
> Bryan
>
>
> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan
> <[EMAIL PROTECTED]> wrote:
>
>> Maybe the WAN is dropping connections; we have failover in Java; am
>> not sure we've added that to NMS yet have we?
>>
>
>

--
View this message in context: http://www.nabble.com/ActiveMQ%2BNMS%2BTCP-Connection-Problems-tp19321592p19398051.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
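
For anyone landing on this thread with the same symptoms, here is a rough sketch of what "enabling keep-alive" means at the OS socket level. It is written in Python purely for illustration and is not the NMS or ActiveMQ API. The helper name `enable_keepalive` and the timing values are hypothetical, and the per-socket tuning options are platform-specific, so the sketch skips any that the platform does not expose. The idea is that the idle time before the first probe should sit below the NAT idle timeout suspected above, so that an otherwise quiet connection keeps its mapping alive.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keep-alive probes for an existing socket.

    The default timings are illustrative only; pick an idle time shorter
    than the idle timeout of whatever NAT or firewall sits in the path.
    """
    # Portable part: ask the OS to send keep-alive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform-specific; apply only the options that
    # this platform's socket module actually defines.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idle before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keep-alive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keep-alive only helps the OS notice a dead peer and keeps idle connections from being silently dropped. The client still has to reconnect when the socket errors out, which is the failover behavior James asks about above.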