Bryan,

Thanks for your troubleshooting report.  Looks like you guys tried
everything.  I logged a JIRA issue (AMQNET-114) to enable the KeepAlive
feature in NMS.  Even though you were able to solve your issue with a
workaround, you shouldn't have to do that, and the next person may not be
able to do what you did.
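
For anyone following along, here is roughly what "keep alive" means at the
plain socket level.  This is only a sketch in Python, not NMS code and not
how AMQNET-114 will necessarily be implemented; make_keepalive_socket and
its parameters are made-up names for illustration, and the fine-grained
TCP_KEEP* options are only applied where the platform exposes them.

```python
import socket


def make_keepalive_socket(idle_secs=60, interval_secs=10, probes=5):
    """Return an unconnected TCP socket with keep-alive probes enabled.

    Hypothetical helper for illustration only; the default values are
    assumptions, not recommendations from the thread.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Ask the OS to send keep-alive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning knobs.  They are not available on every platform,
    # so look each one up and skip it when it is missing.
    tuning = (
        ("TCP_KEEPIDLE", idle_secs),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_secs),  # delay between probes
        ("TCP_KEEPCNT", probes),           # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    return sock
```

With probes going out more often than the NAT idle timeout, an idle
connection should either stay mapped or be detected as dead by the OS
instead of hanging half open.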

-Jim


Bryan Murphy-4 wrote:
> 
> We basically run a server here in our local office behind a firewall, and
> the rest of our stuff out on Amazon's EC2 cloud.  We suspect there were
> issues with NAT timeouts and half-dead TCP connections.
> The specific behaviors we saw using NMS manifested themselves in the
> following ways:
> 
> 1. Client blocked on TCP connection waiting for messages, server does not
> think client is connected anymore.
> 
> 2. Client blocked on TCP connection, server reports *multiple* listeners
> for a queue that should only have one listener (the number changed over
> time: it tended to tick upwards and then downwards, probably after the
> server timed out a dead TCP connection; we sometimes saw a listener count
> of 9 or 10 when there should have been only 1).
> 
> 3. Clients do not appear to always re-establish connection to server once
> connection is dead.  Frequently had to restart clients, occasionally had
> to restart server.
> 
> 4. Message queues that were idle for long periods at a time exhibited
> problematic behavior.  Message queues that were active remained available
> (a huge indicator of what was going on after fixing #5).
> 
> 5. Hitting ^C to kill our application and not handling break to properly
> close connections caused behaviors very similar to what we were eventually
> seeing with our TCP connections.  This, of course, made the issue that
> much more confusing and difficult to debug since not all communication
> problems were rooted at the network layer and the results were at least
> initially maddeningly inconsistent.
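
On item 5, the usual way to keep ^C from leaving a dangling consumer on
the broker is to close the connection in a cleanup path that runs even
when the process is interrupted.  A minimal sketch in Python (not NMS
code): FakeConnection and run_consumer are hypothetical stand-ins, and a
real client would close its session and connection objects in the same
place.

```python
class FakeConnection:
    """Hypothetical stand-in for a broker connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real client would tell the broker it is going away here.
        self.closed = True


def run_consumer(connection, receive_loop):
    """Run receive_loop(connection) and always close the connection after."""
    try:
        receive_loop(connection)
    except KeyboardInterrupt:
        # ^C arrives as KeyboardInterrupt; swallow it so we exit quietly.
        pass
    finally:
        # Runs on normal return, on ^C, and on any other exception.
        connection.close()
```

That way the broker sees a clean disconnect rather than having to time out
a dead socket.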
> 
> We experimented with more aggressive request timeouts on the transport
> layer/session/connection (even modified the driver to ensure these were
> getting set), setting up static routes, opening up firewall ports and
> playing with the TCP timeouts (at least on our end; we have no control on
> the Amazon side).  We tried a prefetch size of one and tried to enable
> keep-alive but never figured out how to do it.  The only solution that
> worked was the ActiveMQ-to-ActiveMQ bridge, and I suspect some of that may
> have to do with the fact that we were never able to get keep-alives
> working and we have no control over fine-grained NAT settings on the
> Amazon side.
> 
> Bryan
> 
> 
> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan
> <[EMAIL PROTECTED]> wrote:
> 
>> Maybe the WAN is dropping connections; we have failover in Java; am
>> not sure we've added that to NMS yet, have we?
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/ActiveMQ%2BNMS%2BTCP-Connection-Problems-tp19321592p19398051.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.