Re: ActiveMQ+NMS+TCP Connection Problems

Rob Davies Tue, 09 Sep 2008 23:10:36 -0700

Awesome Jim!

On 10 Sep 2008, at 06:48, Jim Gomes wrote:

FYI, the NMS trunk now has keep-alive support implemented. You can turn
it on with the URI parameters "wireFormat.MaxInactivityDuration=nnnn" and
"wireFormat.MaxInactivityDurationInitialDelay=nnnn", where 'nnnn' is the
number of milliseconds.  The initial delay option is optional and does
not have to be used at the same time. It should operate just like the
Java client. I observed that the server will send a KeepAliveInfo
command to the client periodically. The client then responds back. This
should keep the socket connection alive even when no messages are
flowing.  I would be willing to bet that this is what the two ActiveMQ
servers are doing to each other, which is why that solution worked for
you.
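
As a rough illustration of the URI parameters described above, here is a
minimal Python sketch that appends them to a broker URI string. It only
builds the string; it does not talk to NMS or ActiveMQ. The helper name
with_keep_alive, the host broker.example.com and the 30000 ms value are
placeholders chosen for the example, not anything taken from the NMS
code.

```python
from urllib.parse import urlencode


def with_keep_alive(broker_uri, max_inactivity_ms, initial_delay_ms=None):
    """Return broker_uri with the keep-alive wire format options appended.

    with_keep_alive is a hypothetical helper for illustration only.
    Both values are in milliseconds; the initial delay is optional.
    """
    params = {"wireFormat.MaxInactivityDuration": max_inactivity_ms}
    if initial_delay_ms is not None:
        params["wireFormat.MaxInactivityDurationInitialDelay"] = initial_delay_ms
    # Extend an existing query string if there is one, else start a new one.
    separator = "&" if "?" in broker_uri else "?"
    return broker_uri + separator + urlencode(params)


# Placeholder host and port; substitute your own broker address.
print(with_keep_alive("tcp://broker.example.com:61616", 30000))
# prints tcp://broker.example.com:61616?wireFormat.MaxInactivityDuration=30000
```

The resulting string would then be passed to whatever connection factory
the client uses.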

Best,
Jim
On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
We basically run a server here in our local office behind a firewall,
and the rest of our stuff out on Amazon's EC2 cloud. We suspect there
were issues with NAT timeouts and half-dead TCP connections.
The specific behaviors we saw using NMS manifested themselves in the
following ways:
1. Client blocked on TCP connection waiting for messages, server does
not think client is connected anymore.
2. Client blocked on TCP connection, server reports *multiple* listeners
for a queue that should only have one listener (the number changed over
time: it tended to tick upwards and then downwards, probably after the
server timed out a dead TCP connection; we sometimes saw a listener
count upwards of 9 or 10 when there should only be 1).
3. Clients do not appear to always re-establish the connection to the
server once the connection is dead. Frequently had to restart clients,
occasionally had to restart the server.
4. Message queues that were idle for long periods at a time exhibited
problematic behavior. Message queues that were active remained available
(a huge indicator of what was going on once we had fixed #5).
5. Hitting ^C to kill our application and not handling break to properly
close connections caused behaviors very similar to what we were
eventually seeing with our TCP connections. This, of course, made the
issue that much more confusing and difficult to debug since not all
communication problems were rooted at the network layer and the results
were at least initially maddeningly inconsistent.
We experimented with more aggressive request timeouts on the transport
layer/session/connection (even modified the driver to ensure these were
getting set), setting up static routes, opening up firewall ports and
playing with the TCP timeouts (at least on our end, we have no control
on the Amazon side). We tried a prefetch size of one and tried to enable
the keep alive but never figured out how to do it. The only solution
that worked was the ActiveMQ to ActiveMQ bridge, and I suspect some of
that may have to do with the fact that we were never able to get keep
alives working and we have no control over fine-grained NAT settings on
the Amazon side.

Bryan


On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <[EMAIL PROTECTED]>
wrote:
Maybe the WAN is dropping connections; we have failover in Java; I'm
not sure we've added that to NMS yet, have we?

2008/9/9 Jim Gomes <[EMAIL PROTECTED]>:
Hi Bryan,
That's interesting. I wonder where the problem is with the ActiveMQ =>
NMS connection. Without knowing your exact network topology, I can't
point to where the problem is.  All I can do is speak to my experience,
and I have been able to keep connections alive for a very long time
without errors, both with high- and low-activity, even going over what
my infrastructure team has told me is a WAN connection.

Best,
Jim
On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
Thanks for the info. I suspected that's what the timeout meant, but you
never really know until you ask.
Anyway, we finally solved our issue.  We set up two instances of
ActiveMQ in the two data centers to forward messages back and forth
between each other. This is working much better for us.  It seems the
ActiveMQ to ActiveMQ communication is a bit more robust than the
ActiveMQ to Apache.NMS communication (at least when running over a WAN).

Bryan
On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <[EMAIL PROTECTED]> wrote:
Hi Bryan,
I can't answer all of your questions, yet. But I can answer some of
them, anyway.

1. As far as the ResponseTimeout property goes, that is used for network
timeouts.  It's not a JMS timeout value like TimeToLive.  The
ResponseTimeout is used by the client to wait for a response from the
broker.  Since a network call is inherently a blocking operation (send
request, wait for response), if we never receive a response from a
dead/hung broker, the client will hang as well.  The ResponseTimeout
lets the client abort waiting for the response from the broker.  This
can be set to whatever performance constraints your application
requires.  In a WAN environment, this might be set to something fairly
high where there is a lot of latency in network round-trips. The socket
connection is not dropped. The client simply stops waiting for the
broker to respond and goes into its error-handling code for a
non-response.
2. I see the marshalling code for the KeepAliveInfo, but like you I
don't see how this is turned on or controlled from the client-side. This
would need more investigation to see if it is enabled via a URI
parameter, or if new code needs to be written to enable its use.
3. Can't answer the server-side socket issue. Don't know that code.
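
The ResponseTimeout behavior in point 1 can be pictured with a small
Python sketch: the caller sends a request, blocks waiting for the reply,
and gives up after a timeout without touching the underlying connection.
This is only an analogy built on a standard-library queue, not the NMS
implementation; request, ResponseTimeoutError and the timeout values are
names and numbers invented for the example.

```python
import queue


class ResponseTimeoutError(Exception):
    """Raised when no response arrives within the allowed time (hypothetical)."""


def request(send, responses, timeout_s):
    """Send a request, then block until a reply arrives or the timeout expires.

    'send' is any callable that transmits the request; 'responses' is a
    queue.Queue that the transport fills with replies. On timeout nothing
    is closed: the caller simply stops waiting and handles the error.
    """
    send()
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        raise ResponseTimeoutError(f"no response within {timeout_s} s") from None


# A responsive broker: the reply is queued as soon as the request is sent.
replies = queue.Queue()
print(request(lambda: replies.put("ack"), replies, timeout_s=0.5))

# A hung broker: nothing ever arrives, so the caller stops waiting.
try:
    request(lambda: None, queue.Queue(), timeout_s=0.1)
except ResponseTimeoutError as exc:
    print("giving up on this request:", exc)
```

In a high-latency WAN setup the timeout would be chosen generously, as
the point above suggests, so that slow round-trips are not mistaken for a
dead broker.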
--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com
