Awesome, Jim!

On 10 Sep 2008, at 06:48, Jim Gomes wrote:

FYI, the NMS trunk now has keep-alive support implemented. You can turn
it on with the URI parameters "wireFormat.MaxInactivityDuration=nnnn" and
"wireFormat.MaxInactivityDurationInitialDelay=nnnn", where nnnn is the
number of milliseconds. The initial delay parameter is optional and does
not have to be set alongside the first. It should operate just like the
Java client. I observed that the server periodically sends a
KeepAliveInfo command to the client, and the client then responds. This
should keep the socket connection alive even when no messages are
flowing. I would be willing to bet that this is what the two ActiveMQ
servers are doing to each other, which is why that solution worked for
you.
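
As a rough illustration of the two URI parameters named above, the sketch
below assembles a connection URI string. The parameter names come from
this message; the "tcp://" scheme, the host name, the port, the
millisecond values, and the helper name build_broker_uri are assumptions
made only for the example, and the sketch is in Python rather than the
.NET code an NMS client would actually use.

```python
from urllib.parse import urlencode


def build_broker_uri(host, port, max_inactivity_ms, initial_delay_ms=None):
    """Assemble a broker URI carrying the keep-alive options (hypothetical helper)."""
    params = {"wireFormat.MaxInactivityDuration": max_inactivity_ms}
    # The initial delay is optional, so it is only added when supplied.
    if initial_delay_ms is not None:
        params["wireFormat.MaxInactivityDurationInitialDelay"] = initial_delay_ms
    return f"tcp://{host}:{port}?{urlencode(params)}"


# Host, port, and durations are placeholder values, not recommendations.
uri = build_broker_uri("broker.example.com", 61616, 30000, 10000)
print(uri)
```

The resulting string would then be handed to whatever connection factory
the client uses.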

Best,
Jim

On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:

We basically run a server here in our local office behind a firewall,
and the rest of our stuff out on Amazon's EC2 cloud. We suspect there
were issues with NAT timeouts and half-dead TCP connections.

The specific behaviors we saw using NMS manifested themselves in the
following ways:

1. Client blocked on TCP connection waiting for messages, server does not
think client is connected anymore.

2. Client blocked on TCP connection, server reports *multiple* listeners
for a queue that should only have one listener (the number changed over
time, tending to tick upwards and then downwards, probably after the
server timed out a dead TCP connection; we sometimes saw a listener count
upwards of 9 or 10 when there should only be 1).

3. Clients do not appear to always re-establish connection to server once connection is dead. Frequently had to restart clients, occasionally had to
restart server.

4. Message queues that were idle for long periods at a time exhibited
problematic behavior. Message queues that were active remained available
(a huge indicator of what was going on after fixing #5).

5. Hitting ^C to kill our application and not handling break to properly close connections caused behaviors very similar to what we were eventually seeing with our TCP connections. This, of course, made the issue that much more confusing and difficult to debug since not all communication problems were rooted at the network layer and the results were at least initially
maddeningly inconsistent.

We experimented with more aggressive request timeouts on the transport
layer/session/connection (even modified the driver to ensure these were
getting set), setting up static routes, opening up firewall ports and
playing with the TCP timeouts (at least on our end; we have no control on
the Amazon side). We tried a prefetch size of one and tried to enable the
keep alive, but never figured out how to do it. The only solution that
worked was the ActiveMQ to ActiveMQ bridge, and I suspect some of that
may have to do with the fact that we were never able to get keep alives
working and we have no control over fine-grained NAT settings on the
Amazon side.

Bryan


On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <[EMAIL PROTECTED]> wrote:

Maybe the WAN is dropping connections. We have failover in Java; I'm
not sure we've added that to NMS yet, have we?

2008/9/9 Jim Gomes <[EMAIL PROTECTED]>:
Hi Bryan,
That's interesting. I wonder where the problem is with the ActiveMQ =>
NMS connection. Without knowing your exact network topology, I can't
point to where the problem is. All I can do is speak to my experience,
and I have been able to keep connections alive for a very long time
without errors, both with high and low activity, even going over what my
infrastructure team has told me is a WAN connection.

Best,
Jim

On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:

Thanks for the info. I suspected that's what the timeout meant, but you
never really know until you ask.

Anyway, we finally solved our issue. We set up two instances of ActiveMQ
in the two data centers to forward messages back and forth between each
other. This is working much better for us. It seems the ActiveMQ to
ActiveMQ communication is a bit more robust than the ActiveMQ to
Apache.NMS communication (at least when running over a WAN).

Bryan

On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <[EMAIL PROTECTED]> wrote:

Hi Bryan,
I can't answer all of your questions yet, but I can answer some of them,
anyway.

1. As far as the ResponseTimeout property goes, that is used for network
timeouts. It's not a JMS timeout value like TimeToLive. The
ResponseTimeout is used by the client to wait for a response from the
broker. Since a network call is inherently a blocking operation (send
request, wait for response), if we never receive a response from a
dead/hung broker, the client will hang as well. The ResponseTimeout lets
the client abort waiting for the response from the broker. This can be
set to whatever performance constraints your application requires. In a
WAN environment, where there is a lot of latency in network round-trips,
this might be set to something fairly high. The socket connection is not
dropped. The client simply stops waiting for the broker to respond and
goes into its error-handling code for a non-response.

2. I see the marshalling code for the KeepAliveInfo, but like you I
don't see how this is turned on or controlled from the client-side. This
would need more investigation to see if it is enabled via a URI
parameter, or if new code needs to be written to enable its use.

3. Can't answer the server-side socket issue. Don't know that code.
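
The ResponseTimeout behavior described in point 1 can be sketched roughly
as a bounded wait that gives up on the reply without touching the
connection. This is not the NMS implementation: the queue standing in for
the broker, the names wait_for_response and ResponseTimeoutError, and the
timeout values are all assumptions for illustration, written in Python.

```python
import queue
import threading


class ResponseTimeoutError(Exception):
    """Raised when no response arrives within the allowed time (hypothetical)."""


def wait_for_response(responses, timeout_s):
    """Block until a response arrives or timeout_s seconds elapse.

    Only the wait is abandoned on timeout; nothing here closes a socket.
    """
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        raise ResponseTimeoutError(f"no response within {timeout_s}s") from None


# Simulated broker that answers after a short delay.
responses = queue.Queue()
threading.Timer(0.05, responses.put, args=("ack",)).start()
print(wait_for_response(responses, timeout_s=1.0))

# Simulated hung broker: nothing is ever put on the queue.
silent = queue.Queue()
try:
    wait_for_response(silent, timeout_s=0.1)
except ResponseTimeoutError as err:
    # The caller falls into its error handling for a non-response.
    print("giving up:", err)
```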

--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


