On Thu, Nov 5, 2015 at 2:02 PM, jahlborn <jahlb...@gmail.com> wrote:

> So I've spent some time digging around the internet and experimenting
> with our setup, trying to determine the best configuration for a
> network of brokers.  There are a variety of things which seem to
> conspire against me: the multitude of configuration options present in
> ActiveMQ, options which seem to have no "optimal" setting (each choice
> has different pros and cons), and behavior/features which change over
> time such that old recommendations may now be irrelevant or no longer
> correct.  For reference, we are using a mesh network of brokers, where
> consumers may arrive at any broker in the network.  We use topics,
> queues, and virtual queues.  We use explicit receive() calls as well
> as listeners.  We also utilize the "exclusive consumer" feature for
> some of our clustered consumers.  All messaging is currently durable.
> Some relevant configuration bits (changes from the defaults):
>
> * using activemq 5.9.1
> * advisory support is enabled
> * PolicyEntry
> ** optimizedDispatch: true
> * Using ConditionalNetworkBridgeFilterFactory
> ** replayWhenNoConsumers: true
> ** replayDelay: 1000
> * NetworkConnector
> ** prefetchSize: 1
> ** duplex: false
> ** messageTTL: 9999
>
> Alright, now that we got through all the context, here's the meat of
> the question.  There are a bunch of configuration parameters (mostly
> focused in the NetworkConnector), and it's not at all clear to me if
> my current configuration is optimal.  For instance, although we had
> been using a configuration similar to the above for about a year now
> (same as above except that messageTTL was 1), we only recently
> discovered that our exclusive consumers can sometimes stop processing
> messages.  In certain cases, the consumer would end up bouncing around
> and the messages would end up getting stuck on one node.  Adding the
> messageTTL setting seems to fix the problem (is it the right fix...?).
>

If your problem was that you were getting stuck messages due to consumers
bouncing around the Network of Brokers (which is what it sounds like), then
yes, increasing the networkTTL (or the messageTTL, depending on your
topology) is the correct fix, in addition to the replayWhenNoConsumers=true
setting you said you had already set.
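
For reference, here's a rough sketch of the two relevant pieces of
broker XML (broker names and URIs are placeholders, not your actual
topology):

    <destinationPolicy>
      <policyMap>
        <policyEntries>
          <policyEntry queue=">" optimizedDispatch="true">
            <networkBridgeFilterFactory>
              <!-- let a broker replay messages back into the network
                   when its local consumers have gone away -->
              <conditionalNetworkBridgeFilterFactory
                  replayWhenNoConsumers="true" replayDelay="1000"/>
            </networkBridgeFilterFactory>
          </policyEntry>
        </policyEntries>
      </policyMap>
    </destinationPolicy>

    <networkConnectors>
      <!-- messageTTL > 1 so a replayed message can keep chasing a
           consumer that moves between brokers -->
      <networkConnector name="to-brokerB"
          uri="static:(tcp://brokerB:61616)"
          duplex="false" prefetchSize="1" messageTTL="9999"/>
    </networkConnectors>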

> * NetworkConnector
> ** "dynamicOnly" - I've seen a couple of places mention enabling this
>    and some indication that it helps with scaling in a network of
>    brokers (e.g. [3]).  The description in [1] also makes it sound
>    like something I would want to enable.  However, the value defaults
>    to false, which seems to indicate that there is a down-side to
>    enabling it.  Why wouldn't I want to enable this?
>

One major difference here is that with this setting enabled, messages
will stay on the broker to which they were first produced until the
durable subscriber reconnects, while with it disabled (the default)
they will go to the broker to which the durable subscriber was last
connected (or at least, the last one in the route to the consumer, if
the broker to which the consumer was actually connected has gone down
since then).  There are at least three disadvantages to enabling it:
1) if the producer connects to an embedded broker, then those messages
go offline when the producer goes offline and aren't available when the
consumer reconnects; 2) it only takes filling one broker's store before
Producer Flow Control throttles the producer (whereas with the default
setting you have to fill every broker along the last route to the
consumer before PFC kicks in); and 3) if you have a high-latency
network link in the route from producer to consumer, you delay
traversing it until the consumer reconnects, which means the consumer
may experience more latency than it would otherwise need to.  So as
with so many of these settings, the best configuration for you will
depend on your situation.
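
If you do decide to experiment with it, it's just an attribute on the
network connector; a minimal sketch (the URI is a placeholder):

    <networkConnector name="to-brokerB"
        uri="static:(tcp://brokerB:61616)"
        dynamicOnly="true"/>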

Also, default values are often the default because that's what they were
when they were first introduced (to avoid breaking legacy configurations),
not necessarily because that's the setting that's recommended for all
users.  Default values do get changed when the old value is clearly not
appropriate and the benefits of a change outweigh the inconvenience to
legacy users, but when there's not a clear preference they usually get left
alone, which is a little confusing to new users.


> ** "decreaseNetworkConsumerPriority", "suppressDuplicateQueueSubscriptions"
>
-
>    these params both seem like "damned if you do, damned if you don't" type
>    parameters.  The first comment in [2] seems to imply that in order to
>    scale, you really want to enable these features so that producers prefer
>    pushing messages to local consumers (makes sense).  Yet, at the same
> time,
>    it seems that enabling this feature will _decrease_ scalability in that
> it
>    won't evenly distribute messages in the case when there are multiple
> active
>    consumers (we use clusters of consumers in some scenarios).  Also in
> [2],
>    there are some allusions to stuck messages if you don't enable this
>    feature.  Should i enable these parameters?
>

You're right that decreaseNetworkConsumerPriority is a bit of a "damned
if you do, damned if you don't" type parameter when you're trying to
load balance between local and network clients, but if you're trying to
load balance well, you should have small prefetch buffer sizes anyway.
If you do, it really shouldn't matter which consumer gets prioritized;
each will be given a small number of messages, and then there will be a
large backlog sitting on the broker, which will hand them out as the
initial small batches get worked off.  But the more important
contribution of decreaseNetworkConsumerPriority is that it favors
more-direct routes over less-direct ones through a network where
multiple routes exist from a producer to a consumer.  Less importantly
but still a positive, enabling it also seems likely to reduce the load
on your broker network by having a message pass through fewer brokers.
So I'd definitely enable it given what you've said about your concerns.

suppressDuplicateQueueSubscriptions disallows multiple routes to a given
consumer, but those routes are only a problem if
decreaseNetworkConsumerPriority isn't used (because with
decreaseNetworkConsumerPriority, you won't choose the non-optimal routes
unless the optimal one disappears due to a broker going down).  So if
you're using decreaseNetworkConsumerPriority (and it sounds like you
should), then I don't see a reason for you to use
suppressDuplicateQueueSubscriptions.  But it could be useful for anyone
who's not using that option.
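
In config terms, both are simple attributes on the network connector; a
sketch reflecting the recommendation above (the URI is a placeholder):

    <!-- suppressDuplicateQueueSubscriptions deliberately left at its
         default (false), per the reasoning above -->
    <networkConnector name="to-brokerB"
        uri="static:(tcp://brokerB:61616)"
        decreaseNetworkConsumerPriority="true"/>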


> ** "networkTTL", "messageTTL", "consumerTTL" - until recently, we kept
> these
>    at the defaults (1).  However, we recently realized that we can end up
> with
>    stuck messages with these settings.  I've seen a couple of places which
>    recommend setting "networkTTL" to the number of brokers in the network
>    (e.g. [2]), or at least something > 1.  However, the recommendation for
>    "consumerTTL" on [1] is that this value should be 1 in a mesh network
> (and
>    setting the "networkTTL" will set the "consumerTTL" as well).
>    Additionally, [2] seems to imply that enabling
>    "suppressDuplicateQueueSubscriptions" acts like "networkTTL" is 1 for
> proxy
>    messages (unsure what this means?).  We ended up setting only the
>    "messageTTL" and this seemed to solve our immediate problem.  Unsure if
> it
>    will cause other problems...?
>

messageTTL controls how far actual messages are allowed to propagate
before they have to be consumed locally.  consumerTTL controls how far
advisory messages (which flow in the opposite direction from the actual
messages, so that brokers that receive a message know where to send it)
are allowed to propagate.  networkTTL sets both values to the same
thing; you should use either networkTTL (if both values are the same)
or messageTTL & consumerTTL (if you need different values), but don't
mix them.  I'm not aware of any problems caused by having a "too-large"
consumerTTL/networkTTL unless you have a non-homogeneous NoB where you
want to keep messages produced in one part from propagating into
another part; if all you have is a uniform mesh where each message is
"allowed" anywhere an appropriate consumer needs it, then just use
networkTTL and avoid the confusion/hassle of splitting the
configuration.
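
If you did want the split form in a mesh, it would look something like
this sketch (URI and values are illustrative only):

    <networkConnector name="to-brokerB"
        uri="static:(tcp://brokerB:61616)"
        messageTTL="9999" consumerTTL="1"/>

versus just networkTTL="9999" for the single-value form.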

In a mesh (all brokers connected to each other), you only need a
consumerTTL of 1, because you can get the advisory message to every
other broker in one hop.  But in that same mesh, there's no guarantee
that a single hop will get you to the broker where the consumer is,
because the consumer might jump to another node in the mesh before
consuming the message, which would then require another forward.  So in
a mesh with decreaseNetworkConsumerPriority you may need a
messageTTL/networkTTL of 1 + [MAX # FORWARDS] or greater, where [MAX #
FORWARDS] is the worst-case number of jumps a consumer might make
between the time a message is produced and the time it is consumed.  In
your case you've chosen 9999, so that allows 9998 consumer jumps, which
should be more than adequate.
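
To make that arithmetic concrete with a hypothetical: if you sized it
exactly and expected a consumer to fail over at most 3 times while a
message is in flight, you'd need messageTTL >= 1 + 3 = 4; your 9999 is
the same formula with an enormous safety margin.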


> ** "prefetchSize" - defaults to 1000, but I see recommendations that it
> should
>    be 1 for network connectors (e.g. [3]).  I think that in our initial
>    testing i saw bad things happen with this setting and got more even load
>    balancing by lowering it to 1.
>

As I mentioned above, setting a small prefetch size is important for load
balancing; if you allow a huge backlog of messages to buffer up for one
consumer, the other consumers can't work on them even if they're sitting
around idle.  I'd pick a value like 1, 3, 5, 10, etc.; something small
relative to the number of messages you're likely to have pending at any one
time.  (But note that the prefetch buffer can improve performance if you
have messages that take a variable amount of time to process and sometimes
the amount of time to process them is lower than the amount of time to
transfer them between your brokers or from the broker to the consumer, such
as with a high-latency network link.  This doesn't sound like your
situation, but it's yet another case where the right setting depends on
your situation.)


> I think that about summarizes my questions and confusion.  Any help
> would be appreciated!
>
> [1] http://activemq.apache.org/networks-of-brokers.html
> [2] https://issues.jboss.org/browse/MB-471
> [3] http://www.javabeat.net/deploying-activemq-for-large-numbers-of-concurrent-applications/
