Re: Artemis high availability in Kubernetes

Gary Tully Fri, 11 Feb 2022 03:10:15 -0800

Hello,
Regarding the reconnect issue: how are your clients configured? Do they
get topology from the pair of brokers on kube?

--
On re-connection:

Failover with the Artemis jms client will only occur between pairs.
It is restricted in that way to protect users of temp queues and
durable subs, b/c those resources will only be available on a replica.
Other clients (openwire or AMQP qpid jms) are not as protective by
default, so they will continue to try to reconnect.

In any event, from an application perspective, it is often best to
hide the JMS connection with something like camel-jms or spring jms
template. In that way, the error handling can be separately controlled
and isolated from protocol specifics. It is an extra level of
indirection with sensible defaults that can be tweaked as needed.
It can also help to put the broker url behind a proxy, load balancer,
firewall, dns or some other mechanism that is broker topology agnostic.
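
To illustrate the indirection idea, here is a minimal sketch in plain Java.
It is not camel-jms or spring code; RetryingExecutor and its parameters are
hypothetical names, and the action passed in stands for whatever
protocol-specific send or receive the application does. The point is only
that the retry policy lives in one place and can be tweaked there.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper (not part of camel-jms or Spring): keeps the retry
// policy in one place, away from protocol-specific client code.
final class RetryingExecutor {
    private final int maxAttempts;
    private final Duration initialDelay;

    RetryingExecutor(int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialDelay = initialDelay;
    }

    // Runs the action; on failure waits and retries, doubling the delay each
    // time. Rethrows the last failure once maxAttempts is exhausted.
    <T> T call(Callable<T> action) throws Exception {
        long delayMillis = initialDelay.toMillis();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                    delayMillis *= 2;
                }
            }
        }
        throw last;
    }
}
```

An application would wrap each send in retry.call(...), so switching client
or protocol does not change the error handling.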

--
On the replication vs single broker pod:

These are very different: with replication there are two copies of
your data; with a single pod there is only one copy.

- Single Pod:
Because kube does a good (if slow) job of auto restarting it makes
sense to leverage it to keep your single journal available. It is very
intuitive and simple.
If order is not important, cluster with a second broker and allow
clients to use either.
If order is important, consider using multiple brokers and
partitioning data across them[1]. In that way, you can always be
partially available.
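
As a rough client-side illustration of partitioning (this is not the broker
balancer feature described in [1], just a sketch of the idea), the same
ordering key can always be sent to the same broker. BrokerPartitioner, the
key names and the broker urls below are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side partitioner; the broker urls are placeholders.
final class BrokerPartitioner {
    private final List<String> brokerUrls;

    BrokerPartitioner(List<String> brokerUrls) {
        if (brokerUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one broker");
        }
        this.brokerUrls = new ArrayList<>(brokerUrls);
    }

    // The same key always maps to the same broker, so per-key order is kept.
    // floorMod keeps the index non-negative for negative hash codes.
    String brokerFor(String orderingKey) {
        int index = Math.floorMod(orderingKey.hashCode(), brokerUrls.size());
        return brokerUrls.get(index);
    }
}
```

If one broker is down, only the keys that map to it are affected; the rest
of the data stays available.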

- Replica Pods
If you need two copies of your data, then you need replication, and
this is more involved.
With replication, there is a synchronous copy. There are two overheads
to consider.
First, at runtime the broker responds when the *replica* gets the
message, which is usually trivial b/c of a fast network; but it is
important to be aware of.
Second is the overhead of coordination on activation. Because there
are two copies of the journal, only one can be active at a time.
- as part of [2] we introduced coordinated activation via zk. I think
we probably need a kube version of this that layers over a crd or some
other etcd primitive. An operator could serve the role of an oracle
here also. I note that the operator sdk provides a leader election
primitive [3] that may be perfect. The only issue may be permissions.
There will still be some necessary delay (lease expiration time, etc.),
but it should be possible to make this time limited and bounded. There
is a bit of work to do here.
In short, having a replica pod should reduce time to recover but at a cost.
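
To make the lease expiration point concrete, here is a small in-memory
sketch of lease-based activation. It is not the zk implementation from [2]
nor the operator sdk primitive from [3]; ActivationLease and its methods are
hypothetical, and a real version would keep the lease record in ZooKeeper,
etcd or a Kubernetes Lease object rather than in process memory.

```java
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for a shared lease record (hypothetical sketch).
final class ActivationLease {
    private static final class Holder {
        final String owner;
        final long expiresAtMillis;

        Holder(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final AtomicReference<Holder> current = new AtomicReference<>();
    private final long leaseMillis;

    ActivationLease(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Returns true if 'broker' holds the lease after this call: it either
    // renewed its own lease or took over one that was free or expired.
    boolean tryAcquire(String broker, long nowMillis) {
        while (true) {
            Holder seen = current.get();
            boolean free = seen == null || seen.expiresAtMillis <= nowMillis;
            boolean mine = seen != null && seen.owner.equals(broker);
            if (!free && !mine) {
                return false; // another broker is active and its lease is valid
            }
            Holder next = new Holder(broker, nowMillis + leaseMillis);
            if (current.compareAndSet(seen, next)) {
                return true;
            }
            // Lost a race with a concurrent update; re-read and try again.
        }
    }
}
```

The backup can only activate once the primary stops renewing and the lease
runs out, so the takeover delay is bounded by the lease duration.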

After putting these thoughts down, I think the short answer is yes, in
kubernetes a single pod is currently best. It was a good question :-)

feedback welcome!
I hope this helps,
gary.

[1] https://activemq.apache.org/components/artemis/documentation/latest/broker-balancers.html#data-gravity
[2] https://activemq.apache.org/components/artemis/documentation/latest/ha.html#Pluggable-Quorum-Vote-Replication-configurations
[3] https://github.com/operator-framework/operator-sdk/issues/784
