Hello, regarding the reconnect issue: how are your clients configured? Do they get their topology from the pair of brokers on kube?

-- On re-connection:

Failover with the Artemis JMS client will only occur between pairs. It is restricted in that way to protect users of temp queues and durable subs, b/c those resources will only be available on a replica. Other clients (OpenWire or AMQP Qpid JMS) are not as protective by default, so they will continue to try to reconnect.

In any event, from an application perspective, it is often best to hide the JMS connection behind something like camel-jms or Spring's JmsTemplate. In that way, the error handling can be separately controlled and isolated from protocol specifics. It is an extra level of indirection with sensible defaults that can be tweaked as needed. Putting the broker URL behind a proxy/load balancer/firewall, DNS, or some other mechanism that is broker topology agnostic can also help.

-- On the replication vs single broker pod:

These are very different: with replication there are two copies of your data, with a single pod there is only one copy.

- Single Pod:

Because kube does a good (if slow) job of auto restarting, it makes sense to leverage it to keep your single journal available. It is very intuitive and simple. If order is not important, cluster with a second broker and allow clients to use either. If order is important, consider using multiple brokers and partitioning data across them [1]. In that way, you can always be partially available.

- Replica Pods:

If you need two copies of your data, then you need replication, and this is more involved. With replication, there is a synchronous copy. There are two overheads to consider.

First, at runtime the broker only responds once the *replica* has the message. This is usually trivial b/c of a fast network, but it is important to be aware of.

Second, there is the overhead of coordination on activation. Because there are two copies of the journal, only one can be active at a time. As part of [2] we introduced coordinated activation via zk.
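
To make the "only one can be active at a time" point a bit more concrete, here is a rough sketch of lease-based activation. It is a toy, in-memory model and not how Artemis or its zk integration is implemented. `LeaseStore`, `FakeClock`, `demo` and the broker ids are all made-up names for illustration; the store stands in for whatever shared coordinator is used (zk, etcd, etc.), and Python is just a convenient language for the sketch.

```python
import time


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared coordinator (zk, etcd, ...)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._holder = None      # id of the broker that currently holds the lease
        self._expires_at = 0.0   # clock value at which the lease lapses

    def try_acquire(self, broker_id, ttl_seconds):
        """Grant or renew the lease if it is free, expired, or already ours."""
        now = self._clock()
        if self._holder in (None, broker_id) or now >= self._expires_at:
            self._holder = broker_id
            self._expires_at = now + ttl_seconds
            return True
        return False

    def release(self, broker_id):
        """Give the lease up early, e.g. on a clean shutdown."""
        if self._holder == broker_id:
            self._holder = None


class FakeClock:
    """Manually advanced clock so the demo does not have to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def demo():
    clock = FakeClock()
    store = LeaseStore(clock=clock)
    ttl = 10.0
    events = []

    # The primary activates first; the replica must stay passive.
    events.append(("primary", store.try_acquire("primary", ttl)))
    events.append(("replica", store.try_acquire("replica", ttl)))

    # The primary renews at t=5, then (in this scenario) its pod dies.
    clock.advance(5.0)
    events.append(("primary", store.try_acquire("primary", ttl)))

    # At t=14 the lease is still valid, so the replica keeps waiting.
    clock.advance(9.0)
    events.append(("replica", store.try_acquire("replica", ttl)))

    # At t=15 the lease has lapsed and the replica may activate.
    clock.advance(1.0)
    events.append(("replica", store.try_acquire("replica", ttl)))

    # A restarted primary now finds the lease taken and stays passive.
    events.append(("primary", store.try_acquire("primary", ttl)))
    return events
```

In this model the replica cannot activate until the old lease has lapsed, so the worst-case wait after the active broker dies is bounded by the lease ttl. A shorter ttl means faster takeover but more renewal traffic and less tolerance for a slow coordinator.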

I think we probably need a kube version of this coordinated activation that layers over a CRD or some other etcd primitive. An operator could also serve the role of an oracle here. I note that the operator sdk provides a leader election primitive [3] that may be perfect. The only issue may be permissions. There will still be some necessary delay (lease expiration time etc.), but it should be possible to make this time limited and bounded. There is a bit of work to do here.

In short, having a replica pod should reduce time to recover, but at a cost.

After putting these thoughts down, I think the short answer is yes, in kubernetes a single pod is currently best.

It was a good question :-) feedback welcome!

I hope this helps,
gary.

[1] https://activemq.apache.org/components/artemis/documentation/latest/broker-balancers.html#data-gravity
[2] https://activemq.apache.org/components/artemis/documentation/latest/ha.html#Pluggable-Quorum-Vote-Replication-configurations
[3] https://github.com/operator-framework/operator-sdk/issues/784