A cluster of Artemis nodes is in an unhealthy state, and I need help understanding what happened and how to resolve it.
We were deploying a new configuration to an existing Artemis deployment (2.6.3). The only configuration changes were additional queues, users, roles and passwords. The new deployment has been tested in multiple environments without issues. However, the issue began before the new configuration was pushed. In only one environment, when the master node was shut down, we started seeing this warning message in the backup server logs:

15:08:25,454 WARN [org.apache.activemq.artemis.core.server] AMQ224091: Bridge ClusterConnectionBridge@3eb594aa [name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f, queue=QueueImpl[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390], temp=false]@55325130 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@3eb594aa [name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f, queue=QueueImpl[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f, postOffice=PostOfficeImpl[server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390], temp=false]@55325130 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-master}], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@256430879[nodeUUID=5dd20394-70dc-11e7-8a26-02082028f390, connector=TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-backup}, address=jms, server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390])) [initialConnectors=[TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-master}], discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying

The master started up fine and was able to handle traffic, except that it began logging the same warning message with the master and backup host names swapped. The issue did not resolve itself after the backup deployment completed, nor after restarts.

The UUID in the ClusterConnectionBridge queue name does not appear anywhere in prior logs dating back to the last deployment.

Failover between master and backup still happens as expected across restarts. We are still able to handle traffic, but creating an initial client connection now takes a few thousand ms instead of a few ms.

Can anyone help me understand what happened and how to resolve it?

-- Todd Zimnoch