A cluster of Artemis nodes is in an unhealthy state and I need help
understanding what happened and guidance on how to resolve.

We were redeploying a new configuration for an existing Artemis deployment
(2.6.3). The only changes in configuration were additional queues, users,
roles and passwords. The new deployment has been tested in multiple
environments without issues.

However, the issue begins before pushing the new configuration. In only one
environment, when the master node was shut down we started seeing this
warning message in the backup server logs.

15:08:25,454 WARN  [org.apache.activemq.artemis.core.server] AMQ224091:
Bridge ClusterConnectionBridge@3eb594aa
[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f,
queue=QueueImpl[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f,
postOffice=PostOfficeImpl
[server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390],
temp=false]@55325130 targetConnector=ServerLocatorImpl
(identity=(Cluster-connection-bridge::ClusterConnectionBridge@3eb594aa
[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f,
queue=QueueImpl[name=$.artemis.internal.sf.{cluster-name}.c1893571-c00a-11e7-b035-020820d1423f,
postOffice=PostOfficeImpl[server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390],
temp=false]@55325130 targetConnector=ServerLocatorImpl
[initialConnectors=[TransportConfiguration(name=netty-connector,
factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)
?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-master}],
discoveryGroupConfiguration=null]]::ClusterConnection
Impl@256430879[nodeUUID=5dd20394-70dc-11e7-8a26-02082028f390,
connector=TransportConfiguration(name=netty-connector,
factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)
?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-backup},
address=jms,
server=ActiveMQServerImpl::serverUUID=5dd20394-70dc-11e7-8a26-02082028f390]))
[initialConnectors=[TransportConfiguration(name=netty-connector,
factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)
?enabledProtocols=TLSv1&port=61616&sslEnabled=true&host={host-name-of-master}],
discoveryGroupConfiguration=null]] is unable to connect to destination.
Retrying

Master started up fine and was able to handle traffic, except it started to
log the same warning message with the master and backup host names swapped.

The issue did not resolve itself after backup deployment completed nor
through restarts. I cannot find the uuid in the ClusterConnectionBridge
queue name in prior logs dating back to the last deployment.

The failover between master and backup still happened as expected through
restarts. We are still able to handle traffic, but initial client
connection creations have gone from a few ms to a few thousand ms. Can
anyone help me understand what happened and how to resolve?

-- 
Todd Zimnoch

Reply via email to