Hi All, We have an ActiveMQ Artemis instance with primary/backup replication. Sometimes both servers fail to replicate after its restart. I have examined the logs of both servers, and mentioned a strange warnings in the log of first server right after it begins to stop:
1. First server is stopping, and at this time the second server is still alive and becomes primary. But there is a warning that primary node has failed to connect to the backup node. Is it ok? 2024-07-24 03:03:31,918 WARN [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now org.apache.activemq.artemis.api.core.ActiveMQRemoteDisconnectException: null at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:607) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:974) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.lambda$channelInactive$0(ActiveMQChannelHandler.java:89) ~[artemis-core-client-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68) ~[artemis-commons-2.35.0.jar:2.35.0] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.35.0.jar:2.35.0] 2. NullPointerException in MessageReferenceImpl.java:50 (this warning repeats two times). 2024-07-24 03:03:31,919 WARN [org.apache.activemq.artemis.core.server] AMQ222246: Unable to rollback on close java.lang.NullPointerException: Cannot invoke "org.apache.activemq.artemis.core.server.MessageReference.getSequence()" because "o1" is null at org.apache.activemq.artemis.core.server.impl.MessageReferenceImpl$MessageReferenceComparatorSequence.compare(MessageReferenceImpl.java:50) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.MessageReferenceImpl$MessageReferenceComparatorSequence.compare(MessageReferenceImpl.java:42) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.collections.LinkedListImpl.scanLeft(LinkedListImpl.java:286) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.collections.LinkedListImpl.addSorted(LinkedListImpl.java:248) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.collections.PriorityLinkedListImpl.addSorted(PriorityLinkedListImpl.java:93) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.internalAddSorted(QueueImpl.java:3069) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.addSorted(QueueImpl.java:1180) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.addSorted(QueueImpl.java:1215) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.postRollback(QueueImpl.java:4247) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:131) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:600) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:436) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:226) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:168) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.afterCompleteOperations(AbstractJournalStorageManager.java:352) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.internalRollback(TransactionImpl.java:427) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:400) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.cancelAndRollback(ServerSessionImpl.java:2287) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doRollback(ServerSessionImpl.java:2262) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.rollback(ServerSessionImpl.java:1375) ~[artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doClose(ServerSessionImpl.java:431) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl$1.callDoClose(ServerSessionImpl.java:1752) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl$1.done(ServerSessionImpl.java:1747) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:226) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:168) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:1739) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:1723) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.protocol.openwire.amq.AMQSession.close(AMQSession.java:559) [artemis-openwire-protocol-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.disconnect(OpenWireConnection.java:720) [artemis-openwire-protocol-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.disconnect(OpenWireConnection.java:755) [artemis-openwire-protocol-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.freeze(RemotingServiceImpl.java:375) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation.freezeConnections(ReplicationPrimaryActivation.java:106) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.freezeConnections(ActiveMQServerImpl.java:1579) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1352) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1271) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1255) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1062) [artemis-server-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.integration.FileBroker.stop(FileBroker.java:94) [artemis-cli-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.cli.commands.Run.stop(Run.java:232) [artemis-cli-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.cli.commands.Run$3.run(Run.java:223) [artemis-cli-2.35.0.jar:2.35.0] I'm not sure that it is related to the replication issues, but if it is a bug, I can create a Jira. Unfortunately I don't know how to reproduce this warning in isolated environment. 3. During replication of the first node which was restarted as backup, some errors a written to the log: 2024-07-24 03:05:01,343 ERROR [org.apache.activemq.artemis.core.journal.impl.JournalImpl] Exception during appendAddRecord: java.lang.NullPointerException: Cannot invoke "org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because "this.currentFile" is null at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:3523) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:3251) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:938) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68) ~[artemis-commons-2.35.0.jar:2.35.0] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.35.0.jar:2.35.0] 2024-07-24 03:05:01,360 WARN [org.apache.activemq.artemis.journal] AMQ142021: Error on IO callback, Cannot invoke "org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because "this.currentFile" is null 2024-07-24 03:05:01,361 WARN [org.apache.activemq.artemis.journal] AMQ142021: Error on IO callback, Cannot invoke "org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because "this.currentFile" is null 2024-07-24 03:05:01,361 ERROR [org.apache.activemq.artemis.core.journal.impl.JournalImpl] Exception during appendAddRecord: java.lang.NullPointerException: Cannot invoke "org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because "this.currentFile" is null at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:3523) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:3251) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:938) ~[artemis-journal-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32) ~[artemis-commons-2.35.0.jar:2.35.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68) ~[artemis-commons-2.35.0.jar:2.35.0] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.35.0.jar:2.35.0] 2024-07-24 03:05:15,383 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::name=amq-test is synchronized with primary server, nodeID=327b04af-48c3-11ef-a9c3-0050568560e3. 2024-07-24 03:05:15,390 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced These errors happen very often on other servers when trying to replicate (with success or failure). I also did not reproduced it in isolated environment. 4. There is a warning which repeats several times: 2024-07-24 03:06:25,434 WARN [org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation] Timed out waiting for failback server activation with NodeID = 327b14af-48c3-11ef-a9c3-0050568560e3: and sequence > 5: after 4000ms And found that this timeout is hard-coded in ReplicationPrimaryActivation.java:69. I need to examine why this timeout was reached. Test environment can work on low-level resources and sometimes we may need to configure this timeout. -- Best regards, Aleksandr ----------------------------------- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient any use, distribution, copying or disclosure is strictly prohibited. If you have received this message in error, please notify the sender immediately either by telephone or by e-mail and delete this message and any attachment from your system. Correspondence via e-mail is for information purposes only. AO Raiffeisenbank neither makes nor accepts legally binding statements by e-mail unless otherwise agreed. ----------------------------------- --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@activemq.apache.org For additional commands, e-mail: users-h...@activemq.apache.org For further information, visit: https://activemq.apache.org/contact