Hello all,

Has anyone managed to look into this? It appears to be a fairly serious race condition in large-message processing. Unfortunately I just realized that attachments get stripped from these emails, so here are the relevant thread stacks inline.
Stacks of the 2 *deadlocked* threads:

"Thread-1074 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3738bd7a)" - Thread t@11854
   java.lang.Thread.State: BLOCKED
    at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:174)
    - waiting to lock <1eb68b32> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl) owned by "Thread-25 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@59ccdde4)" t@202
    at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59)
    at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25)
    at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2785)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:843)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

   Locked ownable synchronizers:
    - locked <4c6aa3d9> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Thread-25 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@59ccdde4)" - Thread t@202
   java.lang.Thread.State: WAITING
    at jdk.internal.misc.Unsafe.park(Native Method)
    - parking to wait for <7807bddf> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
    at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1073)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:943)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:360)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1308)
    - locked <1eb68b32> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1254)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:904)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:799)
    at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1801)
    - locked <3396ca5d> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
    at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:993)
    - locked <67aa6e6> (a java.lang.Object)
    at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:314)
    at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:287)
    at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$1694/0x000000080219f840.onMessage(Unknown Source)
    at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown Source)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

   Locked ownable synchronizers:
    - locked <7fc2876c> (a java.util.concurrent.ThreadPoolExecutor$Worker)

As described in previous emails, these two threads are locked like this:

1. *Thread t@11854* JournalImpl.appendAddRecord() queues a task on the appendExecutor (JournalImpl:946), and that task is delayed for some reason.
2. *Thread t@202* JournalImpl.appendDeleteRecord() is triggered because appendAdd is async -> it takes the lock on the LargeMessage object (PostOfficeImpl:1305) and gets stuck on the appendExecutor queue behind Thread t@11854's task (JournalImpl:1058).
3. *Thread t@11854*'s appendExecutor task reaches the point where it needs the lock on that same LargeMessage object (LargeServerMessageImpl:173), but it can't get it because Thread t@202 holds it.

So the deadlock is: *Thread t@11854* is waiting for the lock on the LargeMessage object, which will never be released by *Thread t@202*, because Thread t@202 is itself waiting for processing on the appendExecutor queue behind Thread t@11854 (see the sketch below).
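To make the cycle concrete, here is a minimal, self-contained sketch. This is *not* Artemis code; the names are illustrative only (a single-threaded executor stands in for the journal's ordered appendExecutor, a plain Object for the LargeServerMessageImpl monitor). Running it hangs at the same two points as the stacks above:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class AppendExecutorDeadlockSketch {
        public static void main(String[] args) throws Exception {
            // Stands in for the journal's single-threaded, ordered appendExecutor.
            ExecutorService appendExecutor = Executors.newSingleThreadExecutor();
            // Stands in for the LargeServerMessageImpl whose monitor both threads need.
            Object largeMessage = new Object();

            CountDownLatch appendTaskRunning = new CountDownLatch(1);
            CountDownLatch monitorTaken = new CountDownLatch(1);

            // Steps 1 and 3: the queued appendAddRecord() work eventually needs the
            // message monitor (getEncodeSize() is synchronized on the message).
            appendExecutor.submit(() -> {
                appendTaskRunning.countDown();
                try {
                    monitorTaken.await(); // simulate the task being "delayed for some reason"
                } catch (InterruptedException ignored) {
                }
                synchronized (largeMessage) { // BLOCKED, like LargeServerMessageImpl.getEncodeSize()
                    System.out.println("never reached");
                }
            });
            appendTaskRunning.await();

            // Step 2: confirmLargeMessageSend() takes the message monitor, then
            // appendDeleteRecord() waits for a result computed on the same
            // appendExecutor, queued behind the task above.
            synchronized (largeMessage) {
                monitorTaken.countDown();
                Future<?> knownRecordCheck = appendExecutor.submit(() -> { }); // like checkKnownRecordID()
                knownRecordCheck.get(); // WAITING forever -> deadlock, the program never exits
            }
        }
    }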
This eventually blocks all other producer threads, so *no messages can be produced to the queue*:

"flow-action-completed-355" - Thread t@12775
   java.lang.Thread.State: TIMED_WAITING
    at jdk.internal.misc.Unsafe.park(Native Method)
    - parking to wait for <4876219f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:235)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2211)
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:389)
    - locked <5582112e> (a java.lang.Object)
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:319)
    at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sessionClose(ActiveMQSessionContext.java:543)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:1234)
    at org.apache.activemq.artemis.jms.client.ActiveMQSession.close(ActiveMQSession.java:280)
    - locked <5fe5842b> (a org.apache.activemq.artemis.jms.client.ActiveMQConnection)

"flow-action-completed-351" - Thread t@12771
   java.lang.Thread.State: BLOCKED
    at org.apache.activemq.artemis.jms.client.ActiveMQConnection.createSession(ActiveMQConnection.java:234)
    - waiting to lock <5fe5842b> (a org.apache.activemq.artemis.jms.client.ActiveMQConnection) owned by "flow-action-completed-355" t@12775
    at org.springframework.jms.connection.SingleConnectionFactory.createSession(SingleConnectionFactory.java:481)
    at org.springframework.jms.connection.CachingConnectionFactory.getSession(CachingConnectionFactory.java:232)
    at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:648)

For now we have increased *minLargeMessageSize* so that every message is processed as a regular message. Since we run Artemis embedded and keep entire messages in memory anyway, we never actually benefit from large-message processing.
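For reference, the workaround looks roughly like this in code. This is a sketch only: minLargeMessageSize is a client-side connection-factory setting (default 100 KiB), and the exact wiring depends on how the factory is built in your app:

    import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

    // Workaround sketch: raise minLargeMessageSize above the biggest message
    // we ever send, so nothing takes the large-message path.
    ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("vm://0"); // in-VM connector to the embedded broker
    cf.setMinLargeMessageSize(10 * 1024 * 1024); // e.g. 10 MiB; pick a value above your largest message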
We would just like someone to confirm that there is a bug here. If you need more info, we will provide it.

Kind regards,

Mario

On Wed, 20 Nov 2019 at 18:23, Mario Mahovlić <mariomahov...@gmail.com> wrote:

> Hello all,
>
> We run Artemis embedded in our Spring service and occasionally hit an
> issue where all of our producer threads become blocked, so no messages can
> be produced to the queue (it happened 3 times in 2 weeks). We produce both
> regular and large messages to the queue. All we get from Artemis during
> this time are timeout exceptions on our producer client side:
>
> nested exception is javax.jms.JMSException: AMQ219014: Timed out after
> waiting 30,000 ms for response when sending packet 71
>
> We took a thread dump of the service while the issue was in effect
> (attached).
>
> From what we can see, Thread t@11854 and Thread t@202 are permanently
> locked; all other BLOCKED threads are blocked on session creation:
>
> org.apache.activemq.artemis.jms.client.ActiveMQConnection.createSession(ActiveMQConnection.java:234)
>
> There seems to be a race condition during large-message processing that,
> when triggered, deadlocks threads. Example:
>
> 1. A large message is produced.
> 2. *Thread-1* JournalImpl.appendAddRecord() queues a task on the
> appendExecutor (JournalImpl:946), and that task is delayed for some reason.
> 3. *Thread-2* JournalImpl.appendDeleteRecord() is triggered because
> appendAdd is async -> it takes the lock on the LargeMessage object
> (PostOfficeImpl:1305) and gets stuck on the appendExecutor queue behind
> Thread-1 (JournalImpl:1058).
> 4. *Thread-1*'s appendExecutor task gets to the part where it needs the
> lock on the same LargeMessage object (LargeServerMessageImpl:173), but it
> can't get it because Thread-2 holds it.
>
> So the deadlock is: Thread-1 is waiting for the lock on the LargeMessage
> object, which will never be released by Thread-2, because Thread-2 is
> waiting for processing on the appendExecutor queue behind Thread-1.
>
> We are still having trouble actually reproducing this deadlock, because
> when Thread-2 processes JournalImpl.appendDeleteRecord() it checks
> records/pendingRecords for the recordId before reaching the appendExecutor,
> and in all the cases we managed to reproduce, the recordId is always
> present there (JournalImpl:1051).
>
> The service runs inside a Docker container and the folder containing the
> journal is mapped to the host machine. Metrics for the node the service
> was running on show no disk I/O issues at that time.
>
> Artemis version: 2.6.4, Spring Boot version: 2.1.5.RELEASE
>
> Relevant Artemis settings (the rest are defaults):
>
> durable: true
> max-size-bytes: 1GB
> address-full-policy: FAIL
> journal-sync-non-transactional: false
> journal-sync-transactional: false
>
> We are in the process of analyzing the issue further, but wanted to report
> it as soon as possible so someone else can also take a look. If you need
> any additional info, we will provide it.
>
> Kind regards,
>
> Mario
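P.S. For completeness, the settings quoted above correspond to roughly this embedded-broker setup. This is a sketch of the programmatic Configuration API from memory, not our actual Spring wiring, so treat the method names as approximate:

    import org.apache.activemq.artemis.core.config.Configuration;
    import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
    import org.apache.activemq.artemis.core.server.ActiveMQServer;
    import org.apache.activemq.artemis.core.server.ActiveMQServers;
    import org.apache.activemq.artemis.core.settings.impl.AddressFullMessagePolicy;
    import org.apache.activemq.artemis.core.settings.impl.AddressSettings;

    public class EmbeddedBrokerSketch {
        public static void main(String[] args) throws Exception {
            Configuration config = new ConfigurationImpl()
                    .setPersistenceEnabled(true)            // durable: true
                    .setJournalSyncNonTransactional(false)  // journal-sync-non-transactional: false
                    .setJournalSyncTransactional(false);    // journal-sync-transactional: false
            config.addAcceptorConfiguration("in-vm", "vm://0");

            AddressSettings settings = new AddressSettings()
                    .setMaxSizeBytes(1024L * 1024 * 1024)   // max-size-bytes: 1GB
                    .setAddressFullMessagePolicy(AddressFullMessagePolicy.FAIL); // address-full-policy: FAIL
            config.addAddressesSetting("#", settings);      // apply to all addresses

            ActiveMQServer server = ActiveMQServers.newActiveMQServer(config);
            server.start();
        }
    }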