Hello all,

Has anyone managed to check this out? It seems to be a pretty major race
condition issue with large message processing.
Unfortunately I just realized that attachments get removed from e-mails, so
here are the relevant thread stacks...

Stacks of the 2 *deadlocked* threads:

"Thread-1074
(ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3738bd7a)"
- Thread t@11854
   java.lang.Thread.State: BLOCKED
        at
org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:174)
        - waiting to lock <1eb68b32> (a
org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
owned by "Thread-25
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@59ccdde4)"
t@202
        at
org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59)
        at
org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25)
        at
org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79)
        at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2785)
        at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91)
        at
org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:843)
        at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
        at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown
Source)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

   Locked ownable synchronizers:
        - locked <4c6aa3d9> (a
java.util.concurrent.ThreadPoolExecutor$Worker)


"Thread-25
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@59ccdde4)"
- Thread t@202
   java.lang.Thread.State: WAITING
        at jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <7807bddf> (a
java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
        at
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
        at
org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
        at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1073)
        at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:943)
        at
org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:360)
        at
org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1308)
        - locked <1eb68b32> (a
org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
        at
org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1254)
        at
org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:904)
        at
org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:799)
        at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1801)
        - locked <3396ca5d> (a
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
        at
org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:993)
        - locked <67aa6e6> (a java.lang.Object)
        at
org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:314)
        at
org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:287)
        at
org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$1694/0x000000080219f840.onMessage(Unknown
Source)
        at
org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown
Source)
        at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
        at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
        at
org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$1194/0x0000000801ecd440.run(Unknown
Source)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

   Locked ownable synchronizers:
        - locked <7fc2876c> (a
java.util.concurrent.ThreadPoolExecutor$Worker)

As described in the previous emails, these two threads are deadlocked like this:

   1. *Thread t@11854*: JournalImpl.appendAddRecord() submits its task to the
   appendExecutor (JournalImpl:946), and that task is delayed for some reason
   2. *Thread t@202*: JournalImpl.appendDeleteRecord() is triggered (because
   the appendAdd is asynchronous), takes the lock on the LargeMessage object
   (PostOfficeImpl:1305), and then gets stuck on the appendExecutor queue
   behind the task from Thread t@11854 (JournalImpl:1058)
   3. *Thread t@11854*: the appendExecutor task for appendAddRecord() reaches
   the point where it needs the lock on the same LargeMessage object
   (LargeServerMessageImpl:173), but it can't acquire it because Thread t@202
   still holds it

So the deadlock is: *Thread t@11854* is waiting for the lock on the LargeMessage
object, which will never be released by *Thread t@202* because that thread is
waiting for processing on the appendExecutor queue behind *Thread t@11854*.
The sketch below illustrates the pattern.
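
For anyone who wants to see the shape of the problem without digging through the
broker code, here is a minimal, self-contained Java sketch of the pattern we
believe we are hitting. The names below are ours, not Artemis code: the
single-thread executor stands in for the journal's appendExecutor, the plain
Object stands in for the LargeServerMessageImpl monitor, and the future stands
in for SimpleFutureImpl.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: a task queued on a sequential executor needs a monitor that is
// held by a thread which is itself blocked waiting for a later task on the
// same executor.
public class AppendExecutorDeadlockSketch {

    // Stands in for the appendExecutor: tasks run one at a time, in order.
    private static final ExecutorService appendExecutor = Executors.newSingleThreadExecutor();

    // Stands in for the LargeServerMessageImpl instance whose monitor both sides need.
    private static final Object largeMessage = new Object();

    public static void main(String[] args) throws Exception {
        // Step 1: the "appendAddRecord" task is queued first; once it runs it
        // must synchronize on the message (like the synchronized getEncodeSize()).
        appendExecutor.execute(() -> {
            sleep(500);                       // models the "delayed for some reason" part
            synchronized (largeMessage) {     // <-- blocks forever once the deadlock forms
                System.out.println("append task encoded the message");
            }
        });

        // Step 2: the session thread already holds the message monitor and,
        // while holding it, queues a second task (our stand-in for
        // checkKnownRecordID) on the same executor and waits for its result.
        Thread confirmThread = new Thread(() -> {
            synchronized (largeMessage) {
                CompletableFuture<Boolean> known = new CompletableFuture<>();
                appendExecutor.execute(() -> known.complete(true)); // queued BEHIND step 1
                known.join();                 // <-- waits forever once the deadlock forms
            }
        }, "confirm-large-message");
        confirmThread.start();

        Thread.sleep(2000);
        System.out.println("Both sides should now be stuck waiting on each other; the process hangs here.");
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}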

This eventually blocks all other producer threads, so *no messages can be
produced to the queue*:

"flow-action-completed-355" - Thread t@12775
   java.lang.Thread.State: TIMED_WAITING
        at jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <4876219f> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:235)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2211)
        at
org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:389)
        - locked <5582112e> (a java.lang.Object)
        at
org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:319)
        at
org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sessionClose(ActiveMQSessionContext.java:543)
        at
org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:1234)
        at
org.apache.activemq.artemis.jms.client.ActiveMQSession.close(ActiveMQSession.java:280)
        - locked <5fe5842b> (a
org.apache.activemq.artemis.jms.client.ActiveMQConnection)


"flow-action-completed-351" - Thread t@12771
   java.lang.Thread.State: BLOCKED
        at
org.apache.activemq.artemis.jms.client.ActiveMQConnection.createSession(ActiveMQConnection.java:234)
        - waiting to lock <5fe5842b> (a
org.apache.activemq.artemis.jms.client.ActiveMQConnection) owned by
"flow-action-completed-355" t@12775
        at
org.springframework.jms.connection.SingleConnectionFactory.createSession(SingleConnectionFactory.java:481)
        at
org.springframework.jms.connection.CachingConnectionFactory.getSession(CachingConnectionFactory.java:232)
        at
org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:648)

For now we increased *minLargeMessageSize* so that all messages are processed as
regular messages: we run Artemis embedded and keep entire messages in memory
anyway, so we never actually benefit from large message processing. A rough
example of the change is shown below.
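
This is roughly what the workaround looks like on the client connection factory.
The 10 MB value, the in-vm URL and the class name are just placeholders for our
setup, not a recommendation; adjust to your own deployment.

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class LargeMessageWorkaround {

    // Raise the client-side large-message threshold so payloads below it are
    // sent as regular messages instead of large messages.
    public static ActiveMQConnectionFactory connectionFactory() {
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("vm://0"); // placeholder URL
        cf.setMinLargeMessageSize(10 * 1024 * 1024); // default threshold is 100 KiB (102400 bytes)
        return cf;
    }
}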
We would just like someone to confirm that there is a bug present here.
If you need more info, we will provide it.

Kind regards,
Mario

On Wed, 20 Nov 2019 at 18:23, Mario Mahovlić <mariomahov...@gmail.com>
wrote:

> Hello all,
>
> We run Artemis embedded in our Spring service and occasionally hit an
> issue where all our producer threads become blocked, so no messages can be
> produced to the queue (this happened 3 times in 2 weeks). We produce both
> regular and large messages to the queue. All we see from Artemis during this
> time are timeout exceptions on our producer client side:
>
> nested exception is javax.jms.JMSException: AMQ219014: Timed out after 
> waiting 30,000 ms for response when sending packet 71
>
> We took a thread dump of the service while the issue was in effect (attached).
>
> From what we can see, Thread t@11854 and Thread t@202 are permanently
> locked; all other BLOCKED threads are blocked on session creation:
>
> org.apache.activemq.artemis.jms.client.ActiveMQConnection.createSession(ActiveMQConnection.java:234)
>
> There seems to be a race condition during large message processing that,
> when hit, causes a thread deadlock. Example:
>
>
>    1. A large message is produced
>    2. *Thread-1*: JournalImpl.appendAddRecord() submits its task to the
>    appendExecutor (JournalImpl:946), and that task is delayed for some reason
>    3. *Thread-2*: JournalImpl.appendDeleteRecord() is triggered (because the
>    appendAdd is asynchronous), takes the lock on the LargeMessage object
>    (PostOfficeImpl:1305), and gets stuck on the appendExecutor queue behind
>    Thread-1 (JournalImpl:1058)
>    4. *Thread-1*: the appendExecutor task for appendAddRecord() reaches the
>    point where it needs the lock on the same LargeMessage object
>    (LargeServerMessageImpl:173), but it can't acquire it because Thread-2
>    still holds it
>
> So the deadlock is: Thread-1 is waiting for the lock on the LargeMessage
> object, which will never be released by Thread-2 because Thread-2 is waiting
> for processing on the appendExecutor queue behind Thread-1.
>
> We are still having trouble actually reproducing this deadlock: when
> Thread-2 processes JournalImpl.appendDeleteRecord() it checks
> records/pendingRecords for the recordId before reaching the appendExecutor,
> and in every case we managed to reproduce, the recordId was already present
> there (JournalImpl:1051).
>
> The service runs within a Docker container and the folder containing the
> journal is mapped to the host machine. Metrics for the node the service was
> running on show no disk I/O issues at that time.
>
> Artemis version: 2.6.4, Spring Boot version: 2.1.5.RELEASE
>
> Relevant artemis settings (rest of the settings are default):
>
> durable: true
> max-size-bytes: 1GB
> address-full-policy: FAIL
> journal-sync-non-transactional: false
> journal-sync-transactional: false
>
> We are in the process of analyzing the issue further but wanted to report
> this as soon as possible, so someone else can also take a look. If you need
> any additional info, we will provide it.
>
> Kind regards,
>
> Mario
>
