Hello all,

We run Artemis embedded in our Spring service and occasionally hit an
issue where all our producer threads become blocked, so no messages can be
produced to the queue (this has happened 3 times in 2 weeks). We produce
both regular and large messages to the queue. All we get from Artemis
during this time are timeout exceptions on our producer client side:

nested exception is javax.jms.JMSException: AMQ219014: Timed out after
waiting 30,000 ms for response when sending packet 71

We took a thread dump of the service while the issue was in effect
(attached).

From what we can see, Thread t@11854 and Thread t@202 are permanently
locked; all other BLOCKED threads are blocked on session creation:

org.apache.activemq.artemis.jms.client.ActiveMQConnection.createSession(ActiveMQConnection.java:234)

There seems to be a race condition that can cause a thread deadlock during
large message processing when the following sequence occurs, for example:


   1. A large message is produced.
   2. *Thread-1* JournalImpl.appendAddRecord() -> appendExecutor
   (JournalImpl:946) is delayed for some reason.
   3. *Thread-2* JournalImpl.appendDeleteRecord() is triggered because
   appendAdd is async -> takes the lock on the LargeMessage object
   (PostOfficeImpl:1305) and gets stuck on the appendExecutor queue behind
   Thread-1 (JournalImpl:1058).
   4. *Thread-1* JournalImpl.appendAddRecord() -> appendExecutor gets to
   the part where it needs the lock on the same LargeMessage object
   (LargeServerMessageImpl:173), but it can't get it because Thread-2
   holds it.

So the deadlock is: Thread-1 is waiting for the lock on the LargeMessage
object, which will never be released by Thread-2 because Thread-2 is
waiting for processing on the appendExecutor queue behind Thread-1.
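
To make the suspected ordering easier to discuss, here is a minimal
standalone sketch of the same pattern. It does not use any Artemis classes:
a single-thread executor stands in for appendExecutor, a ReentrantLock
stands in for the monitor on the LargeMessage object, and the class and
method names are hypothetical. It also assumes that the delete path waits
for its queued task while still holding the lock. A timeout replaces the
indefinite wait so the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from Artemis.
public class AppendExecutorDeadlockSketch {

    /**
     * Returns true if the queued "delete" task could not complete within
     * waitMillis while the caller was holding the message lock.
     */
    public static boolean deleteStallsBehindAdd(long waitMillis) throws Exception {
        // Stand-in for appendExecutor: one thread, tasks run strictly in order.
        ExecutorService appendExecutor = Executors.newSingleThreadExecutor();
        // Stand-in for the lock on the LargeMessage object.
        ReentrantLock largeMessageLock = new ReentrantLock();
        CountDownLatch deleteHoldsLock = new CountDownLatch(1);
        try {
            // Step 2: the "add" task is queued first but is delayed until
            // the delete path already holds the lock.
            appendExecutor.submit(() -> {
                deleteHoldsLock.await();
                // Step 4: the add task now needs the same lock.
                largeMessageLock.lock();
                try {
                    return "added";
                } finally {
                    largeMessageLock.unlock();
                }
            });

            // Step 3: the delete path takes the lock, then queues its own
            // task behind the add task and waits for it.
            largeMessageLock.lock();
            try {
                deleteHoldsLock.countDown();
                Future<String> delete = appendExecutor.submit(() -> "deleted");
                try {
                    delete.get(waitMillis, TimeUnit.MILLISECONDS);
                    return false; // delete completed, no stall
                } catch (TimeoutException e) {
                    return true;  // without the timeout this would wait forever
                }
            } finally {
                // Releasing the lock lets the add task, then the delete
                // task, run to completion so the executor can shut down.
                largeMessageLock.unlock();
            }
        } finally {
            appendExecutor.shutdown();
            appendExecutor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("delete stalled behind add: " + deleteStallsBehindAdd(200));
    }
}
```

In this sketch the stall is forced with a latch, so it occurs on every run.
In the real broker it would depend on timing, which may be why it is hard
to reproduce on demand.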

We are still having trouble actually reproducing this deadlock, because
Thread-2, when processing JournalImpl.appendDeleteRecord(), checks
records/pendingRecords for the recordId before reaching appendExecutor, and
in every case we have managed to reproduce, the recordId is always present
at that check (JournalImpl:1051).

The service is running within a Docker container, and the folder containing
the journal is mapped to the host machine. Metrics for the node on which
the service was running show no disk I/O issues at that time.

Artemis version: 2.6.4, Spring Boot version: 2.1.5.RELEASE

Relevant Artemis settings (the rest of the settings are default):

durable: true
max-size-bytes: 1GB
address-full-policy: FAIL
journal-sync-non-transactional: false
journal-sync-transactional: false

We are in the process of analyzing the issue further but wanted to report
this as soon as possible, so someone else can also take a look. If you need
any additional info, we will provide it.

Kind regards,

Mario
