Hello. I've just run into a locking problem with activemq-core 4.0.2 that seems to be related to the journal writing logic. I was able to take a thread dump of the running process, and it shows that a lot of threads (56, to be precise) are waiting to lock the journal, like this:

"ActiveMQ Transport: tcp:///10.228.32.104:32194" daemon prio=10 tid=0x0228c5a8 nid=0x1b0e0 waiting for monitor entry [0xed97e000..0xed97faa8] at org.apache.activeio.journal.active.JournalImpl.write(JournalImpl.java:169) * - waiting to lock <0x2a9f58c0> (a org.apache.activeio.journal.active.JournalImpl)* at org.apache.activeio.journal.active.JournalImpl.write(JournalImpl.java:157) at org.apache.activemq.store.journal.JournalPersistenceAdapter.writeCommand(JournalPersistenceAdapter.java:588) at org.apache.activemq.store.journal.JournalMessageStore.addMessage(JournalMessageStore.java:94)
       at org.apache.activemq.broker.region.Queue.send(Queue.java:253)
at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:225) at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:342) at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:192) at org.apache.activemq.broker.BrokerFilter.send(BrokerFilter.java:113) at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:97) at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:126) at org.apache.activemq.broker.AbstractConnection.processMessage(AbstractConnection.java:377) at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:590) at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:226) at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:62) at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:91) at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:63) at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:122) at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:122) at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:87) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:127)
       at java.lang.Thread.run(Unknown Source)


Furthermore, a few threads (4 of them) have acquired the lock but are waiting for a CountDownLatch to reach zero:

"ActiveMQ Task" daemon prio=10 tid=0x0205e598 nid=0x1b0cb in Object.wait() [0xed77f000..0xed77fb28]
       at java.lang.Object.wait(Native Method)
       at java.lang.Object.wait(Unknown Source)
       at edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch.await(CountDownLatch.java:179)
       - locked <0x384ddf68> (a edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch)
       at org.apache.activeio.journal.active.BatchedWrite.waitForForce(BatchedWrite.java:95)
       at org.apache.activeio.journal.active.JournalImpl.write(JournalImpl.java:187)
       at org.apache.activeio.journal.active.JournalImpl.setMark(JournalImpl.java:344)
       - locked <0x2a9f58c0> (a org.apache.activeio.journal.active.JournalImpl)
       at org.apache.activemq.store.journal.JournalPersistenceAdapter.doCheckpoint(JournalPersistenceAdapter.java:395)
       at org.apache.activemq.store.journal.JournalPersistenceAdapter$2.iterate(JournalPersistenceAdapter.java:122)
       at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:110)
       at org.apache.activemq.thread.PooledTaskRunner.access$100(PooledTaskRunner.java:25)
       at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
       at java.lang.Thread.run(Unknown Source)

and some transport threads are in a similar situation:

"ActiveMQ Transport: tcp:///10.228.32.110:50356" daemon prio=10 tid=0x02098b90 nid=0x1b0ca in Object.wait() [0xf157e000..0xf157fba8]
       at java.lang.Object.wait(Native Method)
       at java.lang.Object.wait(Unknown Source)
       at edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch.await(CountDownLatch.java:179)
       - locked <0x384ddf68> (a edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch)
       at org.apache.activeio.journal.active.BatchedWrite.waitForForce(BatchedWrite.java:95)
       at org.apache.activeio.journal.active.JournalImpl.write(JournalImpl.java:187)
       at org.apache.activeio.journal.active.JournalImpl.write(JournalImpl.java:157)
       at org.apache.activemq.store.journal.JournalPersistenceAdapter.writeCommand(JournalPersistenceAdapter.java:588)
       at org.apache.activemq.store.journal.JournalMessageStore.addMessage(JournalMessageStore.java:94)
       at org.apache.activemq.broker.region.Queue.send(Queue.java:253)
       at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:225)
       at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:342)
       at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:192)
       at org.apache.activemq.broker.BrokerFilter.send(BrokerFilter.java:113)
       at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:97)
       at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:126)
       at org.apache.activemq.broker.AbstractConnection.processMessage(AbstractConnection.java:377)
       at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:590)
       at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:226)
       at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:62)
       at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:91)
       at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:63)
       at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:122)
       at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:122)
       at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:87)
       at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:127)
       at java.lang.Thread.run(Unknown Source)


After a quick review of the code, it seems that something should call:

org.apache.activeio.journal.active.BatchedWrite.forced()

or

org.apache.activeio.journal.active.BatchedWrite.writeFailed()

I guess that call should happen right after the actual write to the journal is performed.
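
To make my reading explicit, here is a minimal sketch (my own illustration, not the ActiveIO source; the internals and the writeFailed signature are guesses) of the latch hand-off I believe BatchedWrite implements, i.e. writers block in waitForForce() until the thread doing the disk write calls forced() or writeFailed():

// Sketch only: shows why waiters hang forever if neither completion
// method is ever invoked after the physical write.
import java.util.concurrent.CountDownLatch;

class BatchedWriteSketch {
    private final CountDownLatch forceDone = new CountDownLatch(1);
    private volatile Throwable failure;

    // Called by producer/checkpoint threads after appending to the batch.
    void waitForForce() throws Throwable {
        forceDone.await();            // blocks forever if nobody counts down
        if (failure != null) {
            throw failure;            // propagate the disk-write error
        }
    }

    // Called by the journal writer thread once the batch is forced to disk.
    void forced() {
        forceDone.countDown();
    }

    // Called by the journal writer thread if the disk write fails.
    void writeFailed(Throwable cause) {
        this.failure = cause;
        forceDone.countDown();
    }
}

If that reading is right, any path where the writing thread never reaches forced() or writeFailed() (for example because the write throws or the thread dies or is interrupted) would leave every thread parked in waitForForce() blocked forever, and the checkpoint task that still holds the JournalImpl monitor would then keep the other 56 transport threads queued behind it, which is what the dump above looks like to me.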


As additional information: in the environment where ActiveMQ is running there is a known problem where servlets acting as JMS clients do not close their connections explicitly, which (I suppose) is the cause of exceptions like:

Error writing log '[WARNING][2007/06/04.09:28:26.785][ActiveMQ Scheduler]Async exception with no exception listener: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too long.(Channel was inactive for too long.)
'
java.io.InterruptedIOException
       at java.io.FileOutputStream.writeBytes(Native Method)
       at java.io.FileOutputStream.write(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(Unknown Source)
       at sun.nio.cs.StreamEncoder.flush(Unknown Source)
       at java.io.OutputStreamWriter.flush(Unknown Source)
       at java.io.BufferedWriter.flush(Unknown Source)
       at es.tid.planb.core.FileLogWriter$RealFile.put(FileLogWriter.java:140)
       at es.tid.planb.core.FileLogWriter.put(FileLogWriter.java:241)
       at es.tid.planb.core.RotFileLogWriter.put(RotFileLogWriter.java:99)
       at es.tid.planb.core.Logger.write(Logger.java:240)
       at es.tid.planb.core.Logger.rWarning(Logger.java:291)
       at es.tid.planb.core.Logger.warning(Logger.java:193)
       at es.tid.planb.core.PBLogger.warn(PBLogger.java:260)
       at org.apache.activemq.ActiveMQConnection.onAsyncException(ActiveMQConnection.java:1453)
       at org.apache.activemq.ActiveMQConnection.onException(ActiveMQConnection.java:1459)
       at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:94)
       at org.apache.activemq.transport.ResponseCorrelator.onException(ResponseCorrelator.java:110)
       at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:94)
       at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:94)
       at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:128)
       at org.apache.activemq.transport.InactivityMonitor.onException(InactivityMonitor.java:149)
       at org.apache.activemq.transport.InactivityMonitor.readCheck(InactivityMonitor.java:101)
       at org.apache.activemq.transport.InactivityMonitor.access$000(InactivityMonitor.java:35)
       at org.apache.activemq.transport.InactivityMonitor$1.run(InactivityMonitor.java:51)
       at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
       at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.runAndReset(FutureTask.java:198)
       at edu.emory.mathcs.backport.java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:189)
       at edu.emory.mathcs.backport.java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:213)
       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
       at java.lang.Thread.run(Unknown Source)
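
On our side, the fix we still owe is making those servlets close their JMS connections explicitly. This is roughly what the client code should be doing (a minimal sketch; the class name, broker URL and queue name below are placeholders, not our real code):

// Sketch: close the connection in a finally block instead of leaving it
// to be reaped by the broker's inactivity monitor.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SendEventHelper {
    private static final String BROKER_URL = "tcp://10.228.32.104:61616"; // placeholder

    public void sendEvent(String text) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(BROKER_URL);
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Destination queue = session.createQueue("EXAMPLE.QUEUE"); // placeholder
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(text));
        } finally {
            connection.close(); // releases broker-side resources explicitly
        }
    }
}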


The journal persistence configuration we're using is:

   <persistenceAdapter>
     <journaledJDBC journalLogFiles="5" dataDirectory="/opt/project/run/activemq-data" dataSource="#jdbc-ds">
     </journaledJDBC>
   </persistenceAdapter>

Any advice or clue about what could be happening? Is there any situation in which the call to BatchedWrite.forced() or BatchedWrite.writeFailed() could be skipped? Could the inactivity-related exceptions we are seeing be connected to this?

My apologies for the long stack dumps, and thanks a lot for this great project. We are hitting this problem in our production environment, so I would be very grateful for any ideas about it.
If more information is needed, don't hesitate to ask.


Regards.
