We will definitely need a timeline on this.  We are dequeueing messages on
12 app servers per "pod" with 32 threads per server dequeuing messages.  We
have over 65 pods.  This means, at any given time, we have around 25,000
dequeue threads running.  While there are isolated systems, we are
encountering deadlocks multiple times a day on multiple pods with most of
our traffic not running on QPID yet.  We are stuck moving forward until
this is resolved, as support has to reboot the brokers/app servers when
this happens.

Wanted to make sure you all realized this isn't a minor annoyance.  We are
using QPID quite a lot and this is a blocker for us moving forward.

On Thu, Feb 5, 2015 at 9:16 AM, Rajith Muditha Attapattu <rajit...@gmail.com
> wrote:

> The consensus is, that there is no easy fix, other than rewriting some key
> pieces of the client.
> Given that a new JMS client is being actively developed, we have decided
> not to devote any significant time/resources on the older client.
>
> The new JMS client is AMQP 1.0 and the source is here
> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git
> Robbie is the best person to give an accurate picture of the status for
> this project.
>
> Rajith
>
> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <rob.j.godf...@gmail.com>
> wrote:
>
>> To answer the easy question first, 0.32 clients should be totally
>> compatible with an 0.16 broker.
>>
>> In terms of the deadlock issue, I know there are a number of open deadlock
>> JIRAs that Robbie and Rajith have looked at in the past... Hopefully one
>> of
>> them will be able to chip in here and discuss the feasibility of fixing
>> your particular issue for 0.32.
>>
>> -- Rob
>>
>> On 4 February 2015 at 23:36, Helen Kwong <helenkw...@gmail.com> wrote:
>>
>> > Hi Qpid gurus,
>> >
>> > We are using 0.16 Java broker and client on 0-10, and we are running
>> into
>> > deadlock issues on the client that involve AMQSession's
>> > _messageDeliveryLock and AMQConnection's _failoverMutex, where different
>> > threads acquire them in different orders. This is leading to major
>> > production headaches for us and has us very worried.
>> >
>> > (I've been looking at 0.16 code mostly but also skimmed the relevant
>> parts
>> > in 0.32, which seem largely the same in those places.)
>> >
>> > Deadlock Variety 1
>> >
>> > This is an example of a deadlock we see, where the IOReceiver thread
>> > deadlocks with Session dispatcher thread (we have listeners that call
>> > session rollback or commit in onMessage()):
>> >
>> > IOReceiver
>> > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818)
>> ---------->
>> > waiting for session's messageDeliveryLock
>> >
>> >
>> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938)
>> >
>> >
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282)
>> >  ----------> acquires connection's failoverMutex
>> >
>> >
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> >
>> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907)
>> >
>> >
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182)
>> >
>> >
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
>> >
>> >
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
>> >
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
>> >
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
>> > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> > org.apache.qpid.transport.Session.received(Session.java:585)
>> > org.apache.qpid.transport.Connection.dispatch(Connection.java:412)
>> >
>> >
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
>> >
>> >
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
>> >
>> >
>> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
>> >
>> >
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >
>> >
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
>> >
>> >
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
>> > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> > org.apache.qpid.transport.Connection.received(Connection.java:367)
>> > org.apache.qpid.transport.Connection.received(Connection.java:65)
>> > org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
>> > org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
>> > org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
>> > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
>> > org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
>> > org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
>> >
>> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189)
>> >
>> >
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105)
>> >
>> >
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44)
>> > org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
>> > java.lang.Thread.run(Thread.java:745)
>> >
>> > Session dispatcher thread
>> >
>> >
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255)
>> >  ---------> waiting for connection's failoverMutex
>> >
>> >
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> > org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034)
>> >
>> >
>> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812)
>> > org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075)
>> > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837)
>> > common.messaging.QpidSession.rollback(QpidSession.java:211)
>> >
>> >
>> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284)
>> >
>> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113)
>> >
>> >
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748)
>> >
>> >
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141)
>> >
>> >
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722)
>> >
>> >
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186)
>> >
>> >
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54)
>> >
>> >
>> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454)
>> >
>> >
>> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393)
>> >   -----------> acquires session's messageDeliverylock
>> >
>> >
>> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180)
>> > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173)
>> >
>> >
>> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
>> > org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316)
>> > java.lang.Thread.run(Thread.java:745)
>> >
>> > The problem is that the IOReceiver thread acquires failoverMutex before
>> > messageDeliveryLock (for each session), whereas the dispatcher thread
>> > acquires it in the other order. We also see potential problems where
>> other
>> > threads (instead of IOReceiver) can deadlock with the dispatcher
>> thread, as
>> > long as it acquires failoverMutex before messageDeliveryLock. Examples
>> we
>> > can think of:
>> >
>> > A) Another thread calling AMQSession.close()
>> > B) Another thread calling BasicMessageConsumer.close()
>> > C) Same connection, different session's dispatcher thread, calling
>> > rollback() or commit() -> sync() -> setCurrentException() ->
>> > AMQConnection.exceptionReceived() -> AMQConnection.closeAllSessions(),
>> > which can try to acquire the messageDeliveryLock of another session and
>> > deadlock with the other session's dispatcher thread
>> >
>> >
>> > Deadlock Variety 2:
>> > From code inspection, it also appears that AMQConnection.close() can
>> > deadlock with either AMQSession.close() or BasicMessageConsumer.close()
>> > (where the session / consumer is on the same connection). This is
>> because
>> > AMQConnection.close() first acquires the messageDeliveryLock of all its
>> > sessions in the recursive doClose(), before trying to acquire the
>> > connection's failoverMutex. But the Session / consumer's close()
>> acquires
>> > the failoverMutex before messageDeliveryLock. We haven't seen this
>> happen
>> > but would like to know if this is possible.
>> >
>> >
>> > We'd really appreciate your help on this. Assuming these can be fixed in
>> > 0.32, we are also wondering if clients are backward compatible -- i.e.,
>> can
>> > we upgrade only our client to 0.32 while continuing to use the 0.16
>> broker?
>> >
>> > Thanks,
>> > Helen
>> >
>>
>
>

Reply via email to