On 18/12/2019 10:42 am, Toralf Lund wrote:
On 18/12/2019 10:40, Gordon Sim wrote:
On 18/12/2019 8:45 am, Toralf Lund wrote:
An additional, more serious issue is that the system has also locked
up a couple of times following an exception during Sender::send().
Slightly simplified stack trace from one of the cases:
Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? () from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007fb93fa35b35 in qpid::sys::Timer::run() () from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9b8e4a in ?? () from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]
Thread 3's trace didn't make it through. Do you still have that?
Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? () from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3 0x00007fb94039add7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1
#4 0x00007fb93fd8b927 in qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&, qpid::framing::MethodContent const*) () from /usr/lib64/libqpidclient.so.2
#5 0x00007fb93fd8b97b in qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () from /usr/lib64/libqpidclient.so.2
#6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() () from /usr/lib64/libqpidclient.so.2
#7 0x00007fb9422403f8 in ?? () from /usr/lib64/libqpidmessaging.so.2
#8 0x00007fb942241e6a in ?? () from /usr/lib64/libqpidmessaging.so.2
#9 0x00000000004f2dc9 in ?? ()
It looks like the send is stuck trying to unwind after an exception
(possibly on the checkOpen()), due to not being able to unlock the
sendLock semaphore. Is something in thread 3 holding any locks?
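As background for the unwinding point above, here is a minimal sketch of how a scoped lock is normally released while an exception propagates, which is the step frame #3 (_Unwind_Resume) would be running. Everything in it is hypothetical and for illustration only: RecordingGuard, sendSketch and lockReleasedDuringUnwind are made-up names, and a plain std::mutex stands in for whatever sendLock really is inside Qpid.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical scoped guard (not a Qpid type): locks in the constructor,
// unlocks in the destructor, and records that the unlock happened.
class RecordingGuard {
public:
    RecordingGuard(std::mutex& m, bool& released) : m_(m), released_(released) {
        m_.lock();
        released_ = false;
    }
    ~RecordingGuard() {
        m_.unlock();       // runs during stack unwinding as well
        released_ = true;
    }
    RecordingGuard(const RecordingGuard&) = delete;
    RecordingGuard& operator=(const RecordingGuard&) = delete;

private:
    std::mutex& m_;
    bool& released_;
};

// Hypothetical stand-in for a send path that checks the session state
// while holding a lock and throws if the session is not open.
void sendSketch(std::mutex& lock, bool& released, bool sessionOpen) {
    RecordingGuard guard(lock, released);
    assert(!released);  // the lock is held at this point
    if (!sessionOpen) {
        throw std::runtime_error("session not open");
    }
}

// Returns true if the guard's destructor had already released the lock
// by the time the exception reached the catch block.
bool lockReleasedDuringUnwind() {
    std::mutex lock;
    bool released = false;
    try {
        sendSketch(lock, released, false);
    } catch (const std::runtime_error&) {
        return released;
    }
    return false;  // no exception was thrown
}
```

If the unlock in the destructor could not complete, the thread would stay inside the unwinder, which would be consistent with the top frames of thread 1. That is only one possible reading of the trace.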
It may be holding locks, but nothing related to Qpid. It's probably
waiting for a condition via pthread_cond_wait(); note that the mutex
associated with the condition variable is released while the thread is
blocked there and is only re-acquired when it wakes up.
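On the condition-variable point: POSIX specifies that pthread_cond_wait() atomically releases the associated mutex for as long as the caller is blocked, and re-acquires it before returning. A minimal sketch of that behaviour, using std::condition_variable (which has the same semantics) instead of the raw pthread calls; the function name lockAcquiredWhileOtherThreadWaits is hypothetical and nothing here comes from the application being debugged.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: returns true once the calling thread has managed to
// take the mutex while another thread is blocked in a condition wait on it.
bool lockAcquiredWhileOtherThreadWaits() {
    std::mutex m;
    std::condition_variable cv;
    bool waiting = false;  // set by the waiter, under m, before it blocks
    bool done = false;     // predicate the waiter is waiting for

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        waiting = true;
        cv.wait(lock, [&] { return done; });  // m is released while blocked
    });

    bool acquired = false;
    while (!acquired) {
        if (m.try_lock()) {
            // Holding m and seeing waiting == true means the waiter has
            // given up m, which it only does inside cv.wait().
            if (waiting) {
                acquired = true;
                done = true;
            }
            m.unlock();
        }
        if (!acquired) {
            std::this_thread::yield();
        }
    }

    cv.notify_one();
    waiter.join();
    assert(!waiter.joinable());
    return acquired;
}
```

So a thread parked in a condition wait should not, by that fact alone, be keeping other threads out of the associated mutex. It could still hold other, unrelated locks.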
Ok, thread 1 may of course not be stuck at all; it may just have been
captured at that point. Taking several pstack snapshots a few seconds
apart would shed more light.
Does the frequency of the 'locking up' match the frequency of the
session-busy exceptions?