On one of our installations of one of our AMQP 0-10 applications using
the QPid messaging API (in C++), we keep getting exceptions with the message
session-busy: Session detached by peer
- usually while waiting for incoming messages, but in some cases also
when sending them. After resolving some issues with "dangling" session
pointers (with help from the list), I thought this wasn't really a
problem, but I just found that the broker has several extra connections
for the application; qpid-stat reports the following:
$ qpid-stat -c
Connections
connection cproc cpid mech
auth connected idle msgIn msgOut
===========================================================================================================================================
qpid.192.168.0.215:5672-192.168.0.61:58532 our-program 134233
ANONYMOUS anonymous 4d 19h 42m 25s 4d 18h 37m 44s 47.3k 531k
qpid.192.168.0.215:5672-192.168.0.61:58564 our-program 134233
ANONYMOUS anonymous 4d 5h 27m 9s 4d 4h 32m 4s 42.3k 488k
qpid.192.168.0.215:5672-192.168.0.61:58566 our-program 134233
ANONYMOUS anonymous 4d 4h 32m 12s 4d 3h 33m 34s 44.6k 513k
qpid.192.168.0.215:5672-192.168.0.61:58580 our-program 134233
ANONYMOUS anonymous 1d 17h 37m 49s 1d 16h 41m 54s 40.6k 515k
qpid.192.168.0.215:5672-192.168.0.61:58582 our-program 134233
ANONYMOUS anonymous 1d 16h 42m 1s 1d 15h 46m 44s 41.3k 497k
qpid.192.168.0.215:5672-192.168.0.61:58584 our-program 134233
ANONYMOUS anonymous 1d 15h 46m 46s 1d 14h 43m 24s 46.8k 535k
qpid.192.168.0.215:5672-192.168.0.61:58618 our-program 134233
ANONYMOUS anonymous 22h 55m 50s 21h 48m 4s 44.5k 496k
qpid.192.168.0.215:5672-192.168.0.61:58846 our-program 134233
ANONYMOUS anonymous 18h 21m 22s 0s 0 0
(Names and addresses have been edited.)
The implied times seem to match instances of the session-busy exception,
but a new connection doesn't appear every time. The exception has
occurred 27 times during the last 4 days, and as you can see, there are
a not that many connections.
I don't see any signs of connection errors or explicit reopens but
automatic reconnect with a limit has been enabled; the following connect
options are used
connection.setOption("reconnect", true);
connection.setOption("reconnect_limit", 0);
(I believe this means trying reconnect once, as "reconnect_limit"
specifies retries after an initial attempt.)
An additional, more serious issue is that the system has also locked up
a couple of times following an exception during Sender::send(). Slightly
simplified stack trace from one of the cases:
Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /usr/lib64/libpthread.so.0
#1 0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]
Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3 0x00007fb94039add7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1
#4 0x00007fb93fd8b927 in
qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&,
qpid::framing::MethodContent const*) ()
from /usr/lib64/libqpidclient.so.2
#5 0x00007fb93fd8b97b in qpid::client::SessionImpl::send(qpid::framing::AMQBody
const&) () from /usr/lib64/libqpidclient.so.2
#6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
from /usr/lib64/libqpidclient.so.2
#7 0x00007fb9422403f8 in ?? ()
from /usr/lib64/libqpidmessaging.so.2
#8 0x00007fb942241e6a in ?? ()
from /usr/lib64/libqpidmessaging.so.2
#9 0x00000000004f2dc9 in ?? ()
The application became responsive again after running a debug session to
get the above info. There is a chance that it did another Sender::send()
on the same sender (without trying to reset the session or whatever)
following the exception.
Any idea what's going on here? Is there anything I can do to avoid the
session-busy errors? Can I prevent extra connections from appearing if
it should occur after all? Also, is there a way to prevent the apparent
lock-ups after exception on send()? I think I've assumed that it's
"safe" to try other operations as they will just fail directly, unless
the session is actually able to recover, but perhaps that's a bad idea?
Note that besides what's been mentioned already, the following
connection options are used:
connection.setOption("tcp_nodelay", true);
connection.setOption("heartbeat", 2);
Thanks,
- Toralf