On one of our installations of one of our AMQP 0-10 applications using the QPid messaging API (in C++), we keep getting exceptions with the message

session-busy: Session detached by peer

- usually while waiting for incoming messages, but in some cases also when sending them. After resolving some issues with "dangling" session pointers (with help from the list), I thought this wasn't really a problem, but I just found that the broker has several extra connections for the application; qpid-stat reports the following:

$ qpid-stat -c
Connections
  connection                                  cproc            cpid    mech     
  auth       connected        idle            msgIn  msgOut
  
===========================================================================================================================================
  qpid.192.168.0.215:5672-192.168.0.61:58532  our-program      134233  
ANONYMOUS  anonymous  4d 19h 42m 25s   4d 18h 37m 44s  47.3k   531k
  qpid.192.168.0.215:5672-192.168.0.61:58564  our-program      134233  
ANONYMOUS  anonymous  4d 5h 27m 9s     4d 4h 32m 4s    42.3k   488k
  qpid.192.168.0.215:5672-192.168.0.61:58566  our-program      134233  
ANONYMOUS  anonymous  4d 4h 32m 12s    4d 3h 33m 34s   44.6k   513k
  qpid.192.168.0.215:5672-192.168.0.61:58580  our-program      134233  
ANONYMOUS  anonymous  1d 17h 37m 49s   1d 16h 41m 54s  40.6k   515k
  qpid.192.168.0.215:5672-192.168.0.61:58582  our-program      134233  
ANONYMOUS  anonymous  1d 16h 42m 1s    1d 15h 46m 44s  41.3k   497k
  qpid.192.168.0.215:5672-192.168.0.61:58584  our-program      134233  
ANONYMOUS  anonymous  1d 15h 46m 46s   1d 14h 43m 24s  46.8k   535k
  qpid.192.168.0.215:5672-192.168.0.61:58618  our-program      134233  
ANONYMOUS  anonymous  22h 55m 50s      21h 48m 4s      44.5k   496k
  qpid.192.168.0.215:5672-192.168.0.61:58846  our-program      134233  
ANONYMOUS  anonymous  18h 21m 22s      0s                 0      0

(Names and addresses have been edited.)

The implied times seem to match instances of the session-busy exception, but a new connection doesn't appear every time. The exception has occurred 27 times during the last 4 days, and as you can see, there are a not that many connections.

I don't see any signs of connection errors or explicit reopens but automatic reconnect with a limit has been enabled; the following connect options are used

  connection.setOption("reconnect", true);
  connection.setOption("reconnect_limit", 0);

(I believe this means trying reconnect once, as "reconnect_limit" specifies retries after an initial attempt.)

An additional, more serious issue is that the system has also locked up a couple of times following an exception during Sender::send(). Slightly simplified stack trace from one of the cases:

Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
   from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
   from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
   from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /usr/lib64/libpthread.so.0
#1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
   from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9b8e4a in ?? ()
   from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]

Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
   from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
   from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
   from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3  0x00007fb94039add7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1
#4  0x00007fb93fd8b927 in 
qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&, 
qpid::framing::MethodContent const*) ()
   from /usr/lib64/libqpidclient.so.2
#5  0x00007fb93fd8b97b in qpid::client::SessionImpl::send(qpid::framing::AMQBody 
const&) () from /usr/lib64/libqpidclient.so.2
#6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
   from /usr/lib64/libqpidclient.so.2
#7  0x00007fb9422403f8 in ?? ()
   from /usr/lib64/libqpidmessaging.so.2
#8  0x00007fb942241e6a in ?? ()
   from /usr/lib64/libqpidmessaging.so.2
#9  0x00000000004f2dc9 in ?? ()

The application became responsive again after running a debug session to get the above info. There is a chance that it did another Sender::send() on the same sender (without trying to reset the session or whatever) following the exception.

Any idea what's going on here? Is there anything I can do to avoid the session-busy errors? Can I prevent extra connections from appearing if it should occur after all? Also, is there a way to prevent the apparent lock-ups after exception on send()? I think I've assumed that it's "safe" to try other operations as they will just fail directly, unless the session is actually able to recover, but perhaps that's a bad idea?

Note that besides what's been mentioned already, the following connection options are used:

connection.setOption("tcp_nodelay", true);
connection.setOption("heartbeat", 2);

Thanks,

- Toralf

Reply via email to