Hi,

This is a problem that has been discussed before, but I kind of reached a dead end in trying to figure out what was going on then, so I'll try again:

On one particular site where one of our QPid C++ messaging applications is used, qpid-stat on the server reports multiple connections where there is supposed to be just one - the application has exactly one Connection object with one Session. Example output from "qpid-stat -c"; names are changed and entries for other processes are removed:

Connections
  connection                                   cproc cpid    mech       auth       connected        idle msgIn  msgOut
=============================================================================================================================================
  qpid.10.0.1.215:5672-10.0.2.61:37488   the-application 171170  ANONYMOUS  anonymous  5d 21h 43m 13s   5d 16h 34m 7s 778k  1.61m   qpid.10.0.1.215:5672-10.0.2.61:37492   the-application 171170  ANONYMOUS  anonymous  5d 16h 34m 20s   5d 13h 45m 27s 368k   940k   qpid.10.0.1.215:5672-10.0.2.61:37496   the-application 171170  ANONYMOUS  anonymous  5d 13h 45m 35s   5d 8h 36m 7s 637k  1.53m   qpid.10.0.1.215:5672-10.0.2.61:37498   the-application 171170  ANONYMOUS  anonymous  5d 8h 36m 14s    5d 6h 24m 47s 269k   794k   qpid.10.0.1.215:5672-10.0.2.61:37502   the-application 171170  ANONYMOUS  anonymous  5d 6h 24m 57s    5d 4h 25m 17s 248k   761k   qpid.10.0.1.215:5672-10.0.2.61:37504   the-application 171170  ANONYMOUS  anonymous  5d 4h 25m 22s    4d 22h 55m 47s 678k  1.70m   qpid.10.0.1.215:5672-10.0.2.61:37508   the-application 171170  ANONYMOUS  anonymous  4d 22h 56m 4s    4d 18h 50m 7s 514k  1.42m   qpid.10.0.1.215:5672-10.0.2.61:37510   the-application 171170  ANONYMOUS  anonymous  4d 18h 50m 16s   4d 14h 39m 17s 525k  1.43m   qpid.10.0.1.215:5672-10.0.2.61:37516   the-application 171170  ANONYMOUS  anonymous  4d 11h 57m 23s   4d 2h 52m 47s 2.61m  1.51m   qpid.10.0.1.215:5672-10.0.2.61:37530   the-application 171170  ANONYMOUS  anonymous  3d 21h 20m 43s   3d 11h 41m 7s 2.90m  1.52m   qpid.10.0.1.215:5672-10.0.2.61:37542   the-application 171170  ANONYMOUS  anonymous  3d 6h 55m 24s    2d 22h 29m 27s 2.40m  1.46m   qpid.10.0.1.215:5672-10.0.2.61:37550   the-application 171170  ANONYMOUS  anonymous  2d 16h 42m 4s    2d 8h 7m 37s 2.41m  1.51m   qpid.10.0.1.215:5672-10.0.2.61:37562   the-application 171170  ANONYMOUS  anonymous  2d 1h 13m 47s    1d 14h 28m 7s 3.01m  1.81m   qpid.10.0.1.215:5672-10.0.2.61:37570   the-application 171170  ANONYMOUS  anonymous  1d 10h 42m 8s    1d 1h 54m 47s 2.45m  1.53m   qpid.10.0.1.215:5672-10.0.2.61:37586   the-application 171170  ANONYMOUS  anonymous  9h 8m 51s        0s 2.62m  1.49m

It turns out that addition of new connections coincide with log message like

2020-04-15 09:10:25 [Client] info Trying to connect to cirm...
2020-04-15 09:10:25 [System] info Connecting: 10.0.1.215:5672
2020-04-15 09:10:25 [Client] info Connection [10.0.2.61:37576-cirm:5672] connected to tcp:cirm:5672
2020-04-15 09:10:25 [Client] info Connected to cirm
2020-04-15 09:10:25 [Client] error session-busy: Session detached by peer

This is from "info+" logging by the QPid library; date and time is added by a log handler in the application. This sequence of events is triggered by Sender::send() or Receiver::receive(), i.e. we're looking at an automatic reconnect.

So again, I'm wondering what's going on. I mean, if the session is "detached", why is a connection still listed on the server side? Also, surely there can't be communication issues at this stage, as then the entire reconnect operation would have failed? Or am I missing something?

And more importantly: Is there anything I can do to make the "stale" connections go away? My strategy for recovering from the error is simply

qpid::Messaging::Session session;
qpid::Messaging::Connection connection;

....

session.close();
session=connection.createSession();

But evidently, that's not enough.

The connections disappear if I restart the application, so I suspect connection.close() ; connection.open(); would help, but if that's the case, why? How is what it does different from closing the one and only session? Also, if I do have to close the connection sometimes, how do I know exactly when? Obviously, it's possible to set up everything from scratch on every session or communication error, but is that supposed to be necessary? I've been assuming it isn't, and the the whole application is kind of structured around that.

(I would do more tests, if it wasn't for the fact that I only see the problem on a production system, which also found at a remote location with a very slow link to the rest of the world...)

Help?

Note that heartbeats are enabled with an interval of 5.  It used to be shorter, but I increased it. That didn't really make any difference.

I'm using qpid-cpp version 1.37. AMQP 0-10.

- Toralf


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to