Hi,
This is a problem that has been discussed before, but I kind of reached
a dead end in trying to figure out what was going on then, so I'll try
again:
On one particular site where one of our Qpid C++ messaging applications
is used, qpid-stat on the server reports multiple connections where
there is supposed to be just one - the application has exactly one
Connection object with one Session. Example output from "qpid-stat -c";
names are changed and entries for other processes are removed:
Connections
connection                            cproc            cpid    mech       auth       connected       idle            msgIn  msgOut
==================================================================================================================================
qpid.10.0.1.215:5672-10.0.2.61:37488  the-application  171170  ANONYMOUS  anonymous  5d 21h 43m 13s  5d 16h 34m 7s   778k   1.61m
qpid.10.0.1.215:5672-10.0.2.61:37492  the-application  171170  ANONYMOUS  anonymous  5d 16h 34m 20s  5d 13h 45m 27s  368k   940k
qpid.10.0.1.215:5672-10.0.2.61:37496  the-application  171170  ANONYMOUS  anonymous  5d 13h 45m 35s  5d 8h 36m 7s    637k   1.53m
qpid.10.0.1.215:5672-10.0.2.61:37498  the-application  171170  ANONYMOUS  anonymous  5d 8h 36m 14s   5d 6h 24m 47s   269k   794k
qpid.10.0.1.215:5672-10.0.2.61:37502  the-application  171170  ANONYMOUS  anonymous  5d 6h 24m 57s   5d 4h 25m 17s   248k   761k
qpid.10.0.1.215:5672-10.0.2.61:37504  the-application  171170  ANONYMOUS  anonymous  5d 4h 25m 22s   4d 22h 55m 47s  678k   1.70m
qpid.10.0.1.215:5672-10.0.2.61:37508  the-application  171170  ANONYMOUS  anonymous  4d 22h 56m 4s   4d 18h 50m 7s   514k   1.42m
qpid.10.0.1.215:5672-10.0.2.61:37510  the-application  171170  ANONYMOUS  anonymous  4d 18h 50m 16s  4d 14h 39m 17s  525k   1.43m
qpid.10.0.1.215:5672-10.0.2.61:37516  the-application  171170  ANONYMOUS  anonymous  4d 11h 57m 23s  4d 2h 52m 47s   2.61m  1.51m
qpid.10.0.1.215:5672-10.0.2.61:37530  the-application  171170  ANONYMOUS  anonymous  3d 21h 20m 43s  3d 11h 41m 7s   2.90m  1.52m
qpid.10.0.1.215:5672-10.0.2.61:37542  the-application  171170  ANONYMOUS  anonymous  3d 6h 55m 24s   2d 22h 29m 27s  2.40m  1.46m
qpid.10.0.1.215:5672-10.0.2.61:37550  the-application  171170  ANONYMOUS  anonymous  2d 16h 42m 4s   2d 8h 7m 37s    2.41m  1.51m
qpid.10.0.1.215:5672-10.0.2.61:37562  the-application  171170  ANONYMOUS  anonymous  2d 1h 13m 47s   1d 14h 28m 7s   3.01m  1.81m
qpid.10.0.1.215:5672-10.0.2.61:37570  the-application  171170  ANONYMOUS  anonymous  1d 10h 42m 8s   1d 1h 54m 47s   2.45m  1.53m
qpid.10.0.1.215:5672-10.0.2.61:37586  the-application  171170  ANONYMOUS  anonymous  9h 8m 51s       0s              2.62m  1.49m
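Spotting the pattern does not need anything fancy: counting rows per
client pid is enough. A rough sketch, assuming every connection row
starts with a "qpid." token followed by the cproc and cpid columns, as
in the listing above; countConnectionsPerPid is a made-up name, not part
of any Qpid tool:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count broker-side connections per client pid in "qpid-stat -c" output.
// The input is read as a stream of whitespace-separated tokens, so it
// does not matter whether a row has been wrapped onto a second line.
std::map<std::string, int> countConnectionsPerPid(std::istream& in) {
    std::map<std::string, int> counts;
    std::string token;
    while (in >> token) {
        // Assumption: a connection row starts with a "qpid." token and
        // the next two tokens are cproc and cpid (neither contains spaces).
        if (token.compare(0, 5, "qpid.") != 0) continue;
        std::string cproc, cpid;
        if (in >> cproc >> cpid) ++counts[cpid];
    }
    return counts;
}
```

Fed the listing above, this gives a single entry: pid 171170 with 15
connections.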
It turns out that the appearance of each new connection coincides with
log messages like:
2020-04-15 09:10:25 [Client] info Trying to connect to cirm...
2020-04-15 09:10:25 [System] info Connecting: 10.0.1.215:5672
2020-04-15 09:10:25 [Client] info Connection [10.0.2.61:37576-cirm:5672] connected to tcp:cirm:5672
2020-04-15 09:10:25 [Client] info Connected to cirm
2020-04-15 09:10:25 [Client] error session-busy: Session detached by peer
This is from "info+" logging by the Qpid library; the date and time are
added by a log handler in the application. This sequence of events is
triggered by Sender::send() or Receiver::receive(), i.e. we're looking
at an automatic reconnect.
So again, I'm wondering what's going on. I mean, if the session is
"detached", why is a connection still listed on the server side? Also,
surely there can't be communication issues at this stage, as then the
entire reconnect operation would have failed? Or am I missing something?
And more importantly: Is there anything I can do to make the "stale"
connections go away? My strategy for recovering from the error is simply
qpid::messaging::Session session;
qpid::messaging::Connection connection;
....
session.close();
session = connection.createSession();
But evidently, that's not enough.
The connections disappear if I restart the application, so I suspect
connection.close(); connection.open(); would help, but if that's the
case, why? How is what it does different from closing the one and only
session? Also, if I do have to close the connection sometimes, how do I
know exactly when? Obviously, it's possible to set up everything from
scratch on every session or communication error, but is that supposed to
be necessary? I've been assuming it isn't, and the whole application
is kind of structured around that.
(I would do more tests if it weren't for the fact that I only see the
problem on a production system, which is also located at a remote site
with a very slow link to the rest of the world...)
Help?
Note that heartbeats are enabled with an interval of 5. It used to be
shorter, but I increased it. That didn't really make any difference.
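For anyone unfamiliar with the setting: in the qpid::messaging API,
heartbeat (in seconds) and reconnect are connection options passed as an
option string. The sketch below is not the application's actual code;
makeConnectionOptions is a hypothetical helper that only shows the kind
of option string meant here.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the option string for the
// qpid::messaging::Connection constructor.
std::string makeConnectionOptions(bool reconnect, int heartbeatSeconds) {
    std::ostringstream options;
    options << "{reconnect: " << (reconnect ? "true" : "false")
            << ", heartbeat: " << heartbeatSeconds << "}";
    return options.str();
}

// Intended use (needs <qpid/messaging/Connection.h>; "cirm:5672" is the
// broker address seen in the log excerpt):
//   qpid::messaging::Connection connection("cirm:5672",
//                                          makeConnectionOptions(true, 5));
//   connection.open();
```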
I'm using qpid-cpp version 1.37. AMQP 0-10.
- Toralf