Hi there, I've deployed qpid on our embedded TI platform, and I'm seeing some
odd behaviour with long-term (36-48 hour) use. I'd like some advice on
whether the way I'm configuring and using qpid is the cause.
We have a distributed embedded system across up to 8-12 separate boxes, and
I've used Avahi to detect the other units, and then I'm using qpid messages
as the system IPC. We don't have java or python as part of our buildroot
rootfs at the moment, so I've done everything with C++. The network is
fully ad-hoc, so we can't do any static configuration of exchanges and so
on. Nodes can come and go during runtime, and the system needs to deal with
it. I've done it all by the creation of sessions, connections, senders and
receivers. Qpid heartbeat messages are used to detect nodes disappearing.
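To illustrate what "detecting nodes disappearing" amounts to, here is a minimal sketch of heartbeat-based liveness tracking against a monotonic clock. It is not Qpid's implementation: `PeerLiveness`, `seen` and `expire` are hypothetical names, and the timeout value is an assumption chosen by the caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical liveness tracker; illustrates the idea only, not Qpid internals.
class PeerLiveness {
public:
    // steady_clock is monotonic, so wall-clock adjustments do not affect it.
    using Clock = std::chrono::steady_clock;

    // A peer is treated as gone after `timeout` without a heartbeat.
    explicit PeerLiveness(Clock::duration timeout) : timeout_(timeout) {
        assert(timeout > Clock::duration::zero());
    }

    // Record that a heartbeat (or any traffic) from `peer` arrived at `now`.
    void seen(const std::string& peer, Clock::time_point now) {
        lastSeen_[peer] = now;
    }

    // Remove and return every peer whose last heartbeat is older than the timeout.
    std::vector<std::string> expire(Clock::time_point now) {
        std::vector<std::string> gone;
        for (auto it = lastSeen_.begin(); it != lastSeen_.end();) {
            if (now - it->second > timeout_) {
                gone.push_back(it->first);
                it = lastSeen_.erase(it);
            } else {
                ++it;
            }
        }
        return gone;
    }

    // Number of peers currently considered alive.
    std::size_t size() const { return lastSeen_.size(); }

private:
    Clock::duration timeout_;
    std::map<std::string, Clock::time_point> lastSeen_;
};
```

Time points are passed in rather than read inside the class, so the expiry logic can be exercised without waiting on a real clock.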
The problem seems to be with the common "give data to each of the other
nodes" situation. I've implemented it as a push-type
operation. The nodes only receive locally, so the node with the data will
send a message to a topic on each of the other nodes it knows about. The
topic has been configured with the following string:
<name>; { create: always, assert: never, node: { type: topic, x-declare: {
auto-delete: True, exclusive: False, arguments: { 'qpid.policy_type': ring } } } }
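For reference, a sketch of how such an address string could be assembled per topic. `makeTopicAddress` and `bracesBalanced` are hypothetical helpers, not part of the Qpid API; the `;` separator between the name and the option map follows the Qpid address syntax, and the closing braces are my assumption about how the complete string ends.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the address string for one topic name.
std::string makeTopicAddress(const std::string& name) {
    assert(!name.empty());
    return name +
        "; {create: always, assert: never, "
        "node: {type: topic, x-declare: {"
        "auto-delete: True, exclusive: False, "
        "arguments: {'qpid.policy_type': ring}}}}";
}

// Sanity check for hand-written option maps: every '{' needs a matching '}'.
bool bracesBalanced(const std::string& address) {
    int depth = 0;
    for (char c : address) {
        if (c == '{') ++depth;
        if (c == '}' && --depth < 0) return false;
    }
    return depth == 0;
}
```

Running a check like `bracesBalanced` over the string before handing it to createSender/createReceiver catches a truncated option map early.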
I enable a 1s heartbeat on all the connections, but reconnect is off. The
QpidPoller thread spawned by the reconnect would mysteriously crash, so I've
taken it out.
The data profile is pretty modest: messages of perhaps 128 bytes are sent
around the system at about 1Hz, with some slightly bigger 256 byte messages
sent at perhaps 3-5Hz.
I keep the connections and sessions open rather than opening and closing
them a lot.
I've tried to be conservative with failure cases. My send / receive failure
code will close the connection object, bin the message and let the calling
code deal with the retry. When the retry comes in (which I reject until 3s
is up), I open the connection, and then attempt to call getSender/Receiver
for that broker (doing a createSender/Receiver if the get throws an
exception).
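The 3s hold-off described above can be sketched roughly as follows. This is an illustration of the idea rather than my actual code: `RetryGate` and its methods are hypothetical names, and it assumes the hold-off is measured from the moment of the failure on a monotonic clock.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical gate that rejects reconnect attempts until the hold-off has passed.
class RetryGate {
public:
    // Monotonic clock, so system clock changes cannot shorten or extend the hold-off.
    using Clock = std::chrono::steady_clock;

    explicit RetryGate(Clock::duration holdoff)
        : holdoff_(holdoff), failed_(false) {
        assert(holdoff >= Clock::duration::zero());
    }

    // Call when a send/receive fails and the connection has been closed.
    void recordFailure(Clock::time_point now) {
        failed_ = true;
        failedAt_ = now;
    }

    // Call once the connection has been reopened successfully.
    void recordSuccess() { failed_ = false; }

    // True if a reconnect attempt is allowed at `now`.
    bool mayRetry(Clock::time_point now) const {
        return !failed_ || (now - failedAt_) >= holdoff_;
    }

private:
    Clock::duration holdoff_;
    bool failed_;
    Clock::time_point failedAt_;
};
```

The calling code would check `mayRetry(RetryGate::Clock::now())` before reopening the connection, and call `recordSuccess()` once the open and the getSender/getReceiver lookups have gone through.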
This particular test is with a 2 node system, a "master" which is generally
supplying source data around the system, and a slave node generally just
receiving data.
After several hours (36-48) I see mystery disconnections going into the
syslog, e.g.
Apr 18 10:42:52 [qpidd] 2013-04-18 10:42:52 [Broker] error Connection
127.0.0.1:5672-127.0.0.1:40549 timed out: closing
My qpidd.conf has the interesting lines:
cluster-mechanism=DIGEST-MD5 ANONYMOUS
# Default max size of queue in bytes.
# Default is 104857600 (100Mb), which is a tad high, try 1Mb
default-queue-limit=1024000
# TTL of messages in system. Default is 600s
queue-purge-interval=10
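As a rough sanity check of the 1024000-byte limit against the message rates given earlier, the arithmetic can be written out as below. It counts payload bytes only and ignores any per-message overhead in the broker, so it is an upper bound on the time; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Payload rate in bytes per second: small messages plus larger ones.
std::uint64_t payloadBytesPerSecond(std::uint64_t smallBytes, std::uint64_t smallHz,
                                    std::uint64_t largeBytes, std::uint64_t largeHz) {
    return smallBytes * smallHz + largeBytes * largeHz;
}

// Whole seconds of traffic a queue of limitBytes can hold before it is full.
std::uint64_t secondsToFill(std::uint64_t limitBytes, std::uint64_t bytesPerSecond) {
    assert(bytesPerSecond > 0);
    return limitBytes / bytesPerSecond;
}
```

With 128 bytes at 1Hz plus 256 bytes at 5Hz the payload rate is 1408 bytes/s, so a queue that nobody drains would reach the limit after roughly 727 seconds; at 3Hz for the larger messages it is 896 bytes/s and roughly 1142 seconds.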
When I use qpid-tool and qpid-stat from my dev box, I see stats like this on
the "slave" node:
Summary of Objects by Type:
Package Class Active Deleted
=======================================================
org.apache.qpid.broker binding 13 280
org.apache.qpid.broker broker 1 0
org.apache.qpid.broker memory 1 0
org.apache.qpid.broker system 1 0
org.apache.qpid.ha habroker 1 0
org.apache.qpid.broker subscription 5 257
org.apache.qpid.broker connection 2 27
org.apache.qpid.broker session 1 23
org.apache.qpid.broker queue 6 147
org.apache.qpid.broker exchange 12 0
org.apache.qpid.broker vhost 1 0
Whereas the "master" node has stats like:
Package Class Active Deleted
=======================================================
org.apache.qpid.broker binding 36 0
org.apache.qpid.broker broker 1 0
org.apache.qpid.broker memory 1 0
org.apache.qpid.broker system 1 0
org.apache.qpid.ha habroker 1 0
org.apache.qpid.broker subscription 23 0
org.apache.qpid.broker connection 7 0
org.apache.qpid.broker session 7 0
org.apache.qpid.broker queue 19 0
org.apache.qpid.broker exchange 12 0
org.apache.qpid.broker vhost 1 0
So clearly I am creating and destroying a lot of bindings, subscriptions and
queues here. If I list the queues on the slave I see a lot of this kind of
thing:
223 20:25:43 - 346.<TopicName>_1af820f0-0224-4f32-8464-081a8020fed4
224 20:25:43 - 346.<TopicName>_20433cbe-e71c-493d-9f2c-70c6caf89680
225 20:25:43 - 346.<TopicName>_20bfcaea-d734-45a6-837b-5f7065178f22
226 20:25:43 - 346.<TopicName>_30f68ad2-8556-4b8b-bf34-3aec299e9270
227 20:25:43 - 346.<TopicName>_3556f8a5-a233-42a8-a0a5-8e1b91aaeb7d
228 20:25:43 - 346.<TopicName>_3ec782b5-ce6c-4ec2-8a60-dec9ef797b18
229 20:25:43 - 346.<TopicName>_4295e25c-163b-41c0-91d8-1a52051a23e0
230 20:25:43 - 346.<TopicName>_43a772f6-42f3-4262-893e-52294bc901be
231 20:25:43 - 346.<TopicName>_489b4bda-3e7e-41f3-92db-6c5308245986
232 20:25:43 - 346.<TopicName>_4a2645fe-bb51-4170-a770-68120c2742b7
233 20:25:43 - 346.<TopicName>_5cc8a979-f50c-4c06-90fd-81ee3b16ee24
234 20:25:43 - 346.<TopicName>_6123edf9-553b-4ca0-8bc8-2305c33e71ec
235 20:25:43 - 346.<TopicName>_750800cc-c913-4cb6-b2bc-4cffa343b335
236 20:25:43 - 346.<TopicName>_880cff8a-fc4b-414f-8046-d3d09fa2e1a7
237 20:25:43 - 346.<TopicName>_a06a7268-89ad-4cc0-b6ec-ca44ef9ee787
238 20:25:43 - 346.<TopicName>_aa4753c5-cc32-40a4-afd4-d677303147df
239 20:25:43 - 346.<TopicName>_b3f15b2e-f93c-4fad-914d-070344343aaa
240 20:25:43 - 346.<TopicName>_b5cf4400-fc1b-4bd2-a8a5-7aa388a44e5d
241 20:25:43 - 346.<TopicName>_bbe3774f-fc6b-4164-bc71-81be966ca598
242 20:25:43 - 346.<TopicName>_c84620a2-7d29-4de0-bfa0-e7f937ea11ed
243 20:25:43 - 346.<TopicName>_c8e3c75d-07ab-4847-9fba-8ce929c3b470
244 20:25:43 - 346.<TopicName>_e1ef9106-07e2-4644-8410-150ad695e0be
Why are so many of these being created? I've put the autodelete on, so I
guess it's the other end at the master somehow keeping the queue in
existence.
So my questions are:
a) It's a wired ethernet connection, so I don't think it's connectivity
that's taking down the connections. I've patched the qpid code to use the
monotonic rather than the realtime clock, so GPS leap seconds (which change
the system clock) wouldn't cause timeouts. What could be causing it?
b) Is my backoff logic sensible? Should I be recreating the connection /
session / sender / receiver instead of trying to reuse them?
c) What's causing the proliferation of binding, subscription and queue
objects?
d) Are there any other settings I can be supplying in the address or broker
config that will mitigate the effect?
Thanks for your help,
Neil