Thanks for your reply, Gordon. The only qpid warnings I see are the connection timeout messages. Whenever I query the queues, they have zero occupancy. The current settings are largely the result of my trying different things to see if they helped.
I started seeing this mystery disconnection when I enabled the broadcast of received CAN bus messages to all the nodes in the system. This is the source of the data feed of 4-5 256-byte messages a second that I talked about in my original post (though the rate can be higher). It isn't actually required by anything in the system at the moment, but it will be needed later, and I want to replace our current sockets-based messaging system with qpid going forward, so it's worth my time to fix the problem now.
Since the difference was the data rate, I theorised that there could be a backlog in the system, so I added the 10 second queue purge to prevent the queues from getting too big in case that was the issue. If one of the receiving processes crashed for any reason, I saw a fast buildup of messages. We only have 256 MB on our smallest units, so we need to keep a tight lid on memory usage. That was when I looked into the defaults for message TTL, purge interval and queue size, and applied those settings. I'm setting a 5s TTL on the sent messages.
On my specific system it always seems to be the slave that has the problem, but our test department has seen the issue on their master units too, so I don't think there's anything special about the slave.
OK, so I shouldn't be attempting to reopen the connections. I'll change the code to recreate the connections, sessions, etc. I did originally have it that way round (i.e. recreating them every time), but decided to change it for some reason I can't now recall.
Thanks for the tips on qpid.policy. I had a lot of trouble finding specs for the format of that string.
There are no routes between the two brokers. I just create a sender and receiver with the config string I listed, and I implement fanout behaviour by having a source object own a MyFanoutSender-type class, which holds a list of BrokerSession objects, each of which encapsulates a single Connection, Session and Sender.
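A minimal sketch of the fanout arrangement described above, combined with the "recreate rather than reopen" approach. It deliberately uses stand-in types instead of the real qpid::messaging classes: `Link`, `LinkFactory` and `FanoutSender` are hypothetical names, and `Link` stands for whatever wraps one Connection, Session and Sender. Only the ownership and recovery logic is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one Connection + Session + Sender triple.
// Real code would wrap the messaging library's objects here.
struct Link {
    virtual ~Link() = default;
    virtual void send(const std::string& payload) = 0;  // throws on failure
};

// Hypothetical factory: builds a brand-new Link for the given broker URL.
using LinkFactory =
    std::function<std::unique_ptr<Link>(const std::string& url)>;

// One BrokerSession per destination broker. After any failure the link is
// discarded, and a fresh one is created on the next send.
class BrokerSession {
public:
    BrokerSession(std::string url, LinkFactory factory)
        : url_(std::move(url)), factory_(std::move(factory)) {
        assert(static_cast<bool>(factory_) && "factory must be callable");
    }

    // Returns true if the message was handed to the link.
    bool send(const std::string& payload) {
        try {
            if (!link_) link_ = factory_(url_);  // (re)create lazily
            link_->send(payload);
            return true;
        } catch (const std::exception& e) {
            // Catch locally so one bad broker cannot take down the caller.
            std::cerr << url_ << ": " << e.what() << " (dropping link)\n";
            link_.reset();  // force a completely new link next time
            return false;
        }
    }

private:
    std::string url_;
    LinkFactory factory_;
    std::unique_ptr<Link> link_;
};

// Owns the list of sessions and sends each message to every broker.
class FanoutSender {
public:
    void add(BrokerSession session) { sessions_.push_back(std::move(session)); }

    // Returns how many brokers accepted the message.
    std::size_t send(const std::string& payload) {
        std::size_t accepted = 0;
        for (auto& session : sessions_) {
            if (session.send(payload)) ++accepted;
        }
        return accepted;
    }

private:
    std::vector<BrokerSession> sessions_;
};
```

Each session fails and recovers independently, so a dead broker costs only that broker's messages until its link is rebuilt. This only covers exceptions thrown on the calling thread; anything raised on a library-internal thread would still need separate handling.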
The reason for doing it this way is the ad-hoc nature of our network. In the general case, depending on the nature of each node (some are displays, some are purely controllers), the routes I'd need to set up would change dynamically as settings were changed on the nodes. It seemed cleaner to have lots of independent agents doing their own thing than to have something that has to know the gestalt of everything happening across the whole set of qpid senders and receivers on a single node. The error cases with persisted routes, such as when someone powers off a unit, changes a slave to a master, and then powers up the other one as a slave, made that approach just a bit too annoying.
IIRC the reconnection crash was an uncaught exception from the poller thread. The exception handling is pretty sketchy in our codebase, so my qpid encapsulation objects catch all the exceptions locally, but I had no way of catching the exceptions from the poller thread. The qpid poller seemed to be doing the reconnection logic, so I took reconnection away to make it more quiescent.
The really odd thing is that when this fault happens it appears as if the network interfaces shut down on the target. I can't telnet into the units, and the applications slow down. I've never seen this happen on an x86 box (though with 6 cores and 12 GB of RAM, it's not likely).
Anyway, I'll implement your suggestions for connection drops and the queue configuration and see what the effect is.
Thanks!
