On 08/03/2012 11:42 AM, Fraser Adams wrote:
Hello all, I've previously posted with a problem whereby in a federated setup we've been seeing brokers eat memory. In that case what was weird was that when the producer client was pointed at a broker on the same box we'd start to see problems, but when it was pointed at a broker on a remote box it largely seemed to be stable. It drove us nuts, but we were able to park it for a while. We've recently started seeing it again where we really had to use a co-located broker.

One thing that seems interesting is that it happens most when the consumer is in some way behaving more slowly than it should. We're publishing messages to amq.match and the queue the consumers receive from is a RING queue, so as far as I can see (intuitively at least!!) it shouldn't really matter if the consumer is slow - the circular queue should just overwrite the oldest messages.

One of my colleagues wrote a couple of little programs using the qpid::client API to reproduce a bursty producer and a slow consumer, and this seems to reproduce the problem - basically, despite a circular queue, both qpidd and client-consumer memory consumption grows and I start swapping madly on my 4G box despite the queue only being 100MB.

I'm more familiar with qpid::messaging, so I tried to reproduce the problem using that. What's interesting is that my first attempt couldn't reproduce it - but of course the APIs behave differently. Then I started wondering about flow control/capacity. I usually do setCapacity(500) - for no other reason than I think that's what the default is for the JMS API. Now with that I was using more qpidd memory than I'd expect with a 100MB circular queue, but I then reduced it to 100, then 10, and realised that the capacity (which I thought related to prefetch on the client) was affecting both client and qpidd memory consumption. I also noticed that doing "link: {reliability: unreliable}" helped.
For a reliable receiver, the broker needs to keep track of all unacknowledged messages. That will include all prefetched messages (and, depending on how frequently acknowledgements are actually sent, it may include already-fetched messages as well).
In the current broker these records include a reference to the message, which prevents it from being deleted even if, in the case of a ring queue, it has been overwritten.
If the receiver is unreliable (no explicit acknowledgement expected) then, at least in recent brokers, the record no longer references the message, but simply keeps track of the amount of byte credit it consumed in order to move the window correctly.
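For reference, the unreliable option mentioned in the original mail is expressed via the qpid::messaging address string. A minimal sketch, assuming a local broker and a pre-existing ring queue (the broker URL and queue name "my-ring-queue" are placeholders, not from the original mails):

```cpp
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Session.h>

using namespace qpid::messaging;

int main() {
    // Placeholder broker URL; adjust for your environment.
    Connection connection("localhost:5672");
    connection.open();
    Session session = connection.createSession();

    // With reliability: unreliable the broker keeps no per-message
    // acquisition record for this subscription, so a ring queue is
    // free to overwrite the oldest messages.
    Receiver receiver = session.createReceiver(
        "my-ring-queue; {link: {reliability: unreliable}}");

    // Bound client-side prefetch as well; 100 is only illustrative.
    receiver.setCapacity(100);

    connection.close();
    return 0;
}
```

This is only a sketch of the configuration being discussed; it needs a running broker and the queue to exist to do anything useful.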
I tried enabling flow control in the client-consumer to no avail (I struggle to understand that API!!) - I thought adding

    SubscriptionSettings settings;
    settings.autoAck = 100;
    settings.flowControl = FlowControl::messageCredit(200);
    subscription = subscriptions.subscribe(*this, queue, settings);

to my prepareQueue() method would be the way to do it, but that just seemed to cause my consumer to hang after it had received 200 messages.
Yes, you would need to use FlowControl::messageWindow() instead; the window will be moved automatically by the library unless you choose to handle completions manually. By default qpid::client subscriptions use unlimited credit (meaning the prefetch is unbounded).
(Yes, I agree it's not the easiest API to use and has a fair amount of incidental complexity, hence qpid::messaging!)
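To make that concrete, here is a hedged sketch of the suggested change, assuming the same prepareQueue() context as the attached consumer ('subscriptions', 'queue' and 'subscription' are assumed to be members of the consumer class, as in the original snippet; the numbers are only illustrative):

```cpp
#include <qpid/client/FlowControl.h>
#include <qpid/client/SubscriptionManager.h>

using namespace qpid::client;

// Sketch of the prepareQueue() method mentioned above.
void prepareQueue() {
    SubscriptionSettings settings;
    settings.autoAck = 100;

    // messageWindow rather than messageCredit: with a window the
    // library re-issues credit automatically as messages complete,
    // so the consumer doesn't stall once the initial 200 messages
    // have been delivered.
    settings.flowControl = FlowControl::messageWindow(200);

    subscription = subscriptions.subscribe(*this, queue, settings);
}
```

The key difference is that messageCredit(200) grants a fixed allocation that is never replenished unless you do so yourself, which matches the observed hang after 200 messages.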
So my messaging-consumer can be made to behave in a way that at least makes some sense, but what concerns me brings me back to the problem of federated links - as I say, we're seeing terrible resource leaks, and I assume that the federation bridge code is closer to what qpid::client is doing, so I've no idea whether we can configure a federated link to "honour" the behaviour we'd expect from a circular queue (we use the default behaviour, which *should* be using unreliable messaging).
Indeed, if you are not using the --ack option then no acknowledgements are required. Though the federation subscriptions do use infinite credit, not requiring acknowledgements should at least prevent the broker from holding on to messages. What version of the broker are you using now? (I recall at one time there was an older broker in part of the system?)
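For anyone following along, a sketch of how such a route is typically set up with qpid-route (host names here are placeholders, not the actual deployment); leaving off --ack gives the unreliable bridge subscription discussed above:

```shell
# Dynamic route pulling messages from amq.match on the source broker
# into the local (destination) broker. With no --ack option the
# bridge sends no acknowledgements, so the source broker should not
# retain delivered messages on the subscription's behalf.
qpid-route dynamic add localhost:5672 src-host:5672 amq.match
```

This is a configuration sketch only; verify the exact option set against the qpid-route documentation for your broker version.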
Hope this makes sense. I've attached the producer and two consumers that I've been using to try this stuff out. I'd appreciate any thoughts and especially any mitigations; this is starting to cause us real problems.
I could probably get you a simple patch if applying that and rebuilding is an option.
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
