On 03/14/2012 06:45 PM, Fraser Adams wrote:
As far as I'm aware we're not using acknowledgements. What I mean is that I believe the default with qpid-route is no acks; I think it sends messages via an unreliable link. I thought that you had to enable acks explicitly with the --ack= option in qpid-route? Have I missed something?
No, that is correct, just checking. [...]
The route is a source route because we don't want the performance penalty of persistence and can tolerate some message loss. By using source routes, if the producer fails and restarts we don't get into the pain whereby "normal" routes would try to reconnect the link and then silently fail because the queue isn't yet in place. So with source routes we have a script that runs when the broker starts: it re-creates the queue and the binding, then re-establishes the route, in that order.
Just btw, you can mark the queue as durable without making the message persistent, in which case there would be no performance penalty.
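For illustration only, here is a minimal Python sketch of the kind of startup script described above: it builds the three commands (queue, binding, route) and runs them strictly in that order, stopping at the first failure. It is not Fraser's actual script. The function names, broker addresses, queue name and binding arguments are all hypothetical, and the qpid-config / qpid-route argument layouts (including -s for a source route and --durable for the queue) are written from memory, so check them against the --help output of the installed version. It also assumes qpid-config talks to the local (source) broker by default.

```python
import subprocess


def startup_commands(dest_broker, src_broker, dest_exchange,
                     src_exchange, queue, bind_args, durable=False):
    """Return the recovery commands in the order they must run.

    bind_args is passed through untouched (binding key plus any
    exchange-specific arguments); nothing is guessed here.
    """
    add_queue = ["qpid-config", "add", "queue", queue]
    if durable:
        # Durable queue, messages still non-persistent (assumed flag).
        add_queue.append("--durable")

    bind = ["qpid-config", "bind", src_exchange, queue] + list(bind_args)

    # -s asks for a source (push) route; assumed argument order:
    # <dest-broker> <src-broker> <exchange> <queue>
    route = ["qpid-route", "-s", "queue", "add",
             dest_broker, src_broker, dest_exchange, queue]

    # Queue first, then binding, then the route that consumes from it.
    return [add_queue, bind, route]


def run_in_order(commands, runner=subprocess.check_call):
    """Run each command in turn; an exception stops the remaining steps."""
    for argv in commands:
        runner(argv)
```

Calling run_in_order(startup_commands(...)) from the broker's init script would give the ordering described; passing a different runner makes it possible to dry-run the sequence without touching a broker.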
So it's not all that complicated, but it's driving me nuts that when the source broker is co-located with the producer we have a memory increase, but when we host the source broker on a different box it seems to be fine.
That is very weird and my top suspicion would be that there were different versions of qpid on the two boxes. Can you run colocated on the 'other' box to rule that in or out? Or have you verified that they are using the same version of qpid?
As I said in previous posts it seems to be exacerbated when the network is dodgy so I too initially suspected an acknowledgement issue, but as I say as far as I'm aware federation links are unreliable and don't require acknowledgement "by default".
[...]
BTW, to answer one of Gordon's previous questions, we're running RHEL5.4. However, regarding your comment "where the per-thread pools of memory didn't work well in the case of a thread that always worked on producing (hence allocating) and another that did all the consuming (hence freeing)": I *guess* we must have a similar situation to that, with a single queue on the source broker, our producer delivering data to amq.match on that, and the federation link consuming. But it is RHEL5.4, not RHEL6, and I've no idea how we'd identify if this scenario is affecting us too. Is there a way to work out if that issue is actually kicking in?
It's a RHEL6-only issue relating to memory allocators in glibc. If both boxes are 5.4 we can rule that out.
Also, I'm thinking it's not a Boost issue (which is one of the other straws I was clutching at). We've checked the server qpid was built on and the boxes giving pain, and they are running the same Boost version (1.33 something, I think). I'd appreciate any more thoughts you guys may have.
To be honest I'm stumped, I'm afraid, and can only offer some suggestions on what I might do to search for any further clues...
Just to confirm, you have run qpid-stat -c, qpid-stat -q and qpid-stat -u against a bloating broker? And everything shown there is as expected (not much queue depth, message counts correlating, no unexpected activity)?
When the memory growth starts happening, if you delete and recreate the bridge does that have any effect on growth?
Is it reproducible at all with more detailed logging (ideally --log-enable info+ --log-enable trace+:amqp_0_10)? Obviously logs like that grow pretty quickly so depending on the scale of the leak that may not be feasible. It might give some clues though (then again it might not :-(). Perhaps even a short run from both co-located and remote cases to see if a comparison shows anything up?
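To make the co-located versus remote comparison (or the before/after of recreating the bridge) a bit more concrete, one option is to sample the broker's resident memory at intervals and compare growth rates. The following is only a rough Python sketch under stated assumptions: the broker runs on Linux so /proc/<pid>/status is available, and all function names are made up for illustration. It is not part of qpid or its tools.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as status_file:
        return parse_vm_rss_kb(status_file.read())


def growth_kb_per_minute(samples):
    """Average growth between the first and last (seconds, rss_kb) sample."""
    (t_first, rss_first), (t_last, rss_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("samples must span a positive time interval")
    return (rss_last - rss_first) * 60.0 / (t_last - t_first)
```

Collecting (time, read_rss_kb(broker_pid)) pairs over a short run in each configuration and comparing growth_kb_per_minute for the two would at least show whether the difference is as stark as it appears, and whether deleting and recreating the bridge changes the slope.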
