On 03/14/2012 06:45 PM, Fraser Adams wrote:
As far as I'm aware we're not using acknowledgements, what I mean is
that I believe that the default with qpid-route is for no acks, I think
it sends messages via an unreliable link. I thought that you explicitly
had to enable acks with the --ack= option in qpid-route? have I missed
something?

No, that is correct, just checking.

[...]
The route is a source route because we don't want the performance
penalty of persistence and can tolerate some message loss and by using
source routes if the producer fails and restarts we don't get into the
pain whereby "normal" routes would try to reconnect the link then
silently fail because the queue isn't yet in place, so with source
routes we have a script that when the broker starts it re-creates the
queue and the binding then re-establishes the route in the correct order.

Just btw, you can mark the queue as durable without making the message persistent, in which case there would be no performance penalty.
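
For what it's worth, the ordering described above (queue first, then binding, then route) could be sketched roughly as below. This is only an illustration, not the script actually in use: it assumes a queue route pushed from the source side (`qpid-route -s queue add`), that `qpid-config` is run on the source host against the local broker, and every broker address, exchange, queue name and binding key passed in is hypothetical. A binding to a headers exchange such as amq.match also needs match arguments, which are left out here.

```python
import subprocess


def recreate_commands(dest_broker, src_broker, src_exchange, dest_exchange,
                      queue, binding_key, durable=False):
    """Build the commands in the order they need to run on the source host.

    All argument values are supplied by the caller; none come from the thread.
    """
    # 1. Re-create the queue on the (local) source broker.
    add_queue = ["qpid-config", "add", "queue", queue]
    if durable:
        # Durable queue only; messages stay non-persistent unless the
        # producer marks them otherwise.
        add_queue.append("--durable")

    # 2. Bind the queue to the exchange the producer publishes to.
    bind = ["qpid-config", "bind", src_exchange, queue, binding_key]

    # 3. Only now re-establish the source-local (push) queue route.
    route = ["qpid-route", "-s", "queue", "add",
             dest_broker, src_broker, dest_exchange, queue]

    return [add_queue, bind, route]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

Keeping the command construction separate from `run_all` means the sequence can be printed and checked by eye before anything touches a broker.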

So it's not all that complicated, but it's driving me nuts that when the
source broker is co-located with the producer we have a memory increase,
but when we host the source broker on a different box it seems to be fine.

That is very weird and my top suspicion would be that there were different versions of qpid on the two boxes. Can you run colocated on the 'other' box to rule that in or out? Or have you verified that they are using the same version of qpid?

As I said in previous posts it seems to be exacerbated when the network
is dodgy so I too initially suspected an acknowledgement issue, but as I
say as far as I'm aware federation links are unreliable and don't
require acknowledgement "by default".

[...]

BTW to answer one of Gordon's previous questions we're running RHEL5.4
however your comment "where the per-thread pools of memory didn't work
well in the case of a thread that always worked on producing (hence
allocating) and another that did all the consuming (hence freeing).". I
*guess* we must have a similar situation to that with a single queue on
the source broker and our producer delivering data to amq.match on that
and the federation link consuming, but it is RHEL5.4 not RHEL6 I've no
idea how we'd identify if this scenario is affecting us too - is there a
way to work out if that issue is actually kicking in?

It's a RHEL6-only issue relating to memory allocators in glibc. If both boxes are 5.4 we can rule that out.

Also I'm thinking it's not a Boost issue (which is one of the other
straws I was clutching) we've checked the server qpid was built on and
the boxes giving pain and they are running the same Boost version (1.33
something I think).

I'd appreciate any more thoughts you guys may have.

To be honest I'm stumped, I'm afraid, and can only offer some suggestions on what I might do to search for any further clues...

Just to confirm, you have run qpid-stat -c, qpid-stat -q and qpid-stat -u against a bloating broker? And everything shown there is as expected (not much queue depth, message counts correlating, no unexpected activity)?
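
If it helps, capturing those three views together at intervals makes it easier to line them up against the memory growth afterwards. A minimal sketch, assuming `qpid-stat` is on the path and is run on the broker host against the local broker; the `snapshot` helper and its `run` parameter are just names made up for this illustration.

```python
import subprocess
import time

# The three qpid-stat views mentioned above.
STAT_FLAGS = {"connections": "-c", "queues": "-q", "subscriptions": "-u"}


def stat_commands():
    """Return a name -> argv mapping for each qpid-stat view."""
    return dict((name, ["qpid-stat", flag])
                for name, flag in STAT_FLAGS.items())


def snapshot(run=subprocess.check_output):
    """Run every view once and return (timestamp, {name: output}).

    `run` is injectable so the function can be exercised without a broker.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    outputs = {}
    for name, cmd in stat_commands().items():
        outputs[name] = run(cmd)
    return stamp, outputs
```

Calling `snapshot()` every few minutes and writing the outputs to files named after the timestamp would give a series to compare between the co-located and remote cases.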

When the memory growth starts happening, if you delete and recreate the bridge does that have any effect on growth?
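
To judge whether recreating the bridge changes anything, it may be easier to compare growth rates than to eyeball top. A rough sketch, assuming a Linux box where the broker's resident size can be read from `/proc/<pid>/status`; the function names are made up for this example and the broker pid has to be supplied by hand.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process, in kB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vm_rss_kb(f.read())


def growth_kb_per_min(samples):
    """Average growth between the first and last (unix_seconds, rss_kb) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (r1 - r0) * 60.0 / (t1 - t0)
```

Collecting one list of samples before the bridge is deleted and another after it is recreated, then comparing the two rates, should show whether the bridge has any effect on the growth.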

Is it reproducible at all with more detailed logging (ideally --log-enable info+ --log-enable trace+:amqp_0_10)? Obviously logs like that grow pretty quickly so depending on the scale of the leak that may not be feasible. It might give some clues though (then again it might not :-(). Perhaps even a short run from both co-located and remote cases to see if a comparison shows anything up?
