On 09/06/2013 02:50 PM, Jimmy Jones wrote:
I've done some further digging, and managed to simplify the system a
little to reproduce the problem. The system is now an external
process that posts messages to the default headers exchange on my
machine, which has a ring queue to receive effectively all messages
from the default headers exchange, process them, and post to another
headers exchange. There is now nothing listening on the subsequent
headers exchange, and all exchanges are non-durable. I've also tried
Fraser's suggestion of marking the link as unreliable on the queue
which seems to have no effect (is there any way in the qpid utilities
to confirm the link has been set to unreliable?)

So essentially what happens is the system happily processes away,
normally with an empty ring queue, sometimes it spikes up a bit and
goes back down again, with my ingest process using ~70% CPU and qpidd
~50% CPU, on a machine with 8 CPU cores. However, sometimes the queue
spikes up to 2GB (the max), starts throwing messages away, and qpid
hits 100%+ CPU and the ingest process goes to about 3% CPU. I can see
messages are being very slowly processed.

I've tried attaching to qpidd with gdb a few times, and all threads
apart from one seem to be idle in epoll_wait or pthread_cond_wait.
The running thread always seems to be somewhere under
DispatchHandle::processEvent.

In this simplified system, is the ingest process still blocking on waitForCompletion() in send()?

If so, I think that is the key symptom. It slows the flow of messages through the ingest process, so the rate of production onto the input queue exceeds the rate of consumption; the queue backs up and messages then have to be dropped.
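To make that chain of events concrete, here is a rough toy model in Python. It is not qpidd's implementation; the function name, the per-tick rates and the drop-oldest policy are assumptions chosen only to illustrate how a ring queue behaves once the consumer falls behind the producer.

```python
from collections import deque


def simulate_ring_queue(produce_rate, consume_rate, max_depth, ticks):
    """Toy model (hypothetical, not qpidd code).

    Each tick, produce_rate messages arrive and at most consume_rate
    messages are consumed. When the queue is full, the oldest message
    is discarded to make room, as a ring policy would.
    Returns (final_depth, dropped_count).
    """
    queue = deque()
    dropped = 0
    for tick in range(ticks):
        for i in range(produce_rate):
            if len(queue) >= max_depth:
                queue.popleft()  # ring policy: throw away the oldest message
                dropped += 1
            queue.append((tick, i))
        # The consumer takes what it can manage this tick.
        for _ in range(min(consume_rate, len(queue))):
            queue.popleft()
    return len(queue), dropped


if __name__ == "__main__":
    # Consumer keeps up: queue stays empty, nothing is dropped.
    print(simulate_ring_queue(10, 10, 100, 50))
    # Consumer mostly blocked: queue fills and messages are dropped.
    print(simulate_ring_queue(10, 1, 100, 50))
```

In the second call the consumer handles only a tenth of the incoming rate, which is the situation a blocked send() in the ingest process would create: the depth climbs to the limit and stays there while messages are discarded.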

The question is why the broker isn't sending completions for the messages re-sent by the ingest process. You don't have any queues bound to the exchange they are being sent to. Do you have an alternate-exchange specified for that second headers exchange (and if so, which queues, if any, are bound to it)? What are the stats for that second exchange at the point the problem occurs (qpid-stat -e)? What is the capacity of the sender in the ingest process (or is it left at the default value)? Is it the only sender on the session?
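The reason the capacity matters can be sketched with another toy model in Python. This is not the qpid messaging API: the class WindowedSender, its methods and the helper count_sends are hypothetical names, and the assumption is simply that a sender may have at most `capacity` messages awaiting completion before further sends have to wait.

```python
class WindowedSender:
    """Hypothetical sender with a fixed window of unsettled messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.unsettled = 0  # messages sent but not yet completed

    def try_send(self):
        """Return True if the message could be sent, False if it would block."""
        if self.unsettled >= self.capacity:
            return False
        self.unsettled += 1
        return True

    def on_completion(self, count=1):
        """Record completions received from the (imaginary) broker."""
        self.unsettled = max(0, self.unsettled - count)


def count_sends(capacity, attempts, completions_per_attempt):
    """Count how many of `attempts` sends succeed without blocking."""
    sender = WindowedSender(capacity)
    sent = 0
    for _ in range(attempts):
        if sender.try_send():
            sent += 1
        sender.on_completion(completions_per_attempt)
    return sent


if __name__ == "__main__":
    print(count_sends(50, 1000, 1))  # completions flowing
    print(count_sends(50, 1000, 0))  # completions never arrive
```

If completions keep arriving, the window never fills and every send goes through. If they stop, the sender gets through one window's worth of messages and then stalls, which would match an ingest process sitting at a few percent CPU.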

What level of logging do you have on? If you don't have it already, maybe see if you can reproduce with logging at info+, just in case there are any clues there.

When this situation occurs, if you stop the external process, does the system eventually clear itself, or does the ingest process remain blocked once it gets into that situation?

That's rather a lot of questions I'm afraid... just looking for some clue to latch on to.

