Hi Ken,

Had a play with oprofile... but it seems to have lumped everything into libstdc++/glibc - any ideas? The queue is set up as a ring queue, max size 2GB.
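(For reference, a 2GB ring queue like this one can be created along the following lines with qpid-config - illustrative only, the queue name and exact options shown aren't necessarily what's in use here:)

# qpid-config add queue ingest --limit-policy=ring --max-queue-size=2147483648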
# qpid-stat -q
Queues
  queue  dur  autoDel  excl  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ==========================================================================================================================
  9d587baf-edc4-694b-bd43-f0accdf77a44:0.0  Y  Y  0  0  0  0  0  0  1  2
  ingest  Y  16.4k  49.9k  33.5k  2.13g  7.17g  5.04g  1  2

# opreport --long-filenames --session-dir=/root/oprof
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
          TIMER:0|
  samples|      %|
------------------
    78847 100.000 /usr/sbin/qpidd
        TIMER:0|
          samples|      %|
        ------------------
            53196 67.4674 /usr/lib64/libstdc++.so.6.0.8
            14220 18.0349 /usr/lib/libqpidbroker.so.2.0.0
             7368  9.3447 /lib64/libc-2.5.so
             3833  4.8613 /usr/lib/libqpidcommon.so.2.0.0
               93  0.1179 /usr/lib/libqpidtypes.so.1.0.0
               70  0.0888 /lib64/libpthread-2.5.so
               43  0.0545 /lib64/ld-2.5.so
               19  0.0241 /lib64/librt-2.5.so
                4  0.0051 /lib64/libuuid.so.1.2
                1  0.0013 /usr/sbin/qpidd

# opreport --demangle=smart --session-dir=/root/oprof --symbols `which qpidd`
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name               symbol name
53196    67.4674  libstdc++.so.6.0.8       /usr/lib64/libstdc++.so.6.0.8
10597    13.4400  libqpidbroker.so.2.0.0   void deque<qpid::framing::AMQFrame>::_M_range_insert_aux<_Deque_iterator<qpid::framing::AMQFrame, qpid::framing::AMQFrame&, qpid::framing::AMQFrame*>>(_Deque_iterator<qpid::framing::AMQFrame, qpid::framing::AMQFrame&, qpid::framing::AMQFrame*>, _Deque_iterator<qpid::framing::AMQFrame, qpid::framing::AMQFrame&, qpid::framing::AMQFrame*>, _Deque_iterator<qpid::framing::AMQFrame, qpid::framing::AMQFrame&, qpid::framing::AMQFrame*>, forward_iterator_tag)
 2922     3.7059  libqpidcommon.so.2.0.0   qpid::framing::AMQFrame::~AMQFrame()
 2833     3.5930  libqpidbroker.so.2.0.0   deque<qpid::framing::AMQFrame>::clear()
 2486     3.1529  libc-2.5.so              _int_malloc
 1882     2.3869  libc-2.5.so              _int_free
 1757     2.2284  libc-2.5.so              malloc
  589     0.7470  libc-2.5.so              memcpy
  384     0.4870  libc-2.5.so              free
...

Cheers,

Jimmy

----- Original Message -----
From: Ken Giusti
Sent: 09/06/13 06:27 PM
To: [email protected]
Subject: Re: System stalling

Hi Jimmy,

Have you ever used the oprofile tool before?

http://oprofile.sourceforge.net/about/

I've found this tool useful when I need to get a sense of where the broker is spending its time, especially when it is compute-bound. You'll need to be able to install oprofile on the system that is running the broker, and you'd need root permission to run it.

The approach I take is to configure oprofile to analyze the broker, then perform whatever actions I need to get the broker into the compute-bound state. Once the broker is acting up, I then trigger oprofile to start a capture. That results in a capture that best represents what the broker is doing when it is in that compute-bound state.

It's been a while since I used oprofile, but here's a summary of the commands I used last.

First, the path to the broker executable for this example is /home/kgiusti/mrg/qpid/cpp/src/.libs/lt-qpidd. Be sure you're referencing the actual executable image and not the shell wrapper that autotools generates!

After starting the broker daemon, I delete any old oprofile configuration and captures.
I then configure and start the oprofile daemon using the following commands (done as root):

$ rm -rf /root/oprof
$ rm -rf ~/.oprofile
$ opcontrol --shutdown
$ opcontrol --init
$ opcontrol --reset
$ opcontrol --setup --no-vmlinux --session-dir=/root/oprof --image=/home/kgiusti/mrg/qpid/cpp/src/.libs/lt-qpidd --separate=library --event=INST_RETIRED_ANY_P:6000:0:0:1 --cpu-buffer-size=1000000
$ opcontrol --start-daemon

Once that is done, you should try to reproduce the problem. Once the broker is in that weird state, start the capture:

$ opcontrol --start

Capture for a while, then stop the capture and dump the results:

$ opcontrol --stop
$ opreport --long-filenames --session-dir=/root/oprof

opreport will dump the methods where the broker is spending most of its compute time. You might also need to provide the paths to the link libraries, e.g.:

$ opreport --long-filenames --session-dir=/root/oprof -l /home/kgiusti/mrg/qpid/cpp/src/.libs/libqpidbroker.so.2.0.0

These notes are a bit old, and opcontrol/opreport's options may have changed a bit, but this should give you a general idea of how to use it.

-K

----- Original Message -----
> From: "Jimmy Jones" <[email protected]>
> To: [email protected]
> Sent: Friday, September 6, 2013 9:50:17 AM
> Subject: Re: System stalling
>
> I've done some further digging, and managed to simplify the system a little
> to reproduce the problem. The system is now an external process that posts
> messages to the default headers exchange on my machine, which has a ring
> queue to receive effectively all messages from the default headers exchange,
> process them, and post to another headers exchange. There is now nothing
> listening on the subsequent headers exchange, and all exchanges are
> non-durable. I've also tried Fraser's suggestion of marking the link as
> unreliable on the queue, which seems to have had no effect (is there any way
> in the qpid utilities to confirm the link has been set to unreliable?)
>
> So essentially what happens is that the system happily processes away,
> normally with an empty ring queue; sometimes it spikes up a bit and goes
> back down again, with my ingest process using ~70% CPU and qpidd ~50% CPU,
> on a machine with 8 CPU cores. However, sometimes the queue spikes up to 2GB
> (the max) and starts throwing messages away, qpidd hits 100%+ CPU, and the
> ingest process drops to about 3% CPU. I can see messages are being
> processed, but very slowly.
>
> I've tried attaching to qpidd with gdb a few times, and all threads apart
> from one seem to be idle in epoll_wait or pthread_cond_wait. The running
> thread always seems to be somewhere under DispatchHandle::processEvent.
>
> I'm at a bit of a loss for what I can do to fix this!
>
> Jimmy
>
> ----- Original Message -----
> From: Fraser Adams
> Sent: 08/23/13 09:09 AM
> To: [email protected]
> Subject: Re: System stalling
>
> Hi Jimmy, hope you are well!
>
> As an experiment, one thing that you could try is messing with the link
> "reliability". As you know, in the normal mode of operation it's
> necessary to periodically send acknowledgements from the consumer client
> application, which then get passed back ultimately to the broker.
>
> I'm no expert on this, but from my recollection, if you are in a position
> particularly where circular queues are overflowing, you are continually
> trying to produce and consume, and you have some fair level of
> prefetch/capacity on the consumer, then the mechanism for handling the
> acknowledgements on the broker is "sub-optimal" - I think it's a linear
> search or some such, and there are conditions where catching up with
> acknowledgements becomes a bit "N squared".
>
> Gordon would be able to explain this way better than me - that's
> assuming this hypothesis is even relevant :-)
>
> Anyway, try having a link: {reliability: unreliable} stanza in
> your consumer address string (as an example, one of mine looks like the
> following - the address string syntax isn't exactly trivial :-)).
>
> string address = "test_consumer; {create: receiver, node: {x-declare:
> {auto-delete: True, exclusive: True, arguments: {'qpid.policy_type':
> ring, 'qpid.max_size': 100000000}}, x-bindings: [{exchange: 'amq.match',
> queue: 'test_consumer', key: 'test1', arguments: {x-match: all,
> data-format: test}}]}, link: {reliability: unreliable}}";
>
> Clearly your arguments would be different, but hopefully it'll give you a
> kick start.
>
> The main downside of disabling link reliability is that if you have
> enabled prefetch and the consumer unexpectedly dies, then all of the
> messages on the prefetch queue will be lost, whereas with reliable
> messaging the broker maintains references to all unacknowledged messages
> and so would resend them (I *think* that's how it works.....)
>
> At the very least it's a fairly simple tweak to your consumer addresses
> that might rule out (or point to) acknowledgement shenanigans as being
> the root of your problem. From my own experience I always end up blaming
> this first if I hit performance weirdness with ring queues :-)
>
> HTH,
> Frase
>
> On 21/08/13 17:08, Jimmy Jones wrote:
> >>>>> I've got a simple processing system using the 0.22 C++ broker, all
> >>>>> on one box, where an external system posts messages to the default
> >>>>> headers exchange, and an ingest process receives them using a ring
> >>>>> queue, transforms them, and outputs to a different headers exchange.
> >>>>> Various other processes pick messages of interest off that exchange
> >>>>> using ring queues. Recently, however, the system has been stalling -
> >>>>> I'm still receiving lots of data from the other system, but the
> >>>>> ingest process suddenly goes to <5% CPU usage, its queue fills up,
> >>>>> and messages start getting discarded from the ring; the follow-on
> >>>>> processes go to practically 0% CPU, qpidd hovers around 95-120%
> >>>>> CPU (normally it's ~75%), and the rest of the system pretty much goes
> >>>>> idle (no swapping, there is free memory).
> >>>>>
> >>>>> I attached to the ingest process with gdb and it was stuck in send
> >>>>> (waitForCapacity/waitForCompletionImpl) - I notice this can block.
> >>>> Is there any queue bound to the second headers exchange, i.e. to the one
> >>>> this ingest process is sending to, that is not a ring queue? (If you run
> >>>> qpid-config queue -r, you get a quick listing of the queues and their
> >>>> bindings.)
> >>> I've run qpid-config queue, and all my queues have --limit-policy=ring,
> >>> apart from a UUID one which I presume is qpid-config itself. Are there
> >>> any other useful debugging things I can do?
> >> What does qpid-stat -q show?
> >> Is it possible to test whether the broker is still responsive, e.g. by
> >> sending and receiving messages through a test queue/exchange? Are there
> >> any errors in the logs? Are any of the queues durable (and messages
> >> persistent)?
> > qpid-stat -q is all zeros in the msg & bytes columns, apart from the
> > ingest queue and another overflowing ring queue I have.
> >
> > I did run qpid-tool when the system was broken to dump some stats.
> > msgTotalDequeues was slowly incrementing on the ingest queue, so I presume
> > messages were still being delivered and the broker was responsive?
> >
> > The only logging I've got is syslog, and I just see a warning about unsent
> > data, presumably when the ingest process receives a SIGALRM. I'm happy to
> > switch on more logging - what would you recommend?
> >
> > None of my queues are durable, but I think incoming messages from the
> > other system are marked as durable. The exchange that the ingest process
> > sends to is durable, but I'm not setting any durable flags on outgoing
> > messages (I presume the default is off).
> >
> >> Another thing might be a ptrace of the broker process. Maybe two or
> >> three with a short delay between them.
> > I'll try this next time it goes haywire.
> >
> >> For some reason it seems like the broker is not sending back
> >> confirmation to the sender in the ingest process, causing that to block.
> >> Ring queues shouldn't be subject to producer flow control, so we need to
> >> figure out what other reason there could be for that.

--
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
