As an experiment, try lowering the number of worker threads for the broker. For example, we saw an order-of-magnitude increase in performance when we dropped worker threads from 8 to 2 (on a 48-core server). Our test created a ring queue with a maximum queue count of 250,000 messages. We pre-filled the queue with 259-byte messages, then had a multi-threaded client start at least 3 threads, with one connection/session/sender per thread, each trying to send as many 259-byte messages per second as possible. Decreasing the number of worker threads in the broker gave us better throughput.
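A minimal sketch of that setup, assuming the 0.x-era command-line tools (option names are from memory; check `qpidd --help` and `qpid-config --help` on your build, and the core list for taskset is just an illustrative choice):

```shell
# Start the broker with a reduced worker-thread pool (the default is
# one more than the core count). Optionally pin it with taskset, as
# the MRG whitepaper suggests.
taskset -c 0-15 qpidd --worker-threads 2 --daemon

# Create the ring queue used in the test: once the count limit is
# reached, the oldest messages are overwritten rather than producers
# being blocked.
qpid-config add queue perf-test-ring \
    --max-queue-count 250000 --limit-policy ring
```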
Andy

On Sep 23, 2011, at 8:05 AM, Fraser Adams wrote:

> Hi Andy,
> I'm afraid that I can't tell you for sure, as I'm doing this a bit by "remote
> control" (I've tasked some of my developers with trying to replicate the MRG
> whitepaper throughput results to give us a baseline top-level performance
> figure).
>
> However, when I last spoke to them they had tried sending a load of ~900-octet
> messages to a ring queue set to 2GB, but to rule out any memory issues (there
> shouldn't be any, as the box has 24GB) they have also tried with a ring queue
> of the default size of 100MB - they got the same problem; it just happened a
> lot sooner, obviously.
>
> Fraser
>
>
> Andy Goldstein wrote:
>> Hi Fraser,
>>
>> How many messages can the ring queue hold before it starts dropping old
>> messages to make room for new ones?
>>
>> Andy
>>
>> On Sep 23, 2011, at 5:21 AM, Fraser Adams wrote:
>>
>>> Hello all,
>>> I was chatting to some colleagues yesterday who are trying to do some
>>> stress testing and have noticed some weird results.
>>>
>>> I'm afraid I've not personally reproduced this yet, but I wanted to post on
>>> a Friday whilst the list was more active.
>>>
>>> The setup is firing off messages of ~900 octets in size into a queue with
>>> a ring limit policy, and I'm pretty sure they are using Qpid 0.8.
>>>
>>> As I understand it, they have a few producers and consumers, and the
>>> "steady state" message rate is OK-ish, but if they kill off a couple of
>>> consumers to force the queue to start filling, what seems to happen (as
>>> described to me) is that when the (ring) queue fills up to its limit (and I
>>> guess starts overwriting) the consumer rate plummets massively.
>>>
>>> As I say, I've not personally tried this yet, but as it happens another
>>> colleague was doing something independently and he reported something
>>> similar.
>>> He was using the C++ qpid::client API and, from what I can gather, did a
>>> bit of digging and found a call to disable consumer flow control, which
>>> seemed to solve his particular issue.
>>>
>>> Do the scenarios above sound like flow control issues? I'm afraid I've not
>>> looked much at this, and the only documentation I can find relates to the
>>> producer flow control feature introduced in 0.10, which isn't applicable
>>> here as a) the issues were seen on a 0.8 broker and b) as far as the doc
>>> goes, producer flow control isn't applied to ring queues.
>>>
>>> The colleague who did the tinkering on qpid::client I believe figured it
>>> out from the low-level doxygen API documentation, but I've not seen
>>> anything in the higher-level documents, and I've certainly not seen
>>> anything in the qpid::messaging or JMS stuff (which is mostly where my own
>>> experience comes from). I'd definitely like to be able to disable it from
>>> Java and qpid::messaging too.
>>>
>>> I'd appreciate a brain dump of distilled flow control knowledge that I can
>>> pass on, if that's possible!
>>>
>>> As an aside, another thing seemed slightly weird to me. My colleagues are
>>> running on a 16-core Linux box and the worker threads are set to 17 as
>>> expected; however, despite running with (I think) 8 producers and 32
>>> consumers, the CPU usage reported by top maxes out at 113%. This seems
>>> massively low on a 16-core box, and I'd have hoped to see a massively
>>> higher message rate than they are actually getting, with the CPU usage
>>> closer to 1600%. Is there something "special" that needs to be done to
>>> make best use of a nice big multicore Xeon box? IIRC the MRG whitepaper
>>> mentions "Use taskset to start qpid-daemon on all cpus".
>>> This isn't something I'm familiar with, but it looks like it relates to
>>> CPU affinity; to my mind, though, that doesn't account for maxing out at
>>> only a fraction of the available CPU capacity (it's not network bound,
>>> BTW).
>>>
>>> Are there any tutorials on how to obtain the absolute maximum super turbo
>>> message throughput? :-) We're not even coming *close* to the figures
>>> quoted in the MRG whitepaper despite running on more powerful hardware, so
>>> I'm assuming we're doing something wrong, unless the MRG figures are
>>> massively exaggerated.
>>>
>>> Many thanks,
>>> Frase
>>>
>>> ---------------------------------------------------------------------
>>> Apache Qpid - AMQP Messaging Implementation
>>> Project: http://qpid.apache.org
>>> Use/Interact: mailto:[email protected]
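For reference, the qpid::client workaround described in the thread looks roughly like the following. This is a hedged sketch against the 0.x-era C++ client, not a confirmed reproduction of the colleague's code: the queue name, broker address, and listener are placeholder assumptions, and the key call is `FlowControl::unlimited()`, which grants the subscription unlimited message and byte credit.

```cpp
#include <qpid/client/Connection.h>
#include <qpid/client/Session.h>
#include <qpid/client/SubscriptionManager.h>
#include <qpid/client/SubscriptionSettings.h>
#include <qpid/client/FlowControl.h>
#include <qpid/client/MessageListener.h>
#include <iostream>

using namespace qpid::client;

// Trivial listener; a real consumer would do useful work here.
struct Printer : MessageListener {
    void received(Message& msg) {
        std::cout << msg.getData().size() << " bytes\n";
    }
};

int main() {
    Connection connection;
    connection.open("localhost", 5672);   // broker address is an assumption
    Session session = connection.newSession();

    SubscriptionManager subscriptions(session);
    Printer listener;

    // Disable consumer flow control: unlimited message and byte credit,
    // so the broker pushes messages to this consumer as fast as it can.
    SubscriptionSettings settings;
    settings.flowControl = FlowControl::unlimited();

    subscriptions.subscribe(listener, "perf-test-ring", settings);
    subscriptions.run();                  // blocks, dispatching messages

    connection.close();
    return 0;
}
```

On the qpid::messaging side, the closest knob I'm aware of is `Receiver::setCapacity()`, which sizes the prefetch window rather than disabling credit outright; there is no exact equivalent of `FlowControl::unlimited()` in that API.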
