As an experiment, try lowering the number of worker threads for the broker.
For example, we saw an order-of-magnitude increase in performance when we
dropped worker threads from 8 to 2 (on a 48-core server).  Our test created a
ring queue with a maximum queue count of 250,000 messages.  We pre-filled the
queue with 259-byte messages, then had a multi-threaded client start at least
3 threads (one connection/session/sender per thread) and try to send as many
259-byte messages per second as possible.  Decreasing the number of worker
threads in the broker gave us better throughput.
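
If you want to repeat the experiment, the worker thread count can be set when
starting the broker; a minimal sketch (2 threads is just what worked for us,
tune for your hardware):

    qpidd --worker-threads 2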

Andy

On Sep 23, 2011, at 8:05 AM, Fraser Adams wrote:

> Hi Andy,
> I'm afraid I can't tell you for sure, as I'm doing this a bit by "remote 
> control" (I've tasked some of my developers with trying to replicate the MRG 
> whitepaper throughput results to give us a baseline top-level performance 
> figure).
> 
> However, when I last spoke to them they had tried sending a load of 
> ~900-octet messages to a ring queue set to 2 GB. To rule out any memory 
> issues (there shouldn't be any, as the box has 24 GB), they also tried with 
> a ring queue of the default size of 100 MB; they got the same problem, it 
> just happened a lot sooner, obviously.
> 
> Fraser
> 
> 
> Andy Goldstein wrote:
>> Hi Fraser,
>> 
>> How many messages can the ring queue hold before it starts dropping old 
>> messages to make room for new ones?
>> 
>> Andy
>> 
>> On Sep 23, 2011, at 5:21 AM, Fraser Adams wrote:
>> 
>>> Hello all,
>>> I was chatting to some colleagues yesterday who are trying to do some 
>>> stress testing and have noticed some weird results.
>>> 
>>> I'm afraid I've not personally reproduced this yet, but I wanted to post on 
>>> a Friday whilst the list was more active.
>>> 
>>> The set-up is firing off messages of ~900 octets in size into a queue with 
>>> a ring limit policy, and I'm pretty sure they are using Qpid 0.8.
>>> 
>>> As I understand it, they have a few producers and consumers, and the 
>>> "steady state" message rate is OK-ish. But if they kill off a couple of 
>>> consumers to force the queue to start filling, what seems to happen (as 
>>> described to me) is that when the (ring) queue fills up to its limit (and, 
>>> I guess, starts overwriting), the consumer rate plummets massively.
>>> 
>>> As I say, I've not personally tried this yet, but as it happens another 
>>> colleague was doing something similar independently and reported the same 
>>> behaviour. He was using the C++ qpid::client API and, from what I can 
>>> gather, did a bit of digging and found a call to disable consumer flow 
>>> control, which seemed to solve his particular issue.
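>>> 
>>> For anyone else who hits this: I believe the knob he found is the 
>>> subscription's flow control setting. A rough, untested sketch against the 
>>> 0.x qpid::client API (queue name made up), as I understand it:
>>> 
>>>     #include <qpid/client/Connection.h>
>>>     #include <qpid/client/Session.h>
>>>     #include <qpid/client/SubscriptionManager.h>
>>>     #include <qpid/client/SubscriptionSettings.h>
>>>     #include <qpid/client/MessageListener.h>
>>>     #include <qpid/client/Message.h>
>>> 
>>>     using namespace qpid::client;
>>> 
>>>     struct Listener : MessageListener {
>>>         void received(Message&) {
>>>             // Process the message; no explicit credit management here.
>>>         }
>>>     };
>>> 
>>>     int main() {
>>>         Connection connection;
>>>         connection.open("localhost", 5672);
>>>         Session session = connection.newSession();
>>>         SubscriptionManager subscriptions(session);
>>>         Listener listener;
>>>         // FlowControl::unlimited() grants unlimited credit, which (as I
>>>         // understand it) effectively disables consumer flow control for
>>>         // this subscription.
>>>         SubscriptionSettings settings(FlowControl::unlimited());
>>>         subscriptions.subscribe(listener, "test-queue", settings);
>>>         subscriptions.run();  // dispatch messages until stopped
>>>         connection.close();
>>>         return 0;
>>>     }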
>>> 
>>> 
>>> Do the scenarios above sound like flow control issues? I'm afraid I've not 
>>> looked much at this, and the only documentation I can find relates to the 
>>> producer flow control feature introduced in 0.10, which isn't applicable 
>>> here because a) the issues were seen on a 0.8 broker, and b) as far as the 
>>> documentation goes, producer flow control isn't applied to ring queues.
>>> 
>>> The colleague who did the tinkering with qpid::client figured it out, I 
>>> believe, from the low-level doxygen API documentation, but I've not seen 
>>> anything in the higher-level documents, and I've certainly not seen 
>>> anything in the qpid::messaging or JMS documentation (which is mostly 
>>> where my own experience lies). I'd definitely like to be able to disable 
>>> it from Java and qpid::messaging too.
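>>> 
>>> For qpid::messaging, the nearest thing I've found is the receiver's 
>>> capacity (prefetch window); I don't know of a way to switch flow control 
>>> off entirely from that API, but something like this (queue name and 
>>> capacity figure made up) at least controls the credit window:
>>> 
>>>     #include <qpid/messaging/Connection.h>
>>>     #include <qpid/messaging/Session.h>
>>>     #include <qpid/messaging/Receiver.h>
>>>     #include <qpid/messaging/Message.h>
>>> 
>>>     using namespace qpid::messaging;
>>> 
>>>     int main() {
>>>         Connection connection("localhost:5672");
>>>         connection.open();
>>>         Session session = connection.createSession();
>>>         Receiver receiver = session.createReceiver("test-queue");
>>>         // Capacity is the number of messages the client will prefetch;
>>>         // a larger window keeps more credit outstanding at the broker.
>>>         receiver.setCapacity(1000);
>>>         for (int i = 0; i < 100000; ++i) {
>>>             Message message = receiver.fetch();
>>>             session.acknowledge();
>>>         }
>>>         connection.close();
>>>         return 0;
>>>     }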
>>> 
>>> 
>>> I'd appreciate a brain dump of distilled flow control knowledge that I can 
>>> pass on if that's possible!!!
>>> 
>>> 
>>> As an aside, another thing seemed slightly weird to me. My colleagues are 
>>> running on a 16-core Linux box and the worker threads are set to 17, as 
>>> expected. However, despite running with (I think) 8 producers and 32 
>>> consumers, the CPU usage reported by top maxes out at 113%. This seems 
>>> massively low on a 16-core box; I'd have hoped to see a much higher 
>>> message rate than they are actually getting, with CPU usage closer to 
>>> 1600%. Is there something "special" that needs to be done to make the best 
>>> use of a nice big multicore Xeon box? IIRC the MRG whitepaper mentions 
>>> "Use taskset to start qpid-daemon on all cpus". This isn't something I'm 
>>> familiar with, but it looks like it relates to CPU affinity; to my mind, 
>>> though, that doesn't account for maxing out at only a fraction of the 
>>> available CPU capacity (it's not network-bound, BTW).
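>>> 
>>> (For what it's worth, my reading of that whitepaper suggestion is 
>>> something along these lines; the core list is just an example for a 
>>> 16-core box:
>>> 
>>>     taskset -c 0-15 qpidd --worker-threads 17
>>> 
>>> i.e. pin the broker to all 16 cores, though as I say I don't see how 
>>> affinity alone explains the 113% ceiling.)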
>>> 
>>> 
>>> Are there any tutorials on how to obtain the absolute maximum super-turbo 
>>> message throughput? :-) We're not even coming *close* to the figures 
>>> quoted in the MRG whitepaper despite running on more powerful hardware, so 
>>> I'm assuming we're doing something wrong, unless the MRG figures are 
>>> massively exaggerated.
>>> 
>>> 
>>> Many thanks
>>> Frase
>> 
> 


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
