On Sep 23, 2011, at 9:11 AM, Fraser Adams wrote:

> I'll mention that to the guys when I get back to the office. Though it seems a bit counterintuitive to me; I'd have thought that having a lower number of worker threads wouldn't utilise the available cores. By "logic", running two (or even eight) worker threads on your 48 core server seems low - any idea what's going on to explain your results??
I can't say for sure, but I would guess there's maybe more lock contention going on in the broker when you have more threads.

> So have you reproduced the MRG paper results? That paper, which is over three years old now, has figures of 380,000 256-octet messages in plus out on a 2 x 4 core Xeon box. We've not come *close* to that figure and my developers are far from dummies. The paper describes the methodology quite well, but doesn't quite spell out as a tutorial exactly what the setup was.

What numbers are you getting, and how are you testing?

> I don't suppose you (or anyone else) have any help on the other part of my question about consumer flow control??

I'm not too familiar with consumer flow control, unless you're talking about using a prefetch capacity on a receiver (there's a rough sketch of what I mean appended at the foot of this mail).

Andy

> Cheers,
> Frase
>
> Andy Goldstein wrote:
>> As an experiment, try lowering the # of worker threads for the broker. For example, we saw an order of magnitude increase in performance when we dropped worker threads from 8 to 2 (on a 48-core server). Our test involved creating a ring queue with a max queue count of 250,000 messages. We pre-filled the queue with 259-byte messages, and then had a multi-threaded client start at least 3 threads, 1 connection/session/sender per thread, and had them try to send as many 259-byte messages/second as possible. Decreasing the # of worker threads in the broker gave us better throughput.
>>
>> Andy
>>
>> On Sep 23, 2011, at 8:05 AM, Fraser Adams wrote:
>>
>>> Hi Andy,
>>> I'm afraid that I can't tell you for sure as I'm doing this a bit by "remote control" (I've tasked some of my developers to try and replicate the MRG whitepaper throughput results to give us a baseline top level performance figure).
>>>
>>> However, when I last spoke to them they had tried sending a load of ~900 octet messages to a ring queue set to 2GB, but to rule out any memory issues (shouldn't be an issue, as the box has 24GB) they have also tried with a ring queue of the default size of 100MB - they got the same problem, it just happened a lot sooner, obviously.
>>>
>>> Fraser
>>>
>>> Andy Goldstein wrote:
>>>> Hi Fraser,
>>>>
>>>> How many messages can the ring queue hold before it starts dropping old messages to make room for new ones?
>>>>
>>>> Andy
>>>>
>>>> On Sep 23, 2011, at 5:21 AM, Fraser Adams wrote:
>>>>
>>>>> Hello all,
>>>>> I was chatting to some colleagues yesterday who are trying to do some stress testing and have noticed some weird results.
>>>>>
>>>>> I'm afraid I've not personally reproduced this yet, but I wanted to post on a Friday whilst the list was more active.
>>>>>
>>>>> The set up is firing off messages of ~900 octets in size into a queue with a ring limit policy, and I'm pretty sure they are using Qpid 0.8.
>>>>>
>>>>> As I understand it they have a few producers and consumers, and the "steady state" message rate is OKish, but if they kill off a couple of consumers to force the queue to start filling, what seems to happen (as described to me) is that when the (ring) queue fills up to its limit (and I guess starts overwriting) the consumer rate plummets massively.
>>>>>
>>>>> As I say, I've not personally tried this yet, but as it happens another colleague was doing something independently and he reported something similar.
>>>>> He was using the C++ qpid::client API and, from what I can gather, did a bit of digging and found a command to disable consumer flow control, which seemed to solve his particular issue.
>>>>>
>>>>> Do the scenarios above sound like flow control issues? I'm afraid I've not looked much at this, and the only documentation I can find relates to the producer flow control feature introduced in 0.10, which isn't applicable here as a) the issues were seen on a 0.8 broker and b) as far as the doc goes, producer flow control isn't applied on ring queues.
>>>>>
>>>>> The colleague who did the tinkering on qpid::client I believe figured it out from the low-level doxygen API documentation, but I've not seen anything in the higher level documents and I've certainly not seen anything in the qpid::messaging or JMS stuff (which is mostly where my own experience comes from). I'd definitely like to be able to disable it from Java and qpid::messaging too.
>>>>>
>>>>> I'd appreciate a brain dump of distilled flow control knowledge that I can pass on if that's possible!!!
>>>>>
>>>>> As an aside, another thing seemed slightly weird to me. My colleagues are running on a 16 core Linux box and the worker threads are set to 17, as expected. However, despite running with (I think) 8 producers and 32 consumers, the CPU usage reported by top maxes out at 113%. That seems massively low on a 16 core box; I'd have hoped to see a much higher message rate than they are actually seeing, and the CPU usage getting closer to 1600%. Is there something "special" that needs to be done to make best use of a nice big multicore Xeon box? IIRC the MRG whitepaper mentions "Use taskset to start qpid-daemon on all cpus". This isn't something I'm familiar with, but it looks like it relates to CPU affinity; to my mind that doesn't account for maxing out at only a fraction of the available CPU capacity (it's not network bound, BTW).
>>>>>
>>>>> Are there any tutorials on how to obtain the absolute maximum super turbo message throughput :-) We're not even coming *close* to the figures quoted in the MRG whitepaper despite running on more powerful hardware, so I'm assuming we're doing something wrong unless the MRG figures are massively exaggerated.
>>>>> Many thanks,
>>>>> Frase

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]
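For reference, here's a minimal, untested sketch of what the "prefetch capacity on a receiver" mentioned above looks like with the C++ qpid::messaging API, against a ring queue roughly like the one described in this thread. The queue name, ring size and capacity value are illustrative assumptions, not taken from the thread; the broker thread count discussed above is normally set with qpidd's --worker-threads option.

// Hedged sketch (untested): consumer-side prefetch with qpid::messaging.
// Queue name, ring size and capacity are made-up values for illustration.
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672");
    connection.open();
    Session session = connection.createSession();

    // Ring queue limited to ~100MB; oldest messages are overwritten when full.
    Receiver receiver = session.createReceiver(
        "test-ring-queue; {create: always, node: {x-declare: {arguments:"
        " {'qpid.policy_type': ring, 'qpid.max_size': 104857600}}}}");

    // Prefetch capacity: how many messages the client library may buffer
    // ahead of fetch() calls. The default is 0, i.e. one message at a time.
    receiver.setCapacity(1000);

    Message msg;
    while (receiver.fetch(msg, Duration::SECOND)) {
        // ... process msg ...
        session.acknowledge();
    }

    connection.close();
    return 0;
}

A larger capacity mainly saves a broker round trip per message on the consuming side; whether it helps with the ring-queue stall described above is something that would have to be measured.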

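And a hedged guess at the kind of thing the colleague on the older C++ qpid::client API may have done to "disable consumer flow control": per-subscription flow control can be set to unlimited via SubscriptionSettings. This is an assumption about what was meant; the names are from the 0.8-era qpid::client headers as I remember them, and the code is an untested sketch.

// Hedged sketch (untested): switching off credit-based (consumer) flow
// control for a subscription with the old qpid::client API. The queue
// name is a placeholder.
#include <qpid/client/Connection.h>
#include <qpid/client/Session.h>
#include <qpid/client/SubscriptionManager.h>
#include <qpid/client/LocalQueue.h>
#include <qpid/client/Message.h>
#include <qpid/sys/Time.h>

using namespace qpid::client;

int main() {
    Connection connection;
    connection.open("localhost", 5672);
    Session session = connection.newSession();

    SubscriptionManager subscriptions(session);

    SubscriptionSettings settings;
    // Unlimited message/byte credit: the broker pushes as fast as it can
    // rather than throttling on a credit window.
    settings.flowControl = FlowControl::unlimited();
    // Alternatively, a large but bounded credit window, e.g.:
    // settings.flowControl = FlowControl::messageCredit(10000);

    LocalQueue incoming;
    subscriptions.subscribe(incoming, "test-ring-queue", settings);

    Message msg;
    while (incoming.get(msg, qpid::sys::TIME_SEC)) {
        // ... process msg ...
    }

    connection.close();
    return 0;
}

As far as I know the closest equivalent in qpid::messaging is the receiver capacity shown in the first sketch; I'm not aware of a separate "disable" switch there, so a large capacity is the nearest match.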