Oh nice. I'll take a look at this. This might be just what I need... it adds complexity, but that's better than refactoring my code if I can avoid it ;)
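If I'm reading the docs right, something like this should disable it on an
embedded broker (rough sketch, untested; for a standalone broker the same
policyEntry goes under destinationPolicy in activemq.xml):

    import java.util.Arrays;

    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.broker.region.policy.PolicyEntry;
    import org.apache.activemq.broker.region.policy.PolicyMap;

    public class NoPrefetchExtensionBroker {
        public static void main(String[] args) throws Exception {
            // Make the broker wait for the "message processed" ack before
            // refilling a consumer's prefetch buffer.
            PolicyEntry entry = new PolicyEntry();
            entry.setQueue(">");                   // ">" matches every queue
            entry.setUsePrefetchExtension(false);

            PolicyMap policyMap = new PolicyMap();
            policyMap.setPolicyEntries(Arrays.asList(entry));

            BrokerService broker = new BrokerService();
            broker.setDestinationPolicy(policyMap);
            broker.start();
            broker.waitUntilStopped();
        }
    }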
Kevin

On Thu, Oct 22, 2015 at 12:59 PM, Martin Lichtin <lich...@yahoo.com> wrote:

> Your problem sounds a bit more complex, but I just wanted to mention that
> one can set usePrefetchExtension="false". From the docs:
>
> The default behavior of a broker is to use delivery acknowledgements to
> determine the state of a consumer's prefetch buffer. For example, if a
> consumer's prefetch limit is configured as 1, the broker will dispatch 1
> message to the consumer, and when the consumer acknowledges receiving the
> message, the broker will dispatch a second message. If the initial message
> takes a long time to process, the message sitting in the prefetch buffer
> cannot be processed by a faster consumer.
>
> If the behavior is causing issues, it can be changed such that the broker
> will wait for the consumer to acknowledge that the message is processed
> before refilling the prefetch buffer. This is accomplished by setting a
> destination policy on the broker to disable the prefetch extension for
> specific destinations.
>
> - Martin
>
> On 20.10.2015 04:15, Kevin Burton wrote:
>
>> We have a problem whereby we have a LARGE number of workers. Right now
>> about 50k worker threads on about 45 bare metal boxes.
>>
>> We have about 10 ActiveMQ servers / daemons which service these workers.
>>
>> The problem is that my current design has a session per queue server per
>> thread. So this means I have about 500k sessions, each trying to prefetch
>> 1 message at a time.
>>
>> Since my tasks take about 30 seconds on average to execute, this means it
>> can take 5 minutes for a message to be processed: each thread holds one
>> prefetched message per queue server, so a message can sit behind roughly
>> 10 x 30 seconds of other work.
>>
>> That's a BIG problem in that I want to keep my latencies low!
>>
>> And the BIG downside here is that a lot of my workers get their prefetch
>> buffers filled first, starving out other workers that do nothing...
>>
>> This leads to massive starvation where some of my boxes are at 100% CPU
>> and others are at 10-20%, starved for work.
>>
>> So I'm working on a new design whereby I use a listener, allow it to
>> prefetch, and use a countdown latch from within the message listener to
>> wait for a thread to process the message. Then I commit the message.
>>
>> This solves the over-prefetch problem because we don't attempt to
>> prefetch until the message is processed.
>>
>> Since I can't commit each JMS message one at a time, I'm only left with
>> options that commit the whole session. This forces me to set prefetch=1,
>> otherwise I could commit() and thereby commit a message that is actually
>> still being processed.
>>
>> This leaves me with a situation where I need to be clever about how I
>> fetch from the queue servers.
>>
>> If I prefetch on ALL queue servers, I'm kind of back to where I started.
>>
>> I was thinking of implementing the following solution, which should work
>> and minimizes the downsides. I wanted feedback on it.
>>
>> If I have, say, 1000 worker threads, I allow up to 10% of the number of
>> worker threads' worth of messages to be prefetched and stored in a local
>> queue (ArrayBlockingQueue).
>>
>> In this example this would be 100 messages.
>>
>> The problem now is how we read in parallel from each server.
>>
>> I think in this situation we then allow each queue server to fill 10% of
>> the buffered messages.
>>
>> So in this case, 10 from each.
>>
>> So now we end up with a situation where we're allowed to prefetch 10
>> messages from each queue server, and the local buffer can grow to hold
>> 100 messages.
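>>
>> Roughly, this is the sketch I have in mind (untested; the broker URLs,
>> queue name, and process() are placeholders, and it assumes one transacted
>> session per prefetch slot so that each commit() covers exactly one
>> message):
>>
>>     import java.util.concurrent.ArrayBlockingQueue;
>>     import java.util.concurrent.BlockingQueue;
>>     import java.util.concurrent.CountDownLatch;
>>
>>     import javax.jms.Connection;
>>     import javax.jms.JMSException;
>>     import javax.jms.Message;
>>     import javax.jms.MessageConsumer;
>>     import javax.jms.Session;
>>
>>     import org.apache.activemq.ActiveMQConnectionFactory;
>>
>>     public class BufferedWorkerPool {
>>
>>         // One unit of work handed from a JMS listener to a worker thread.
>>         static final class Task {
>>             final Message message;
>>             final CountDownLatch done = new CountDownLatch(1);
>>             Task(Message message) { this.message = message; }
>>         }
>>
>>         public static void main(String[] args) throws Exception {
>>             final int workerThreads = 1000;
>>             final String[] queueServers = { "tcp://queue01:61616" /* ~10 of these */ };
>>
>>             // Local buffer: 10% of the worker threads, i.e. 100 slots here.
>>             final BlockingQueue<Task> buffer =
>>                 new ArrayBlockingQueue<>(workerThreads / 10);
>>
>>             // Each queue server may fill 10% of the buffer: 10 in-flight messages.
>>             final int slotsPerServer = (workerThreads / 10) / 10;
>>
>>             for (String url : queueServers) {
>>                 // prefetch=1 so a session never holds a message that its
>>                 // commit() doesn't cover.
>>                 Connection connection = new ActiveMQConnectionFactory(
>>                     url + "?jms.prefetchPolicy.queuePrefetch=1").createConnection();
>>                 connection.start();
>>
>>                 for (int i = 0; i < slotsPerServer; i++) {
>>                     final Session session =
>>                         connection.createSession(true, Session.SESSION_TRANSACTED);
>>                     MessageConsumer consumer =
>>                         session.createConsumer(session.createQueue("tasks"));
>>                     consumer.setMessageListener(message -> {
>>                         try {
>>                             Task task = new Task(message);
>>                             buffer.put(task);   // blocks while the local buffer is full
>>                             task.done.await();  // park until a worker finishes it
>>                             session.commit();   // commits exactly this one message
>>                         } catch (Exception e) {
>>                             try { session.rollback(); } catch (JMSException ignored) { }
>>                         }
>>                     });
>>                 }
>>             }
>>
>>             // Worker threads drain the shared buffer.
>>             for (int i = 0; i < workerThreads; i++) {
>>                 new Thread(() -> {
>>                     while (true) {
>>                         try {
>>                             Task task = buffer.take();
>>                             process(task.message);  // the ~30s of real work
>>                             task.done.countDown();  // unblocks the listener's commit
>>                         } catch (InterruptedException e) {
>>                             Thread.currentThread().interrupt();
>>                             return;
>>                         }
>>                     }
>>                 }).start();
>>             }
>>         }
>>
>>         static void process(Message message) { /* application logic */ }
>>     }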
>> The latency for processing a message would then be roughly the average
>> per-task processing time of a single thread, which I think will keep
>> latencies low.
>>
>> Also, I think over-prefetch could be a common anti-pattern, and this a
>> common solution to it.
>>
>> If you agree, I'm willing to document the problem.
>>
>> Additionally, I think this comes close to the ideal multi-headed solution
>> from queuing theory, using multiple worker heads. It just becomes more
>> interesting because we have imperfect information from the queue servers,
>> so we have to make educated guesses about their behavior.

--
We're hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>