Right off the top, can't you use INDIVIDUAL_ACK here, rather than
committing transactions?  That seems like the ideal mode to let you choose
which messages to ack without having to ack all the ones up to a certain
point.
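
In case it helps, this is roughly what I mean (just a sketch using the
standard ActiveMQ client; the broker URL and queue name are placeholders):

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.ActiveMQSession;

    ConnectionFactory factory =
        new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder URL
    Connection connection = factory.createConnection();
    connection.start();

    // Non-transacted session using ActiveMQ's INDIVIDUAL_ACKNOWLEDGE mode.
    Session session =
        connection.createSession(false, ActiveMQSession.INDIVIDUAL_ACKNOWLEDGE);
    MessageConsumer consumer =
        session.createConsumer(session.createQueue("MY.QUEUE"));

    Message message = consumer.receive();
    // ... process the message ...

    // In this mode, acknowledge() acks only this message, not everything
    // delivered so far on the session.
    message.acknowledge();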

The only complication is that I think your prefetch size would need to be
equal to (or greater than, but that's not ideal for load balancing) the
number of current consumers on the session, which could be complicated to
configure.  But you might be able to use a prefetch buffer size of 0 to
work around it; I'm not sure how that would interact with INDIVIDUAL_ACK,
since I've never tried using a prefetch size of 0, but it would be simple
enough for you to test.
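
For reference, the prefetch can be set per consumer with destination options,
or for the whole connection with a prefetch policy (the URL and queue name
here are just examples):

    // Per consumer, via destination options:
    Queue queue = session.createQueue("MY.QUEUE?consumer.prefetchSize=0");
    MessageConsumer consumer = session.createConsumer(queue);

    // Or for every queue consumer on the connection:
    ActiveMQConnectionFactory factory =
        new ActiveMQConnectionFactory("tcp://localhost:61616");
    factory.getPrefetchPolicy().setQueuePrefetch(0);

    // Or in the connection URI itself:
    //   tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0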

If it works, a prefetch buffer size of 0 would be better than a size of 1
with AUTO_ACK, because there would be nothing prefetched to any client that
wasn't actively being processed, so new consumers wouldn't be starved by
the broker having already passed out the backlog to consumers who weren't
ready for their next message.

Also, I'm curious about how a 30-second message with a prefetch size of 1
results in a 5-minute latency; why isn't that 2 * 30 seconds = 1 minute?

Tim

On Mon, Oct 19, 2015 at 8:15 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We have a problem whereby we have a LARGE number of workers.  Right now
> about 50k worker threads on about 45 bare metal boxes.
>
> We have about 10 ActiveMQ servers / daemons which service these workers.
>
> The problem is that my current design has a session per queue server per
> thread.   So this means I have about 500k sessions each trying to prefetch
> 1 message at a time.
>
> Since my tasks can take about 30 seconds on average to execute, this means
> that it takes 5 minutes for a message to be processed.
>
> That's a BIG problem in that I want to keep my latencies low!
>
> And the BIG downside here is that a lot of my workers get their prefetch
> buffer filled first, starving out other workers that do nothing...
>
> This leads to massive starvation where some of my boxes are at 100% CPU and
> others are at 10-20% starved for work.
>
> So I'm working on a new design whereby I use a listener, allow it to
> prefetch, and use a countdown latch from within the message listener to
> wait for the thread to process the message.  Then I commit the session.
>
> This solves the over-prefetch problem because we don't attempt to pre-fetch
> until the message is processed.
>
> Since I can't commit each JMS message one at a time, I'm only left with
> options that commit the whole session.  This forces me to set prefetch=1
> otherwise I could commit() and then commit a message that is actually still
> being processed.
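>
> Roughly the shape of what I'm describing (just a sketch; the worker pool,
> queue name, and process() call are placeholders):
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.ExecutorService;
>     import javax.jms.*;
>
>     void consumeAndCommitAfterProcessing(Connection connection,
>                                          ExecutorService workerPool)
>             throws JMSException {
>         // Transacted session, prefetch=1 on the destination.
>         Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
>         MessageConsumer consumer = session.createConsumer(
>             session.createQueue("MY.QUEUE?consumer.prefetchSize=1"));
>
>         consumer.setMessageListener(message -> {
>             CountDownLatch done = new CountDownLatch(1);
>             workerPool.submit(() -> {
>                 try {
>                     process(message);   // placeholder for the actual work
>                 } finally {
>                     done.countDown();
>                 }
>             });
>             try {
>                 done.await();           // hold the dispatch thread until the work finishes
>                 session.commit();       // commits (acks) everything delivered on this session
>             } catch (Exception e) {
>                 try { session.rollback(); } catch (JMSException ignored) {}
>             }
>         });
>     }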
>
> This leaves me with a situation where I need to be clever about how I fetch
> from the queue servers.
>
> If I prefetch on ALL queue servers I'm kind of back to where I was to begin
> with.
>
> I was thinking of implementing the following solution, which should work
> and minimize the downsides.  I wanted feedback on it.
>
> If I have, say, 1000 worker threads, I allow messages numbering up to 10%
> of the worker threads to be prefetched and stored in a local queue
> (ArrayBlockingQueue).
>
> In this example this would be 100 messages.
>
> The problem now is how we read in parallel from each server.
>
> I think in this situation we then allow each queue server to contribute
> 10% of the buffered messages.
>
> So in this case 10 from each.
>
> So now we end up with a situation where we're allowed to prefetch 10
> messages from each queue server, and the local buffer can grow to hold 100
> messages.
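>
> A rough sketch of the buffering piece, assuming 1000 workers and 10 queue
> servers (the class and method names are made up for illustration):
>
>     import java.util.concurrent.ArrayBlockingQueue;
>     import java.util.concurrent.BlockingQueue;
>     import java.util.concurrent.Semaphore;
>     import javax.jms.Message;
>
>     // Shared local buffer sized at 10% of the worker threads, with each
>     // queue server limited to 10% of that buffer.
>     class PrefetchBuffer {
>
>         // Pair each message with the server it came from so we can free
>         // that server's slot when a worker takes it.
>         static final class Buffered {
>             final int serverId;
>             final Message message;
>             Buffered(int serverId, Message message) {
>                 this.serverId = serverId;
>                 this.message = message;
>             }
>         }
>
>         private final BlockingQueue<Buffered> buffer;
>         private final Semaphore[] perServerSlots;
>
>         PrefetchBuffer(int nrWorkers, int nrServers) {
>             int capacity = nrWorkers / 10;         // 1000 workers -> 100 slots
>             int perServer = capacity / nrServers;  // 10 servers   -> 10 slots each
>             this.buffer = new ArrayBlockingQueue<>(capacity);
>             this.perServerSlots = new Semaphore[nrServers];
>             for (int i = 0; i < nrServers; i++) {
>                 perServerSlots[i] = new Semaphore(perServer);
>             }
>         }
>
>         // Called by the reader thread for one queue server; blocks once
>         // that server has used its share, so no single server floods the buffer.
>         void offerFromServer(int serverId, Message m) throws InterruptedException {
>             perServerSlots[serverId].acquire();
>             buffer.put(new Buffered(serverId, m));
>         }
>
>         // Called by worker threads; frees the originating server's slot so
>         // its reader can fetch another message.
>         Message takeForWorker() throws InterruptedException {
>             Buffered b = buffer.take();
>             perServerSlots[b.serverId].release();
>             return b.message;
>         }
>     }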
>
> The latency for processing a message would be at minimum the average time
> per task/thread being indexed, which I think will keep the latencies low.
>
> Also, I think over-prefetch could be a common anti-pattern, and this could
> be a general solution to it.
>
> If you agree, I'm willing to document the problem.
>
> Additionally, I think this comes close to the ideal multi-headed solution
> from queuing theory, using multiple worker heads.  It just becomes more
> interesting because we have imperfect information from the queue servers,
> so we have to make educated guesses about their behavior.
>
>
