It really comes down to your use case. Can you say more about what
you're doing?

For our part, we run smaller workers and more of them, mainly because
having more JVMs helps us avoid contention on strangely-locked JVM
internals. We average 2-3 workers per machine with moderately sized
heaps.

I don't think extra workers add much overhead if you're doing shuffles
or fields groupings most of the time anyway.
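
For a rough feel of the sizing arithmetic, here is a back-of-the-envelope
sketch in Python. It takes the ~3GB x 5 workers from your message as a
fixed per-machine heap budget and shows how the per-worker heap changes
with the worker count. Holding the budget fixed is an assumption for
illustration only, and the function name is made up; this is not a
recommendation for any particular size.

```python
def heap_per_worker_gb(machine_heap_budget_gb, workers_per_machine):
    """Split a fixed per-machine heap budget evenly across workers.

    Hypothetical helper for illustration; it ignores non-heap JVM
    overhead and whatever the OS and supervisor need.
    """
    if workers_per_machine <= 0:
        raise ValueError("workers_per_machine must be positive")
    return machine_heap_budget_gb / workers_per_machine


# Budget taken from the quoted setup: about 5 workers at about 3 GB each.
budget_gb = 5 * 3

for workers in (2, 3, 5):
    print(workers, "workers ->", heap_per_worker_gb(budget_gb, workers), "GB each")
```

Fewer workers on the same budget means larger heaps (and the GC behavior
that comes with them); more workers means more JVMs sharing the box.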

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com


On Wed, Dec 18, 2013 at 8:54 PM, Jon Logan <jmlo...@buffalo.edu> wrote:

> Has anyone done much experimenting on optimal worker sizes? I'm basically
> unsure if it is better to run with more, smaller workers, or fewer, larger
> workers. Right now, I'm using ~3GB workers, and around 5 or so per machine.
> Would it be better to reduce this number?
>
> The main issues that come to mind are
>
> If larger workers
> - if one crashes, more data is lost
> - more GC issues for larger heap sizes
>
> If smaller workers
> - more overhead
> - more threads used
> - less local shuffling capability
> - more load on ZK/nimbus(?)
>
>
> Thoughts?
>
