Has anyone done much experimenting on optimal worker sizes? I'm basically
unsure if it is better to run with more, smaller workers, or fewer, larger
workers. Right now, I'm using ~3GB workers, and around 5 or so per machine.
Would it be better to reduce this number?

The main issues that come to mind are

If larger workers
- if one crashes, more data is lost
- more GC issues for larger heap sizes

if smaller workers
- more overhead
- more threads used
- less local shuffling capability
- more load on ZK/nimbus(?)


Thoughts?

Reply via email to