Re: Rules of Thumb for Setting Parallelism

2020-11-10 Thread Rex Fenley
Awesome, thanks!
On Sat, Nov 7, 2020 at 6:43 AM Till Rohrmann wrote:
> Hi Rex,
>
> You should configure the number of slots per TaskManager to be the number
> of cores of a machine/node. In total you will then have a cluster with
> #slots = #cores per machine x #machines.
>
> If you have a

Re: Rules of Thumb for Setting Parallelism

2020-11-07 Thread Till Rohrmann
Hi Rex, You should configure the number of slots per TaskManager to be the number of cores of a machine/node. In total you will then have a cluster with #slots = #cores per machine x #machines. If you have a cluster with 4 nodes and 8 slots each, then you have a total of 32 slots. Now if you
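The slot arithmetic in this message can be written out as a small sketch. The function name `total_slots` is made up for illustration and is not part of any Flink API; it only encodes the rule of thumb of one slot per core.

```python
def total_slots(cores_per_machine: int, machines: int) -> int:
    """Cluster-wide slot count, assuming one slot per core on every machine.

    Hypothetical helper for illustration only; not a Flink API.
    """
    if cores_per_machine < 1 or machines < 1:
        raise ValueError("cores_per_machine and machines must be positive")
    return cores_per_machine * machines


# The example from the message: 4 nodes with 8 slots each.
print(total_slots(cores_per_machine=8, machines=4))  # 32

# The cluster from the original question: 8 nodes with 4 vCores each.
print(total_slots(cores_per_machine=4, machines=8))  # 32
```

Both layouts come out at 32 slots, which matches the total quoted elsewhere in the thread.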

Re: Rules of Thumb for Setting Parallelism

2020-11-06 Thread Rex Fenley
Great, thanks! So just to confirm, configure # of task slots to # of core nodes x # of vCPUs? I'm not sure what you mean by "distribute them across both jobs (so that the total adds up to 32)". Is it configurable how many task slots a job can receive, so in this case I'd provide ~30/36 * 32 task

Re: Rules of Thumb for Setting Parallelism

2020-11-06 Thread Till Rohrmann
Hi Rex, as a rule of thumb I recommend configuring your TMs with as many slots as they have cores. So in your case your cluster would have 32 slots. Then depending on the workload of your jobs you should distribute them across both jobs (so that the total adds up to 32). A high number of
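One possible way to read "distribute them across both jobs (so that the total adds up to 32)" is a proportional split of the slots by some workload weight. The sketch below is an assumption, not something prescribed in the thread: `split_slots` is a hypothetical helper, and the weights 30 and 6 are illustrative values loosely based on the "~30/36" figure mentioned in a later reply. It uses a largest-remainder rounding so the shares always sum to the slot total.

```python
def split_slots(total: int, weights: dict[str, int]) -> dict[str, int]:
    """Split `total` slots across jobs in proportion to integer weights.

    Hypothetical helper for illustration only; not a Flink API.
    Uses largest-remainder rounding so the shares sum exactly to `total`.
    """
    weight_sum = sum(weights.values())
    if total < 0 or weight_sum <= 0:
        raise ValueError("total must be non-negative and weights must sum to > 0")

    shares = {}
    remainders = {}
    for job, weight in weights.items():
        # Integer division keeps the arithmetic exact (no float rounding).
        shares[job], remainders[job] = divmod(total * weight, weight_sum)

    # Hand out the slots lost to rounding down, largest remainder first.
    leftover = total - sum(shares.values())
    for job in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[job] += 1
    return shares


# Assumed weights: one job about five times heavier than the other.
print(split_slots(32, {"job_a": 30, "job_b": 6}))  # {'job_a': 27, 'job_b': 5}
```

The resulting numbers would then be candidates for each job's parallelism; whether a proportional split suits the actual workloads is something to verify by measurement.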

Rules of Thumb for Setting Parallelism

2020-11-05 Thread Rex Fenley
Hello, I'm running a job on AWS EMR with the Table API that does a long series of Joins, GroupBys, and Aggregates, and I'd like to know how best to tune parallelism. In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of memory. There's a job we have to run that has ~30 table