Hello,

I'm running a job on AWS EMR with the Table API that performs a long series
of joins, group-bys, and aggregations, and I'd like to know how best to tune
parallelism.

In my case, I have 8 EMR core nodes, each with 4 vCores and 8 GiB of
memory. One job we have to run has ~30 table operators. Given this, how
should I calculate what to set the system's parallelism to?
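
For context, here is a rough sketch of the arithmetic I have in mind. It
assumes one task slot per vCore, which is only a starting-point guess on my
part and not something I've confirmed is right for this workload. The
function name and the slots_per_vcore parameter are made up for illustration.

```python
def total_slots(core_nodes: int, vcores_per_node: int, slots_per_vcore: int = 1) -> int:
    """Upper bound on parallelism if every vCore hosts `slots_per_vcore` slots.

    Assumption (not from any documentation): slots map 1:1 onto vCores.
    """
    return core_nodes * vcores_per_node * slots_per_vcore


# Cluster figures from the description above: 8 core nodes, 4 vCores each.
cluster_slots = total_slots(core_nodes=8, vcores_per_node=4)
print(cluster_slots)  # 32
```

What I can't tell is whether the operator count (~30) should factor into
this number at all, or whether only the slot count matters.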

I also plan to run a second job on the same cluster, but with only 6
operators. Will this change the parallelism calculation at all?

Thanks!

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
| FOLLOW US <https://twitter.com/remindhq>
| LIKE US <https://www.facebook.com/remindhq>