Hi Hiroyuki, thanks for the answer. I found a solution for the cores-per-executor configuration: I set maximizeResourceAllocation to true: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#emr-spark-maximizeresourceallocation
It was probably true by default in release 5.16, but I couldn't find when that changed. The same page says that dynamic allocation is true by default. I thought that would do the trick, but reading it again I think it relates to the number of executors rather than the number of cores per executor.
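In case it helps anyone else reading the archives, this is roughly how that setting can be passed at cluster creation time. This is only a sketch based on the page above: the instance type and count match my setup, and all other flags are illustrative placeholders, not a complete command.

```shell
# Hedged sketch: enable maximizeResourceAllocation via the "spark"
# configuration classification when creating the cluster. Flags other
# than --configurations are placeholders for illustration.
aws emr create-cluster \
  --release-label emr-5.20.0 \
  --applications Name=Spark \
  --instance-type r5.xlarge \
  --instance-count 6 \
  --configurations '[{"Classification":"spark","Properties":{"maximizeResourceAllocation":"true"}}]'
```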
But the jobs are still taking longer than before. Looking at the application history, I see these differences for the same job, the same instance types, and the default (AWS-managed) configuration for executors, cores, and memory:

Instances: 6 r5.xlarge: 4 vCPUs, 32 GB of memory each (so there are 24 cores in total: 6 instances * 4 cores).

With 5.16:
- 24 executors (4 on each instance, including the one that also runs the driver).
- 4 cores each.
- 2.7 GB * 2 (storage + on-heap storage) memory each.
- 1 executor per core, but at the same time 4 cores per executor (?).
- Total executor memory per instance: 21.6 GB (2.7 * 2 * 4).
- Total elapsed time: 6 minutes.

With 5.20:
- 5 executors (1 on each instance, 0 on the instance with the driver).
- 4 cores each.
- 11.9 GB * 2 (storage + on-heap storage) memory each.
- Total executor memory per instance: 23.8 GB (11.9 * 2 * 1).
- Total elapsed time: 8 minutes.

I don't understand the 5.16 configuration, but it works better. It seems that in 5.20 a full instance is wasted on the driver alone, when it could also host an executor.

Regards,
Pedro.

On Thu, Jan 31, 2019, 20:16 Hiroyuki Nagata <idiotpan...@gmail.com> wrote:

> Hi, Pedro
>
> I also started using AWS EMR, with Spark 2.4.0, and I'm looking for
> performance tuning methods.
>
> Did you configure dynamic allocation?
>
> FYI:
> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>
> I haven't tested it yet. I guess spark-submit needs to specify the number
> of executors.
>
> Regards,
> Hiroyuki
>
> On Fri, Feb 1, 2019, 5:23 Pedro Tuero <tuerope...@gmail.com> wrote:
>
>> Hi guys,
>> I usually run Spark jobs on AWS EMR.
>> Recently I switched from AWS EMR label 5.16 to 5.20 (which uses Spark
>> 2.4.0).
>> I've noticed that a lot of steps are taking longer than before.
>> I think it is related to the automatic configuration of cores per
>> executor.
>> In version 5.16, some executors took more cores if the instance allowed it.
>> Let's say an instance had 8 cores and 40 GB of RAM, and the RAM
>> configured per executor was 10 GB; then AWS EMR automatically assigned
>> 2 cores per executor.
>> Now in label 5.20, unless I configure the number of cores manually, only
>> one core is assigned per executor.
>>
>> I don't know if it is related to Spark 2.4.0 or if it is something
>> managed by AWS...
>> Does anyone know if there is a way to automatically use more cores when
>> it is physically possible?
>>
>> Thanks,
>> Peter.
>>
>
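For the archives: the manual workaround mentioned above (configuring the number of cores explicitly) can be sketched like this. The jar, class name, and memory value are placeholders for illustration; only the core/memory flags are the point.

```shell
# Hedged sketch of a manual override: with 4 vCPUs per r5.xlarge,
# asking for 4 cores per executor yields one executor per node instead
# of one core per executor. Jar, class, and memory are placeholders.
spark-submit \
  --deploy-mode cluster \
  --executor-cores 4 \
  --executor-memory 20g \
  --class com.example.MyJob \
  myjob.jar
```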