Please try playing with spark-defaults.conf on EMR. Dynamic allocation (spark.dynamicAllocation.enabled=true) is on by default for EMR 4.4 and above. Which EMR release are you using?
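For example, these properties can go in /etc/spark/conf/spark-defaults.conf (or an EMR configuration classification). The values below are illustrative placeholders, not tuned recommendations:

```properties
spark.dynamicAllocation.enabled          true
spark.dynamicAllocation.minExecutors     2
spark.dynamicAllocation.maxExecutors     100
spark.executor.memory                    10g
# Off-heap headroom per executor, in MB (Spark 2.x property name)
spark.yarn.executor.memoryOverhead       2048
```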
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458

On Thu, Jan 19, 2017 at 5:02 PM, Venkata D <dvenkatj2ee...@gmail.com> wrote:

> blondowski,
>
> How big is your JSON file? Could you post the Spark params or
> configuration here? That might give some idea about the issue.
>
> Thanks
>
> On Thu, Jan 19, 2017 at 4:21 PM, blondowski <dan.blondow...@dice.com> wrote:
>
>> Please bear with me; I'm fairly new to Spark. Running PySpark 2.0.1 on AWS
>> EMR (6-node cluster with 475 GB of RAM).
>>
>> We have a job that creates a DataFrame from JSON files, then does some
>> manipulation (adds columns), and then calls a UDF.
>>
>> The job fails on the UDF call with: Container killed by YARN for exceeding
>> memory limits. 6.7 GB of 6.6 GB physical memory used. Consider boosting
>> spark.yarn.executor.memoryOverhead.
>>
>> I've tried adjusting executor-memory to 48 GB, but that also failed.
>>
>> What I've noticed is that while reading the JSON and creating the
>> DataFrame it uses 100+ executors, and all of the memory on the cluster
>> is being used.
>>
>> When it gets to the part where it's calling the UDF, it only allocates 3
>> executors, and they all die one by one.
>> Can somebody please explain to me how the executors get allocated?
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Executors-running-out-of-memory-tp28325.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Regards,
Sanat Patnaik
Cell->804-882-6424
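For context on the "6.7 GB of 6.6 GB" error quoted above: in Spark 2.x, when spark.yarn.executor.memoryOverhead is not set, it defaults to the larger of 384 MB and 10% of executor memory, and the YARN container limit is executor memory plus that overhead. A quick sketch of the arithmetic (the helper names here are mine, not a Spark API):

```python
# Sketch of Spark 2.x's default off-heap overhead sizing for YARN executors:
# overhead = max(384 MB, 10% of executor memory). Helper functions are
# illustrative, not part of Spark's API.

MIN_OVERHEAD_MB = 384
OVERHEAD_FRACTION = 0.10

def default_overhead_mb(executor_memory_mb):
    """Default spark.yarn.executor.memoryOverhead (MB) when unset."""
    return max(MIN_OVERHEAD_MB, int(OVERHEAD_FRACTION * executor_memory_mb))

def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Total memory YARN grants per executor container (MB)."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb

# A 6 GB executor yields a ~6.6 GB container limit, which lines up with
# the "6.6 GB physical memory" ceiling in the error message above.
print(container_request_mb(6144))  # 6758 MB, i.e. ~6.6 GB
```

This is why raising executor-memory alone can still fail: the off-heap usage (Python worker processes for the UDF, serialization buffers) grows too, so bumping the overhead explicitly is usually the more direct fix.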