Hi All,

The heap usage of org.apache.spark.deploy.yarn.ApplicationMaster seems to
grow continuously, and the driver eventually crashes with an
OutOfMemoryError (OOM).

More details:
I have a Spark Streaming app running on Spark 2.0. spark.driver.memory is
10G and spark.yarn.driver.memoryOverhead is 2048 (MB). In driver heap dumps
taken every 30 minutes, the heap usage attributed to
org.apache.spark.deploy.yarn.ApplicationMaster grows by about 100MB between
consecutive dumps.
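For a rough sense of the timeline, assuming the growth stays linear at the
observed rate and treating the full 10G heap as available to it:

```latex
\frac{100\ \text{MB}}{30\ \text{min}} = 200\ \text{MB/h},
\qquad
\frac{10 \times 1024\ \text{MB}}{200\ \text{MB/h}} = 51.2\ \text{h} \approx 2\ \text{days}
```

This is an upper bound: the rest of the driver's state shares the same heap,
so the OOM would be expected somewhat earlier.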

Also, I suspect it may be caused by the following settings, which I had set
to true (both default to false, so they are explicitly enabled here):
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \

I am now trying with both set to false to check whether the heap usage of
ApplicationMaster stops increasing.

From the heap dump and the ApplicationMaster code, the growth seems to come
from the releasedExecutorLossReasons HashMap in YarnAllocator:
https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L124
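To illustrate the pattern I suspect, here is a simplified, hypothetical
model; it is NOT the real YarnAllocator code, and the names
LossReasonTracker, onContainerCompleted and getLossReason are made up. The
assumption is that an exit reason is stored per released executor and only
removed when someone asks for that executor's loss reason. With dynamic
allocation releasing executors over and over, entries that are never asked
for would accumulate indefinitely. The optional maxEntries cap is likewise a
hypothetical mitigation for illustration, not Spark's fix.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the suspected leak pattern.
// This is NOT Spark's YarnAllocator; all names are illustrative only.
class LossReasonTracker(maxEntries: Option[Int] = None) {
  // executorId -> exit reason, kept until a loss reason is requested.
  // LinkedHashMap keeps insertion order, so head is the oldest entry.
  private val releasedLossReasons = mutable.LinkedHashMap[String, String]()
  // executorIds whose loss reason was requested before the container completed.
  private val pendingRequests = mutable.Set[String]()

  def onContainerCompleted(executorId: String, reason: String): Unit = {
    // If a request was already waiting, it is answered directly and nothing
    // is stored. Otherwise the reason is kept until someone asks for it.
    if (!pendingRequests.remove(executorId)) {
      releasedLossReasons.put(executorId, reason)
      // Hypothetical cap: evict the oldest entries once the limit is exceeded.
      maxEntries.foreach { cap =>
        while (releasedLossReasons.size > cap) {
          releasedLossReasons.remove(releasedLossReasons.head._1)
        }
      }
    }
  }

  // Returns and removes the stored reason; if none is stored yet, records a
  // pending request so the reason is not stored when the container completes.
  def getLossReason(executorId: String): Option[String] = {
    val reason = releasedLossReasons.remove(executorId)
    if (reason.isEmpty) pendingRequests += executorId
    reason
  }

  def size: Int = releasedLossReasons.size
}

object LossReasonDemo {
  def main(args: Array[String]): Unit = {
    // Executors keep getting released, but nobody asks for their loss reasons.
    val unbounded = new LossReasonTracker()
    for (i <- 1 to 10000) unbounded.onContainerCompleted(s"exec-$i", "released by driver")
    println(s"unbounded: ${unbounded.size}")

    val bounded = new LossReasonTracker(maxEntries = Some(100))
    for (i <- 1 to 10000) bounded.onContainerCompleted(s"exec-$i", "released by driver")
    println(s"bounded: ${bounded.size}")
  }
}
```

Running LossReasonDemo prints `unbounded: 10000` and `bounded: 100`: in this
model the unbounded map holds one entry per released executor for the life of
the application, which would match a heap that grows steadily with uptime.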

Has anyone else seen this issue before?

Thanks,
Bharath