Hi all,

The heap usage for org.apache.spark.deploy.yarn.ApplicationMaster seems to keep growing continuously, and the driver eventually crashes with an OOM.
More details: I have a Spark Streaming app that runs on Spark 2.0, with spark.driver.memory set to 10G and spark.yarn.driver.memoryOverhead set to 2048. Looking at driver heap dumps taken every 30 minutes, the heap usage attributed to org.apache.spark.deploy.yarn.ApplicationMaster grows by about 100MB every 30 minutes.

I also suspect it may be caused by having set the options below to true (which I think is the default):

--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \

I am now trying with both set to false to check whether the ApplicationMaster heap usage stops increasing.

Investigating the heap dump and reading the ApplicationMaster code, it looks like the heap usage is growing because of the releasedExecutorLossReasons HashMap in YarnAllocator:
https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L124

Has anyone else seen this issue before?

Thanks,
Bharath
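To illustrate the pattern I suspect, here is a minimal Java sketch (not the actual Spark code; the class and method names are made up): a loss reason is stashed in a map for each released executor and reclaimed only when something later queries it, so if the queries never come, the map grows without bound as dynamic allocation churns executors.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical names, not Spark's implementation) of the
// suspected leak: an entry is added per released executor and removed only
// when that executor's loss reason is explicitly queried. Entries that are
// never queried linger forever, so the map's footprint keeps growing.
public class LossReasonSketch {
    private final Map<String, String> releasedExecutorLossReasons = new HashMap<>();

    void onExecutorReleased(String execId, String reason) {
        releasedExecutorLossReasons.put(execId, reason);
    }

    // The entry is reclaimed only when queried back.
    String lossReasonFor(String execId) {
        return releasedExecutorLossReasons.remove(execId);
    }

    int pendingCount() {
        return releasedExecutorLossReasons.size();
    }

    public static void main(String[] args) {
        LossReasonSketch s = new LossReasonSketch();
        // Dynamic allocation constantly releases idle executors...
        for (int i = 1; i <= 1000; i++) {
            s.onExecutorReleased("exec-" + i, "idle timeout");
        }
        // ...but only a few loss reasons are ever queried afterwards.
        s.lossReasonFor("exec-1");
        System.out.println(s.pendingCount()); // prints 999: the rest linger
    }
}
```

If this is what is happening, disabling dynamic allocation would stop the executor churn that feeds the map, which matches what I am testing now.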