We are testing Dataset/DataFrame jobs instead of RDD jobs. One thing we keep
running into is containers getting killed by YARN for exceeding memory limits.
I understand this is related to off-heap memory usage, and the usual suggestion
is to increase spark.yarn.executor.memoryOverhead.

At times our memoryOverhead ends up as large as the executor memory itself
(e.g., 4G heap and 4G overhead).
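For concreteness, this is roughly the shape of our configuration. The sizes are illustrative, and in practice we pass these as --conf flags on spark-submit rather than in code:

    import org.apache.spark.sql.SparkSession

    // Illustrative only: executor heap and YARN overhead both at 4G.
    // spark.yarn.executor.memoryOverhead takes its value in MB (no unit suffix).
    val spark = SparkSession.builder()
      .appName("dataset-job")
      .config("spark.executor.memory", "4g")
      .config("spark.yarn.executor.memoryOverhead", "4096")
      .getOrCreate()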

Why is the Dataset/DataFrame API using so much off-heap memory?

We haven't changed spark.memory.offHeap.enabled, which defaults to false.
Should we enable it to get a better handle on this?
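If enabling it is the right direction, I assume the change would look roughly like this. The size value below is a guess on my part, since my understanding is that spark.memory.offHeap.size (default 0) must also be set for the flag to have any effect:

    import org.apache.spark.sql.SparkSession

    // Assumed sketch: enabling Spark-managed off-heap memory.
    // The 2g figure is purely illustrative and would count against the
    // container's total memory on YARN.
    val spark = SparkSession.builder()
      .appName("dataset-job-offheap")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "2g")
      .getOrCreate()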
