Hi all,

I am running a Spark application in YARN-client mode with 6 executors (each
with 4 cores, executor memory = 6G and overhead = 4G; Spark version: 1.6.3 /
2.1.0). I find that my executor memory keeps increasing until the executor
gets killed by the node manager, along with a message telling me to boost
spark.yarn.executor.memoryOverhead. I know that this parameter mainly
controls the amount of memory allocated off-heap, but I don't know when and
how the Spark engine uses this part of memory. Also, increasing that memory
does not always solve my problem: sometimes it works and sometimes it
doesn't, and it tends to be useless when the input data is large.
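
For reference, this is roughly how the session gets those memory settings in
the 2.1.0 run (just a sketch: the app name is a placeholder, and on 1.6.3 the
same keys go through SparkConf since SparkSession does not exist there):

import org.apache.spark.sql.SparkSession

// a minimal sketch; in yarn-client mode these have to be set before the
// SparkContext starts (or be passed on spark-submit instead)
val spark = SparkSession.builder()
  .appName("daily-compaction")                          // placeholder name
  .config("spark.executor.instances", "6")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "6g")
  .config("spark.yarn.executor.memoryOverhead", "4096") // MB, i.e. the 4G above
  .getOrCreate()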

FYI, my app's logic is quite simple: it combines the small files generated
in a single day (one directory per day) into one file and writes it back to
HDFS. Here is the core code:
import org.apache.spark.sql.SaveMode

// read one day's partitions, drop the partition columns, write a single file
val df = spark.read.parquet(originpath)
  .filter(s"m = ${ts.month} AND d = ${ts.day}")
  .coalesce(400)
val dropDF = df.drop("hh").drop("mm").drop("mode").drop("y").drop("m").drop("d")
dropDF.repartition(1).write.mode(SaveMode.ErrorIfExists).parquet(targetpath)
The source directory may have hundreds to thousands of partitions, and the
total Parquet data is around 1 to 5 GB. I also notice that, in the stage
that shuffle-reads data from the other machines, the shuffle read size is
about 4 times larger than the input size, which seems weird to me, or maybe
it follows some principle I don't know.
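
For what it is worth, the only shuffle in this job should be the one
introduced by repartition(1). A variant I have been considering (a sketch,
not tested at scale) avoids that shuffle with coalesce(1), at the cost of
the final write collapsing into a single task:

// sketch (untested at scale): coalesce(1) is a narrow transformation, so it
// avoids the shuffle that repartition(1) triggers, but it also merges the
// upstream work into one task, so the coalesce(400) parallelism is lost
dropDF.coalesce(1)
  .write.mode(SaveMode.ErrorIfExists)
  .parquet(targetpath)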

Anyway, I have done some searching on this problem myself. Some articles
say it is about direct buffer memory (which I don't set myself). Others say
people solved it by triggering full GC more frequently. I also found
someone on Stack Overflow with a very similar situation:
http://stackoverflow.com/questions/31646679/ever-increasing-physical-memory-for-a-spark-application-in-yarn
He claimed that it is a bug in Parquet, but a comment questioned that.
People on this mailing list may also have received an email a few hours ago
from blondowski, who described this problem while writing JSON:
http://apache-spark-user-list.1001560.n3.nabble.com/Executors-running-out-of-memory-tt28325.html#none
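
In case the direct-buffer theory is worth testing, one experiment I have not
tried yet (the 2g value is an arbitrary guess) would be to cap the
executors' direct memory, so that if the growth really comes from direct
buffers the job dies with an explicit OutOfMemoryError instead of a node
manager kill:

import org.apache.spark.sql.SparkSession

// sketch (untested): cap direct buffer memory on the executors with the
// standard JVM flag; it has to be in place before the executors launch
val spark = SparkSession.builder()
  .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=2g")
  .getOrCreate()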

So it looks like a common issue across different output formats. I hope
someone with experience of this problem could explain why it happens and
what a reliable way to solve it would be.

Best,
