Greetings, I am processing files in batches and have structured an iterative process around them. Each batch is processed by first loading the data with spark-csv, performing some minor transformations, and then writing it back out as Parquet. Absolutely no caching or shuffling should occur anywhere in this process. A minimal sketch of the loop is below (paths and the transformation are just placeholders; the real job differs only in details).
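```python
# Minimal sketch of the per-batch loop, assuming hypothetical input/output
# paths and a trivial column cast as the "minor transformation".
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import functions as F

sc = SparkContext(appName="batch-csv-to-parquet")
sqlContext = SQLContext(sc)

for batch_path in ["/data/in/batch_%03d" % i for i in range(100)]:
    # Load the batch with spark-csv (external package on Spark 1.5.x).
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load(batch_path))

    # Minor, narrow transformation only; no cache()/persist() and nothing
    # that should trigger a shuffle.
    df = df.withColumn("value", F.col("value").cast("double"))

    # Write straight back out as Parquet.
    df.write.parquet(batch_path.replace("/in/", "/out/"))
```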
I watch memory utilization on each executor and notice a steady increase with each iteration that completes. Eventually we reach the memory limit set for the executor and the process slowly degrades and fails. I'm really unclear about what I could be doing that causes the executors to hold on to state between iterations; again, I was careful to make sure no caching occurred. I've done most of my testing to date in Python, though I will port it to Scala to see whether the behavior is isolated to that runtime. Spark: 1.5.2 ~~ Ajaxx