I'm running a Spark Streaming job on Spark 1.3.1 that uses updateStateByKey. The job works fine at first, but after a few runs it starts spilling shuffle data to disk, no matter how much memory I give the executors.
I have tried changing --executor-memory on spark-submit, as well as spark.shuffle.memoryFraction, spark.storage.memoryFraction, and spark.storage.unrollFraction. However I configure these, it always spills to disk at around 2.5 GB. What is the best way to avoid spilling shuffle data to disk?
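
For illustration, a submission that sets all four of these looks roughly like the sketch below. The values, the class name, and the jar path are placeholders chosen for the example, not the exact settings used.

```shell
# Illustrative only: every value, the class name, and the jar are placeholders.
# In Spark 1.3.1 the defaults are shuffle.memoryFraction=0.2,
# storage.memoryFraction=0.6, and storage.unrollFraction=0.2.
spark-submit \
  --class com.example.StreamingJob \
  --executor-memory 8g \
  --conf spark.shuffle.memoryFraction=0.4 \
  --conf spark.storage.memoryFraction=0.4 \
  --conf spark.storage.unrollFraction=0.2 \
  streaming-job.jar
```

Both memory fractions are taken from the same executor heap, so raising one usually means lowering the other.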