Hello everyone. The problem is that Spark writes data to disk very heavily, even when the application has a lot of free memory (about 3.8g). I've noticed that a folder with a name like "spark-local-20140917165839-f58c" contains many subfolders with files like "shuffle_446_0_1". The total size of the files in "spark-local-20140917165839-f58c" can reach 1.1g. Sometimes its size decreases (does that folder contain only temporary files?), so the total amount of data written to disk is greater than 1.1g.
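To see what accumulates in that folder, a small script can walk the spark-local directory and total the shuffle files. This is only a sketch. It assumes the file names follow a shuffle_<shuffleId>_<mapId>_<reduceId> pattern, which is my reading of names like "shuffle_446_0_1" under Spark 1.0's hash-based shuffle. The function name summarize_local_dir is hypothetical.

```python
import os
import re

# Assumption: shuffle block files are named shuffle_<shuffleId>_<mapId>_<reduceId>
# (as in "shuffle_446_0_1"); files that do not match are ignored.
SHUFFLE_NAME = re.compile(r"^shuffle_(\d+)_(\d+)_(\d+)$")


def summarize_local_dir(local_dir):
    """Walk a spark-local-* directory (hypothetical helper) and total its shuffle files.

    Returns a dict with the number of matching files, their combined size in
    bytes, and the set of distinct shuffle IDs seen.
    """
    count = 0
    total_bytes = 0
    shuffle_ids = set()
    # The spark-local-* folder holds many subfolders, so walk it recursively.
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            match = SHUFFLE_NAME.match(name)
            if match is None:
                continue  # skip files that do not look like shuffle blocks
            shuffle_id, _map_id, _reduce_id = map(int, match.groups())
            shuffle_ids.add(shuffle_id)
            count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return {"files": count, "bytes": total_bytes, "shuffle_ids": shuffle_ids}
```

Calling it repeatedly with the path of the spark-local-... directory while a job runs shows how the file count and total size change over time.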
The question is: what kind of data does Spark store there, and can I make Spark keep it in memory instead of writing it to disk when there is enough free RAM?

I run my job locally with Spark 1.0.1:

./bin/spark-submit --driver-memory 12g --master local[3] --properties-file conf/spark-defaults.conf --class my.company.Main /path/to/jar/myJob.jar

spark-defaults.conf:

spark.shuffle.spill false
spark.reducer.maxMbInFlight 1024
spark.shuffle.file.buffer.kb 2048
spark.storage.memoryFraction 0.7

This disk usage pattern is common to many of my jobs. I also used ALS from MLlib and saw similar behavior. I have had no success tuning the Spark configuration, and I hope someone can help me :)
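To double-check which values the properties file actually sets, a minimal parser can read it back. This sketch assumes the spark-defaults.conf format of one key and one value per line, separated by whitespace, with blank lines and lines starting with "#" ignored. The function name parse_spark_defaults is hypothetical and is not part of Spark.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict (hypothetical helper).

    Each non-blank, non-comment line holds a key and a value separated by
    whitespace; everything after the first run of whitespace is the value.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split once on any whitespace run
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        conf[key] = value
    return conf


# The settings from the question, in the same format as conf/spark-defaults.conf.
SPARK_DEFAULTS = """\
spark.shuffle.spill false
spark.reducer.maxMbInFlight 1024
spark.shuffle.file.buffer.kb 2048
spark.storage.memoryFraction 0.7
"""

conf = parse_spark_defaults(SPARK_DEFAULTS)
```

Values are kept as strings, as they appear in the file; converting them (for example "0.7" to a float) is left to the caller.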