Hi Bijay,

Shuffle Spill (Disk) is the total number of bytes written to disk by records spilled during the shuffle. Shuffle Spill (Memory) is the amount of space those same spilled records occupied in memory before they were spilled. The two differ because records are serialized before they are written out, and the serialized format is more compact than the in-memory objects; the on-disk version can be compressed as well.
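
To make the gap between the two numbers concrete, here is a rough analogy in plain Python. It is not Spark code: Spark runs on the JVM with its own serializers and compression codecs, so pickle and zlib are only stand-ins, and the record layout and helper name below are made up for illustration. It measures the same records three ways: as live objects, serialized, and serialized then compressed.

```python
import pickle
import sys
import zlib

# Hypothetical stand-in for shuffle records: (key, value) string pairs.
records = [(f"key-{i:08d}", f"{i:010d}" * 9) for i in range(10_000)]


def in_memory_bytes(recs):
    """Rough footprint of the live objects: the list, each tuple, each string."""
    total = sys.getsizeof(recs)
    for rec in recs:
        key, value = rec
        total += sys.getsizeof(rec) + sys.getsizeof(key) + sys.getsizeof(value)
    return total


# Analogue of Shuffle Spill (Memory): size of the records as objects.
mem_bytes = in_memory_bytes(records)

# Serialized form: object headers and pointers are gone.
serialized = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
serialized_bytes = len(serialized)

# Analogue of Shuffle Spill (Disk): serialized and then compressed.
compressed_bytes = len(zlib.compress(serialized))

print("in memory: ", mem_bytes)
print("serialized:", serialized_bytes)
print("compressed:", compressed_bytes)
```

The exact ratios here say nothing about your job; the point is only that the same records get smaller at each step, which is why the memory figure can be much larger than the disk figure.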

-Sandy

On Mon, Mar 23, 2015 at 5:29 PM, Bijay Pathak <bijay.pat...@cloudwick.com> wrote:

> Hello,
>
> I am running TeraSort <https://github.com/ehiggs/spark-terasort> on
> 100GB of data. The final metrics I am getting on Shuffle Spill are:
>
> Shuffle Spill(Memory): 122.5 GB
> Shuffle Spill(Disk): 3.4 GB
>
> What's the difference and relation between these two metrics? Does this
> mean 122.5 GB was spilled from memory during the shuffle?
>
> thank you,
> bijay