It would be the "40%", though it's probably better to think of it as shuffle vs. data cache, with the remainder going to tasks. The documentation for the shuffle memory fraction setting notes that increasing it takes memory at the expense of the storage/data cache fraction:
spark.shuffle.memoryFraction (default: 0.2)
Fraction of Java heap to use for aggregation and cogroups during shuffles, if spark.shuffle.spill is true. At any given time, the collective size of all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will begin to spill to disk. If spills are often, consider increasing this value at the expense of spark.storage.memoryFraction.

On Wed, Jun 17, 2015 at 6:02 PM, Corey Nolet <cjno...@gmail.com> wrote:
> So I've seen in the documentation that (after the overhead memory is
> subtracted), the memory allocations of each executor are as follows
> (assume default settings):
>
> 60% for cache
> 40% for tasks to process data
>
> Reading about how Spark implements shuffling, I've also seen it say "20%
> of executor memory is utilized for shuffles". Does this 20% cut into the
> 40% for tasks to process data or the 60% for the data cache?
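To make the split concrete, here's a rough back-of-the-envelope sketch (plain Python, not Spark code) of how the default fractions divide an executor heap. The 4096 MB heap size is an arbitrary example, and this ignores the internal safety fractions Spark also applies before handing memory to each region:

```python
heap_mb = 4096.0  # example executor heap after overhead is subtracted (arbitrary)

storage_fraction = 0.6  # spark.storage.memoryFraction default: RDD/data cache
shuffle_fraction = 0.2  # spark.shuffle.memoryFraction default: in-memory shuffle maps

storage_mb = heap_mb * storage_fraction            # cache region
shuffle_mb = heap_mb * shuffle_fraction            # shuffle aggregation region
task_mb = heap_mb - storage_mb - shuffle_mb        # remainder left for task data

print(storage_mb, shuffle_mb, task_mb)  # 2457.6 819.2 819.2
```

So with defaults the shuffle region and the leftover task region are the same size; if you bump spark.shuffle.memoryFraction, you'd typically lower spark.storage.memoryFraction by the same amount (e.g. via --conf on spark-submit) so the cache doesn't get squeezed implicitly.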