It would be the "40%", although it's probably better to think of it as
shuffle vs. data cache, with the remainder going to tasks: with the default
fractions (0.6 for storage, 0.2 for shuffle), roughly 20% of the heap is
left for task execution. The documentation for the shuffle memory fraction
notes that increasing it comes at the expense of the storage/data cache
fraction:

spark.shuffle.memoryFraction (default: 0.2)
Fraction of Java heap to use for aggregation and cogroups during shuffles,
if spark.shuffle.spill is true. At any given time, the collective size of
all in-memory maps used for shuffles is bounded by this limit, beyond which
the contents will begin to spill to disk. If spills are often, consider
increasing this value at the expense of spark.storage.memoryFraction.
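
For reference, here is a minimal sketch (not from the original thread) of
where those two fractions would be set on a SparkConf in Spark 1.x; the
values are purely illustrative, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("memory-fraction-example")
      // Fraction of the heap reserved for cached RDD blocks (default 0.6).
      .set("spark.storage.memoryFraction", "0.5")
      // Fraction of the heap for in-memory shuffle maps (default 0.2);
      // raising it should come out of the storage fraction above.
      .set("spark.shuffle.memoryFraction", "0.3")

    val sc = new SparkContext(conf)

Whatever is not claimed by those two fractions (plus the safety margins) is
what remains for task execution.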

On Wed, Jun 17, 2015 at 6:02 PM, Corey Nolet <cjno...@gmail.com> wrote:

> So I've seen in the documentation that (after the overhead memory is
> subtracted), the memory allocations of each executor are as follows (assume
> default settings):
>
> 60% for cache
> 40% for tasks to process data
>
>
> Reading about how Spark implements shuffling, I've also seen it say "20%
> of executor memory is utilized for shuffles". Does this 20% cut into the
> 40% for tasks to process data or the 60% for the data cache?
>
