In many cases the shuffle will actually hit the OS buffer cache and
never touch spinning disk, provided the shuffle data is smaller than
the memory available on the machine.
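
If you want to go a step further, one commonly suggested workaround is to
point Spark's scratch directory at a RAM-backed file system. The sketch below
is only an illustration: spark.local.dir is a real Spark setting, but the
helper name, the /dev/shm path, and my_job.py are assumptions made up for the
example, and the shuffle still goes through the file system API rather than a
dedicated in-memory mode.

```python
# Hypothetical helper (not part of Spark): builds spark-submit arguments that
# point Spark's scratch space, where shuffle files are written, at a
# RAM-backed directory such as a tmpfs mount.
def shuffle_on_tmpfs_args(scratch_dir="/dev/shm/spark-scratch"):
    # spark.local.dir is the real setting; the default path is an assumption.
    conf = {"spark.local.dir": scratch_dir}
    args = []
    for key, value in sorted(conf.items()):
        # spark-submit accepts settings as repeated "--conf key=value" pairs.
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(" ".join(["spark-submit"] + shuffle_on_tmpfs_args() + ["my_job.py"]))
```

Note that a cluster manager may override this setting (for example via
SPARK_LOCAL_DIRS in standalone mode or LOCAL_DIRS on YARN), so treat it as
something to verify on your own deployment.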

- Patrick

On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet <cjno...@gmail.com> wrote:
> So with this... to help my understanding of Spark under the hood-
>
> Is this statement correct "When data needs to pass between multiple JVMs, a
> shuffle will always hit disk"?
>
> On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>
>> There's a discussion of this at https://github.com/apache/spark/pull/5403
>>
>>
>>
>> On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>> Is it possible to configure Spark to do all of its shuffling FULLY in
>>> memory (given that I have enough memory to store all the data)?
>>>
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
