OK, so small shuffles can complete without hitting any disk. Does the same
hold for the auxiliary shuffle service on YARN? Can that also be done
without hitting disk?

On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> In many cases the shuffle will actually hit the OS buffer cache and
> never touch spinning disk, provided its size is less than the memory
> on the machine.
>
> - Patrick
>
> On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet <cjno...@gmail.com> wrote:
> > So with this... to help my understanding of Spark under the hood-
> >
> > Is this statement correct: "When data needs to pass between multiple JVMs,
> > a shuffle will always hit disk"?
> >
> > On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen <rosenvi...@gmail.com> wrote:
> >>
> >> There's a discussion of this at https://github.com/apache/spark/pull/5403
> >>
> >>
> >>
> >> On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet <cjno...@gmail.com> wrote:
> >>>
> >>> Is it possible to configure Spark to do all of its shuffling FULLY in
> >>> memory (given that I have enough memory to store all the data)?
> >>>
> >>>
> >>>
> >>
> >
>
