In many cases the shuffle will actually hit the OS buffer cache and never touch spinning disk, provided the shuffle data is smaller than the memory on the machine.
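
To make the buffer cache point concrete, here is a rough sketch in plain Python. It is not Spark's shuffle code; the `write_then_read` helper and the `shuffle_sketch_` file prefix are made up for illustration. It writes a file without forcing it to disk and reads it straight back, which is roughly the pattern a shuffle write followed by a fetch goes through. The script cannot observe the cache directly, so it only checks the round trip; whether the read is actually served from memory depends on the OS and on how much free memory the machine has.

```python
import os
import tempfile


def write_then_read(payload: bytes) -> bytes:
    """Write payload to a temp file without fsync, then read it back."""
    fd, path = tempfile.mkstemp(prefix="shuffle_sketch_")  # hypothetical prefix
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            # flush() hands Python's buffer to the OS. We deliberately skip
            # os.fsync(), so the kernel may keep the pages in its cache and
            # write them to the device later, on its own schedule.
            f.flush()
        # A read shortly after the write is typically served from those
        # cached pages when memory is not under pressure.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    data = os.urandom(1 << 20)  # 1 MiB stand-in for a small shuffle block
    assert write_then_read(data) == data
    print("round trip ok,", len(data), "bytes")
```

Note that the file still exists on the local filesystem the whole time; the saving is that the reader usually does not have to wait on the physical disk.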
- Patrick

On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet <cjno...@gmail.com> wrote:
> So with this... to help my understanding of Spark under the hood-
>
> Is this statement correct "When data needs to pass between multiple JVMs, a
> shuffle will always hit disk"?
>
> On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>
>> There's a discussion of this at https://github.com/apache/spark/pull/5403
>>
>> On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>> Is it possible to configure Spark to do all of its shuffling FULLY in
>>> memory (given that I have enough memory to store all the data)?