It might help if I clarify my questions. :-)

1. Is persist() applied during the transformation right before the persist() call in the graph? Or is it applied after the transform's processing is complete? In the case of things like GroupBy, is the Seq backed by disk as it is being created? We're trying to get a sense of how the processing is handled behind the scenes with respect to disk.
2. When else is disk used internally?

Any pointers are appreciated.

-Suren

On Mon, Apr 7, 2014 at 8:46 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> Hi,
>
> Any thoughts on this? Thanks.
>
> -Suren
>
> On Thu, Apr 3, 2014 at 8:27 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
>
>> Hi,
>>
>> I know if we call persist with the right options, we can have Spark persist an RDD's data on disk.
>>
>> I am wondering what happens in intermediate operations that could conceivably create large collections/Sequences, like GroupBy and shuffling.
>>
>> Basically, one part of the question is when is disk used internally?
>>
>> And is calling persist() on the RDD returned by such transformations what lets it know to use disk in those situations? Trying to understand if persist() is applied during the transformation or after it.
>>
>> Thank you.

--
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@v <suren.hira...@sociocast.com>elos.io
W: www.velos.io