Re: Spark Disk Usage

Surendranauth Hiraman Mon, 07 Apr 2014 10:21:18 -0700

It might help if I clarify my questions. :-)

1. Is persist() applied during the transformation right before the
persist() call in the graph? Or is it applied after the transform's
processing is complete? In the case of things like GroupBy, is the Seq
backed by disk as it is being created? We're trying to get a sense of how
the processing is handled behind the scenes with respect to disk.


2. When else is disk used internally?

Any pointers are appreciated.

-Suren




On Mon, Apr 7, 2014 at 8:46 AM, Surendranauth Hiraman <
[email protected]> wrote:

> Hi,
>
> Any thoughts on this? Thanks.
>
> -Suren
>
>
>
> On Thu, Apr 3, 2014 at 8:27 AM, Surendranauth Hiraman <
> [email protected]> wrote:
>
>> Hi,
>>
>> I know if we call persist with the right options, we can have Spark
>> persist an RDD's data on disk.
>>
>> I am wondering what happens in intermediate operations that could
>> conceivably create large collections/Sequences, like GroupBy and shuffling.
>>
>> Basically, one part of the question is: when is disk used internally?
>>
>> And is calling persist() on the RDD returned by such transformations what
>> lets it know to use disk in those situations? Trying to understand if
>> persist() is applied during the transformation or after it.
>>
>> Thank you.
>>
>>
>> SUREN HIRAMAN, VP TECHNOLOGY
>> Velos
>> Accelerating Machine Learning
>>
>> 440 NINTH AVENUE, 11TH FLOOR
>> NEW YORK, NY 10001
>> O: (917) 525-2466 ext. 105
>> F: 646.349.4063
>> E: suren.hiraman@velos.io
>> W: www.velos.io
>>
>>
>
>
>
>


