(I've clarified the statement (1) of my previous mail. See below.)
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, June 13, 2014 10:05 AM
To: user@spark.apache.org
Subject: RE: Question about RDD cache, unpersist, materialization
Currently I use
... not yet computed)
{
  compute_that_rdd;
  do_actual_unpersist();
}

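(For readers following the thread: a minimal Scala sketch of what the fragment above seems to describe, using only the public RDD API. The helper name is made up for illustration, not from the original mail, and it assumes the RDD has already been marked with cache()/persist().)

  import org.apache.spark.rdd.RDD

  // Hypothetical helper: make sure the RDD is really computed before its
  // cached blocks are dropped. count() is an action that touches every
  // partition, so the cache is fully populated when it returns.
  def materializeThenUnpersist[T](rdd: RDD[T]): Unit = {
    rdd.count()        // "compute_that_rdd"
    rdd.unpersist()    // "do_actual_unpersist"
  }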
From: Daniel Siegmann [mailto:daniel.siegm...@velos.io]
Sent: Friday, June 13, 2014 5:38 AM
To: user@spark.apache.org
Subject: Re: Question about RDD cache, unpersist, materialization
I've run into this issue. The goal of caching / persist seems to be to avoid recomputing an RDD when its data will be needed multiple times. However, once the following RDDs are computed the cache is no longer needed. The current design provides no obvious way to detect when the cache is no longer needed.
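(A small Scala sketch of the situation described above, assuming an existing SparkContext named sc; the RDD names and the input file are made up.)

  // parent is reused by two downstream computations, so caching it avoids
  // computing the text file and the map twice.
  val parent = sc.textFile("input.txt").map(_.length).cache()

  val total    = parent.reduce(_ + _)           // 1st action: computes and caches parent
  val longOnes = parent.filter(_ > 80).count()  // 2nd action: served from the cache

  // After both actions have run, the cached blocks are no longer needed,
  // but nothing in the API signals that -- the caller has to decide when
  // it is safe to call unpersist().
  parent.unpersist()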
If you want to force materialization, use .count(). Also, if you can, simply don't unpersist anything unless you really need to free the memory.
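(On the second suggestion: an RDD that is left cached is not leaked forever; when executors run short of storage memory, Spark drops cached partitions on its own, least recently used first, and recomputes them if they are needed again. A short sketch, with sc and the file name assumed:)

  val lookup = sc.textFile("lookup.txt").cache()
  lookup.count()   // count() is an action, so this forces every partition
                   // to be computed and cached right now
  // ... later jobs reuse `lookup` from the cache ...
  // No explicit unpersist(): the cached blocks are simply evicted by Spark
  // if the memory is needed for something else.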
—
Sent from Mailbox
On Wed, Jun 11, 2014 at 5:13 AM, innowireless TaeYun Kim wrote:
BTW, it is possible that rdd.first() does not compute all of the partitions. So, first() cannot be used for the situation below.
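(A short Scala sketch of why first() is not enough here, assuming an existing SparkContext sc and an arbitrary input file:)

  val rdd = sc.textFile("big.txt").cache()

  rdd.first()   // implemented via take(1): it runs a job over only as many
                // partitions as needed (usually just the first), so most of
                // the RDD is still neither computed nor cached afterwards
  rdd.count()   // touches every partition, so this really materializes
                // (and caches) the whole RDD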
-----Original Message-----
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, June 11, 2014 11:40 AM
To: user@spark.apache.org
Subject: Question about RDD cache, unpersist, materialization