RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
(I've clarified statement (1) of my previous mail. See below.) From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Friday, June 13, 2014 10:05 AM To: user@spark.apache.org Subject: RE: Question about RDD cache, unpersist, materialization Currently I use

RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
From: Daniel Siegmann [mailto:daniel.siegm...@velos.io] Sent: Friday, June 13, 2014 5:38 AM To: user@spark.apache.org Subject: Re: Question about RDD cache, unpersist, materialization I've run into this issue. The goal of caching / persist seems to be to avoid recomputing an RDD when it

Re: Question about RDD cache, unpersist, materialization

2014-06-12 Thread Nicholas Chammas
not yet computed) > compute_that_rdd; > do_actual_unpersist(); > } > > > > *From:* Daniel Siegmann [mailto:daniel.siegm...@velos.io] > *Sent:* Friday, June 13, 2014 5:38 AM > *To:* user@spark.apache.org > *Subject:* Re: Question about RDD cache, unpersist, materialization >

RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
) compute_that_rdd; do_actual_unpersist(); } From: Daniel Siegmann [mailto:daniel.siegm...@velos.io] Sent: Friday, June 13, 2014 5:38 AM To: user@spark.apache.org Subject: Re: Question about RDD cache, unpersist, materialization I've run into this issue. The goal of ca

Re: Question about RDD cache, unpersist, materialization

2014-06-12 Thread Daniel Siegmann
I've run into this issue. The goal of caching / persist seems to be to avoid recomputing an RDD when its data will be needed multiple times. However, once the RDDs that depend on it are computed, the cache is no longer needed. The current design provides no obvious way to detect when the cache is no longe

RE: Question about RDD cache, unpersist, materialization

2014-06-10 Thread Nick Pentreath
If you want to force materialization, use .count(). Also, if you can, simply don't unpersist anything unless you really need to free the memory. On Wed, Jun 11, 2014 at 5:13 AM, innowireless TaeYun Kim wrote: > BTW, it is possible that rdd.first() does not compute the whole p

RE: Question about RDD cache, unpersist, materialization

2014-06-10 Thread innowireless TaeYun Kim
BTW, it is possible that rdd.first() does not compute all of the partitions, so first() cannot be used for the situation below. -Original Message- From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Wednesday, June 11, 2014 11:40 AM To: user@spark.apache.org Subject
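
To illustrate the point made in the thread (first() may stop after one partition, while count() has to visit all of them), here is a rough toy model in plain Python. It is not Spark and does not use Spark's API: ToyRDD, make_rdd, cached_partitions and the computed log are all hypothetical names invented for this sketch. It only assumes that a persisted dataset caches a partition at the moment that partition is actually computed.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for a lazy, partitioned dataset. NOT Spark's API."""

    def __init__(self, partitions: List[Callable[[], List[int]]]):
        self._partitions = partitions  # one zero-argument function per partition
        self._cache: List[Optional[List[int]]] = [None] * len(partitions)
        self._persisted = False
        self.computed: List[int] = []  # indices of partitions actually computed

    def persist(self) -> "ToyRDD":
        # Only marks the dataset for caching; nothing is computed here.
        self._persisted = True
        return self

    def unpersist(self) -> "ToyRDD":
        # Drops whatever has been cached so far.
        self._persisted = False
        self._cache = [None] * len(self._partitions)
        return self

    def _partition(self, i: int) -> List[int]:
        cached = self._cache[i]
        if cached is not None:
            return cached
        data = self._partitions[i]()  # the (possibly expensive) computation
        self.computed.append(i)
        if self._persisted:
            self._cache[i] = data
        return data

    def first(self) -> int:
        # Stops at the first non-empty partition, so later ones stay uncomputed.
        for i in range(len(self._partitions)):
            data = self._partition(i)
            if data:
                return data[0]
        raise ValueError("empty dataset")

    def count(self) -> int:
        # Has to visit every partition, so every partition ends up cached.
        return sum(len(self._partition(i)) for i in range(len(self._partitions)))

    def cached_partitions(self) -> int:
        return sum(1 for c in self._cache if c is not None)


def make_rdd() -> ToyRDD:
    """Three small partitions, marked as persisted but not yet computed."""
    return ToyRDD([lambda: [1, 2], lambda: [3, 4], lambda: [5]]).persist()


if __name__ == "__main__":
    rdd = make_rdd()
    rdd.first()
    print("cached after first():", rdd.cached_partitions())
    rdd.count()
    print("cached after count():", rdd.cached_partitions())
```

In this model, calling first() on a freshly persisted dataset leaves two of the three partitions uncached, which is why the thread suggests count() when the whole dataset must be materialized before something upstream is unpersisted.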