I think RDD1 is always calculated twice in that case. Could you provide a demo case in which RDD1 is sometimes calculated only once?
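To make the question concrete, here is a toy model of why I would expect two evaluations. It is plain Python, not Spark: the Node class, its compute() method and the persist flag are names made up for illustration, and it assumes a dataset that is not persisted is recomputed from its parents every time a child asks for it. The DAG is the one from the quoted message, with RDD1 feeding both RDD2 and RDD3.

```python
# Toy model of lazy lineage evaluation. This is NOT Spark code; all names
# here (Node, compute, persist, build, run) are hypothetical.
class Node:
    """A lazily evaluated dataset that recomputes from its parents on demand."""

    def __init__(self, name, fn, parents=(), persist=False):
        self.name = name
        self.fn = fn                  # combines parent results into this node's result
        self.parents = list(parents)
        self.persist = persist        # stand-in for persisting the dataset
        self.calls = 0                # how many times fn actually ran
        self._cached = None
        self._has_cache = False

    def compute(self):
        # A persisted node answers from its cache after the first evaluation.
        if self.persist and self._has_cache:
            return self._cached
        inputs = [p.compute() for p in self.parents]
        self.calls += 1
        result = self.fn(*inputs)
        if self.persist:
            self._cached, self._has_cache = result, True
        return result


def build(persist_rdd1):
    """RDD1 feeds RDD2 and RDD3; RDD3 and RDD2 both feed RDD4."""
    rdd1 = Node("RDD1", lambda: [1, 2, 3], persist=persist_rdd1)
    rdd2 = Node("RDD2", lambda xs: [x * 10 for x in xs], [rdd1])
    rdd3 = Node("RDD3", lambda xs: [x + 1 for x in xs], [rdd1])
    rdd4 = Node("RDD4", lambda a, b: a + b, [rdd3, rdd2])
    return rdd1, rdd4


def run(persist_rdd1):
    """Trigger the single 'action' on RDD4 and report how often RDD1 ran."""
    rdd1, rdd4 = build(persist_rdd1)
    result = rdd4.compute()
    return result, rdd1.calls


if __name__ == "__main__":
    print("not persisted:", run(False)[1], "evaluation(s) of RDD1")
    print("persisted:    ", run(True)[1], "evaluation(s) of RDD1")
```

In this model RDD1 runs twice when it is not persisted and once when it is. It says nothing about cases where the real scheduler might reuse earlier output, which is exactly the kind of demo I am asking about.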
On Sat, Jan 17, 2015 at 2:37 AM, Peng Cheng <pc...@uow.edu.au> wrote:

> I'm talking about RDD1 (not persisted or checkpointed) in this situation:
>
> ...(somewhere) -> RDD1 -> RDD2
>                    |       |
>                    V       V
>                   RDD3 -> RDD4 -> Action!
>
> In my experience, the chance that RDD1 gets recalculated is volatile:
> sometimes it is calculated once, sometimes twice. When calculating this RDD
> is expensive (e.g. it involves a RESTful service that charges me money),
> this compels me to persist RDD1, which takes extra memory, and in case the
> Action! doesn't always happen, I don't know when to unpersist it to free
> that memory.
>
> A related problem might be in $SQLContext.jsonRDD(), since the source
> jsonRDD is used twice (once for schema inference, once for reading the
> data). It almost guarantees that the source jsonRDD is calculated twice.
> Has this problem been addressed so far?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/If-an-RDD-appeared-twice-in-a-DAG-of-which-calculation-is-triggered-by-a-single-action-will-this-RDD-tp21192.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
~Yours, Xuefeng Wu/吴雪峰 敬上