Subject: Possible long lineage issue when using DStream to update a normal RDD
Hi all,
Recently in our project, we need to update an RDD using data regularly received
from a DStream. I plan to use the foreachRDD API to achieve this:
var MyRDD = ...
dstream.foreachRDD { rdd =>
  MyRDD = MyRDD.join(rdd
appreciated.
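
For illustration, the update loop described above could be sketched roughly as follows. This is only a sketch under assumptions, not the code from our project: the object RunningState, the methods merge and attach, and the checkpointEvery parameter are hypothetical names, and summing values per key is an assumed update rule. It persists each new version of the RDD, asks for a checkpoint every few batches so the lineage does not grow without bound, and unpersists the previous version. It assumes sc.setCheckpointDir(...) has already been called on the SparkContext.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper, not from the original post: keeps per-key running totals.
object RunningState {
  // Merge one micro-batch into the current state and materialize the result.
  def merge(state: RDD[(String, Long)],
            batch: RDD[(String, Long)],
            doCheckpoint: Boolean): RDD[(String, Long)] = {
    val updated = state.fullOuterJoin(batch)
      .mapValues { case (old, delta) => old.getOrElse(0L) + delta.getOrElse(0L) }
      .persist(StorageLevel.MEMORY_AND_DISK)
    // checkpoint() has to be requested before the first action on the RDD.
    if (doCheckpoint) updated.checkpoint()
    // The action runs the job and, if requested, writes the checkpoint.
    updated.count()
    updated
  }

  // Register the update loop on a stream; the function passed to foreachRDD
  // runs on the driver, so reassigning a local var there is safe.
  def attach(dstream: DStream[(String, Long)],
             initial: RDD[(String, Long)],
             checkpointEvery: Int = 10): Unit = {
    var state = initial
    var batches = 0L
    dstream.foreachRDD { rdd =>
      batches += 1
      val previous = state
      val updated = merge(previous, rdd, batches % checkpointEvery == 0)
      // The old version is no longer needed once the new one is materialized.
      previous.unpersist(blocking = false)
      state = updated
    }
  }
}
```

Note that on the first batch this also unpersists the RDD passed in as initial, and that state held in a driver-side var like this is not restored by Spark Streaming's own checkpoint recovery.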
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Possible-long-lineage-issue-when-using-DStream-to-update-a-normal-RDD-tp22812.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
...@gmail.com]
Sent: Friday, May 8, 2015 2:51 PM
To: Shao, Saisai
Cc: user@spark.apache.org
Subject: Re: Possible long lineage issue when using DStream to update a normal RDD
Thank you for this suggestion! But may I ask what the advantage of using
checkpoint instead of cache is here, since they both cut lineage?
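
One way to compare the two is to build the same chain of transformations twice, cache one copy, checkpoint the other, and look at the lineage each one reports afterwards. The snippet below is a minimal sketch assuming local mode and a temporary checkpoint directory. The object CacheVsCheckpoint and its depth and build helpers are hypothetical names made up for this example; depth simply follows the first parent of each RDD and counts how many RDDs are on that chain.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object, only meant for inspecting lineage locally.
object CacheVsCheckpoint {
  // Number of RDDs on the chain of first parents, including rdd itself.
  def depth(rdd: RDD[_]): Int =
    rdd.dependencies.headOption.map(d => 1 + depth(d.rdd)).getOrElse(1)

  // Apply `steps` narrow transformations on top of a tiny base RDD.
  def build(sc: SparkContext, steps: Int): RDD[(String, Int)] =
    (1 to steps).foldLeft(sc.parallelize(Seq(("a", 0)))) { (acc, _) =>
      acc.mapValues(_ + 1)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-vs-checkpoint")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir(Files.createTempDirectory("spark-ckpt").toString)

    // Copy 1: cached and materialized.
    val cached = build(sc, 50).cache()
    cached.count()

    // Copy 2: checkpoint requested before the first action, then materialized.
    val checkpointed = build(sc, 50)
    checkpointed.checkpoint()
    checkpointed.count()

    println(s"cached:       depth=${depth(cached)}")
    println(s"checkpointed: depth=${depth(checkpointed)}, " +
      s"isCheckpointed=${checkpointed.isCheckpointed}")
    println(checkpointed.toDebugString)
    sc.stop()
  }
}
```

Comparing the two depth values and the toDebugString output shows what each call does to the chain of parent RDDs.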