Not sure if this helps, but: checkpoint cuts the lineage. The checkpoint method only sets a flag. For the checkpoint to actually be performed, you must NOT materialise the RDD before it has been flagged; otherwise the flag is just ignored.
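A slightly fuller sketch of that ordering, in case it is useful. Everything concrete here is an assumption for illustration: the local master, the `/tmp/spark-checkpoints` directory, the object name `CheckpointSketch`, and the `map(_ * 2)` stand-in transformation. The one extra step it shows is that a checkpoint directory has to be set on the SparkContext before `checkpoint()` is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {

  // Returns isCheckpointed as observed (before the first action, after it).
  def run(sc: SparkContext): (Boolean, Boolean) = {
    val rdd1 = sc.parallelize(1 to 1000)
    val rdd2 = rdd1.map(_ * 2)    // stand-in for the real transformation

    rdd2.cache()                  // optional: avoids recomputing rdd2 when the checkpoint is written
    rdd2.checkpoint()             // only sets the flag; nothing is written yet
    val before = rdd2.isCheckpointed

    rdd2.count()                  // first action on rdd2: the checkpoint is written after this job
    val after = rdd2.isCheckpointed

    println(rdd2.toDebugString)   // prints the lineage as Spark sees it after the checkpoint
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Local master and checkpoint path are placeholders, not recommendations.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      sc.setCheckpointDir("/tmp/spark-checkpoints") // must be set before checkpoint() is called
      val (before, after) = run(sc)
      println(s"isCheckpointed before action: $before, after action: $after")
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the checkpoint directory would normally be on a shared filesystem such as HDFS rather than a local path.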

    val rdd2 = rdd1.map(..)
    rdd2.checkpoint()     // only sets the flag
    rdd2.count()          // first action performs the checkpoint
    rdd2.isCheckpointed   // true

On Wednesday, 18 June 2014, dash <b...@nd.edu> wrote:

> If an RDD object has non-empty .dependencies, does that mean it has
> lineage? How can I remove it?
>
> I'm doing iterative computation, and each iteration depends on the result
> computed in the previous iteration. After several iterations, it throws a
> StackOverflowError.
>
> At first I tried to use cache. I read the code in pregel.scala, which is
> part of GraphX; it uses a count to materialize the object after cache.
> But I attached a debugger, and that approach does not seem to empty
> .dependencies. It does not work in my code either.
>
> An alternative approach is checkpoint. I tried checkpointing the vertices
> and edges of my Graph object and then materializing them by counting the
> vertices and edges. Then I used .isCheckpointed to check whether it was
> correctly checkpointed, but it always returns false.
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.