I think a missing parent may not be abnormal. From my understanding, when a Spark task cannot find its parent, it can use some metadata to find the result of its parent or recalculate its parent's value. Imagine a loop in which a Spark task tries to find some value from the previous iteration's result.
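
To illustrate why those log lines may be harmless, here is a rough toy model in plain Python. It is not Spark's DAGScheduler code; the function names (missing_parents, runnable_stages) and the dictionaries are hypothetical, and the dependencies are only what I read off your log (Stage 8 waits on Stage 9, which waits on Stage 11). The assumption is simply that a stage is submitted once none of its parents are missing.

```python
# Toy model only: NOT Spark's DAGScheduler. All names here are hypothetical.
from typing import Dict, List, Set


def missing_parents(stage: int, parents: Dict[int, List[int]],
                    finished: Set[int]) -> List[int]:
    """Return the direct parents of `stage` whose output is not available yet."""
    return [p for p in parents.get(stage, []) if p not in finished]


def runnable_stages(waiting: Set[int], parents: Dict[int, List[int]],
                    finished: Set[int]) -> List[int]:
    """Return the waiting stages that have no missing parents, in sorted order."""
    return sorted(s for s in waiting
                  if not missing_parents(s, parents, finished))


# Dependencies as read from the log (assumed, not taken from Spark itself).
parents = {8: [9], 9: [11], 11: []}
finished = {10}   # Stage 10 has completed
waiting = {8, 9}  # Stage 11 is still running

print(missing_parents(9, parents, finished))           # [11]
print(missing_parents(8, parents, finished))           # [9]
print(runnable_stages(waiting, parents, finished))     # []

# Once Stage 11 finishes, Stage 9 has no missing parents left.
print(runnable_stages(waiting, parents, finished | {11}))  # [9]
```

In this model "missing parents" just means "still waiting for an upstream stage", so the messages alone would not indicate a failure. If Stage 11 itself never finishes, that stage would be the place to look.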

2013/11/1 Walrus theCat <walrusthe...@gmail.com>

> Are there heuristics to check when the scheduler says it is "missing
> parents" and just hangs?
>
> On Thu, Oct 31, 2013 at 4:56 PM, Walrus theCat <walrusthe...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm not sure what's going on here. My code seems to be working thus far
>> (map at SparkLR:90 completed.) What can I do to help the scheduler out
>> here?
>>
>> Thanks
>>
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(10, 211)
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Stage 10 (map at SparkLR.scala:90) finished in 0.923 s
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: looking for newly runnable stages
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: running: Set(Stage 11)
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: waiting: Set(Stage 9, Stage 8)
>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: failed: Set()
>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 9: List(Stage 11)
>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List(Stage 9)

--
--

Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best