Andrew Or created SPARK-1538:
--------------------------------

Summary: SparkUI forgets about all persisted RDDs not associated with stages
Key: SPARK-1538
URL: https://issues.apache.org/jira/browse/SPARK-1538
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.1
Reporter: Andrew Or
Priority: Blocker
Fix For: 1.0.0
The following command creates two RDDs in a single stage:

    sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count

More specifically, parallelize creates one RDD and map creates another. If we persist only the first one, as above, it does not show up on the Storage tab of the SparkUI. This is because StageInfo keeps information only about the last RDD in the stage (here, the one created by map) and forgets about all of its parents. The proposal is to have StageInfo walk up the RDD dependency chain and keep a list of RDDInfos for every RDD associated with the stage.
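A minimal sketch of the proposed traversal, in Scala. To stay self-contained it does not use Spark at all: Rdd, Dep, NarrowDep and ShuffleDep are hypothetical stand-ins for Spark's RDD, NarrowDependency and ShuffleDependency, and stageRdds is a hypothetical helper, not StageInfo's actual code. It assumes that RDDs linked by narrow dependencies are computed in the same stage, while a shuffle dependency marks the boundary to a parent stage, so the walk stops there.

```scala
// Hypothetical stand-ins for Spark's RDD and dependency classes (not Spark's API).
sealed trait Dep { def rdd: Rdd }
// Narrow dependency: the parent RDD is computed in the same stage.
final case class NarrowDep(rdd: Rdd) extends Dep
// Shuffle dependency: the parent RDD belongs to an earlier stage.
final case class ShuffleDep(rdd: Rdd) extends Dep

final case class Rdd(id: Int, name: String, persisted: Boolean, deps: Seq[Dep])

// Hypothetical helper: collect the stage's last RDD plus every ancestor
// reachable through narrow dependencies, visiting each RDD only once.
def stageRdds(last: Rdd): List[Rdd] = {
  val visited = scala.collection.mutable.LinkedHashMap.empty[Int, Rdd]
  var toVisit = List(last)
  while (toVisit.nonEmpty) {
    val rdd = toVisit.head
    toVisit = toVisit.tail
    if (!visited.contains(rdd.id)) {
      visited(rdd.id) = rdd
      // Follow only narrow dependencies; shuffle parents live in another stage.
      toVisit = rdd.deps.collect { case NarrowDep(parent) => parent }.toList ++ toVisit
    }
  }
  visited.values.toList
}

// Model of sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count:
// the parallelized RDD is persisted; the mapped RDD is the stage's last RDD.
val parallelized = Rdd(0, "parallelize", persisted = true, deps = Nil)
val mapped = Rdd(1, "map", persisted = false, deps = Seq(NarrowDep(parallelized)))

val inStage = stageRdds(mapped)
println(inStage.map(_.name))                     // List(map, parallelize)
println(inStage.filter(_.persisted).map(_.name)) // List(parallelize)
```

Looking only at the last RDD would report just the map RDD, which is not persisted; walking its narrow parents also surfaces the persisted parallelize RDD. The visited map keeps a shared parent (for example, one RDD feeding both sides of a union) from being listed twice.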