[ 
https://issues.apache.org/jira/browse/SPARK-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1538:
-----------------------------

    Summary: SparkUI forgets about all persisted RDD's not directly associated 
with the Stage  (was: SparkUI forgets about all persisted RDD's not directly 
associated with stages)

> SparkUI forgets about all persisted RDD's not directly associated with the 
> Stage
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1538
>                 URL: https://issues.apache.org/jira/browse/SPARK-1538
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.1
>            Reporter: Andrew Or
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The following command creates two RDDs in one Stage:
> sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count
> More specifically, parallelize creates one RDD and map creates another. If we 
> persist only the first one, it does not show up on the StorageTab of the 
> SparkUI.
> This is because StageInfo keeps information only for the last RDD 
> associated with the stage and forgets about all of its parents. The proposal 
> here is to have StageInfo walk up the RDD dependency chain and keep a list of 
> all associated RDDInfos.
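
The proposed walk up the dependency chain might look roughly like the sketch below. It is a toy model only: ToyRdd and Lineage.withAncestors are hypothetical names invented for illustration, not Spark classes, and the sketch assumes every ancestor reachable from the last RDD belongs to the same stage (it ignores shuffle boundaries).

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD; this is NOT Spark's RDD class.
final case class ToyRdd(
    id: Int,
    name: String,
    persisted: Boolean,
    parents: Seq[ToyRdd] = Nil)

object Lineage {
  // Return the given RDD followed by all of its ancestors, each visited once.
  // Assumption: all ancestors are treated as part of the same stage.
  def withAncestors(last: ToyRdd): Seq[ToyRdd] = {
    val seen = mutable.LinkedHashMap.empty[Int, ToyRdd]
    def visit(rdd: ToyRdd): Unit =
      if (!seen.contains(rdd.id)) {
        seen(rdd.id) = rdd
        rdd.parents.foreach(visit)
      }
    visit(last)
    seen.values.toSeq
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Mirrors sc.parallelize(...).persist.map(_ + 1): only the first RDD is persisted.
    val parallelized = ToyRdd(0, "parallelize", persisted = true)
    val mapped = ToyRdd(1, "map", persisted = false, parents = Seq(parallelized))

    // Keeping only `mapped` would lose the persisted parent; the walk recovers it.
    val persistedNames = Lineage.withAncestors(mapped).filter(_.persisted).map(_.name)
    println(persistedNames.mkString(", "))
  }
}
```

In this model, looking only at the last RDD (map) finds nothing persisted, while the walk also reaches the persisted parallelize RDD, which is the entry missing from the StorageTab.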



--
This message was sent by Atlassian JIRA
(v6.2#6252)
