Re: ResultStage's parent stages only ShuffleMapStages?

Jeff Zhang Fri, 06 Nov 2015 00:37:23 -0800

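Right, there are only two kinds of stage: ResultStage and ShuffleMapStage.
A ShuffleMapStage writes its output as shuffle data for downstream stages to
consume, while a ResultStage doesn't need to do that. So yes: a stage boundary
only appears at a shuffle dependency, which means every parent of a ResultStage
is a ShuffleMapStage.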
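To make the "parents are only ShuffleMapStages" point concrete, here is a toy Scala model of how stages fall out of an RDD lineage. The types `Rdd`, `Dep`, `Narrow`, `Shuffle`, `StageBuilder` and the local `ShuffleMapStage`/`ResultStage` case classes are hypothetical stand-ins, not Spark's classes. The model also skips things the real DAGScheduler does, such as reusing one ShuffleMapStage per shuffle dependency. Narrow dependencies are pipelined into the current stage, and each shuffle dependency creates a parent ShuffleMapStage, so that is the only kind of parent a ResultStage can end up with.

```scala
// Hypothetical, simplified model (NOT Spark's classes): stages are cut
// only at shuffle dependencies.
sealed trait Dep { def parent: Rdd }
final case class Narrow(parent: Rdd) extends Dep  // e.g. map, filter
final case class Shuffle(parent: Rdd) extends Dep // e.g. reduceByKey

final case class Rdd(name: String, deps: List[Dep] = Nil)

sealed trait Stage { def rdd: Rdd; def parents: List[Stage] }
// Hypothetical stand-in: `rdd` is the map-side RDD whose output gets shuffled.
final case class ShuffleMapStage(rdd: Rdd, parents: List[Stage]) extends Stage
// Hypothetical stand-in: `rdd` is the final RDD the action runs on.
final case class ResultStage(rdd: Rdd, parents: List[Stage]) extends Stage

object StageBuilder {
  // Walk the lineage: narrow deps stay in the current stage (pipelined),
  // every shuffle dep becomes a parent ShuffleMapStage.
  def parentStages(rdd: Rdd): List[Stage] =
    rdd.deps.flatMap {
      case Narrow(p)  => parentStages(p)
      case Shuffle(p) => List[Stage](ShuffleMapStage(p, parentStages(p)))
    }

  // A job on `finalRdd` gets one ResultStage whose parents come only from
  // the shuffle case above, i.e. they are all ShuffleMapStages.
  def resultStage(finalRdd: Rdd): ResultStage =
    ResultStage(finalRdd, parentStages(finalRdd))
}
```

Feeding it an assumed lineage such as `textFile -> map -> reduceByKey -> map -> groupByKey` yields a ResultStage with one ShuffleMapStage parent, which in turn has one ShuffleMapStage parent of its own. A lineage with only narrow dependencies yields a ResultStage with no parents at all.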


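I guess you may be confusing these concepts with Map/Reduce. Actually, a
ShuffleMapStage can play the role of either Map or Reduce, as long as it
produces intermediate data for downstream consumption. For example, a stage
sitting between two shuffles reads the first shuffle's output (the Reduce side)
and writes the second shuffle's input (the Map side).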
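The point that a ShuffleMapStage may act as either side of Map/Reduce can be sketched with an equally hypothetical summary type, `StageIO`, which records only whether a stage reads and/or writes shuffle data (it is not a Spark API). For an assumed job such as `textFile -> map -> reduceByKey -> map (swap key/value) -> groupByKey -> count`, the middle stage reads the first shuffle's output and writes the second shuffle's input, yet it is still a ShuffleMapStage because it writes shuffle output.

```scala
// Hypothetical summary of a stage's shuffle I/O (NOT a Spark API).
final case class StageIO(id: Int, readsShuffle: Boolean, writesShuffle: Boolean) {
  // Only whether the stage writes shuffle output decides its kind.
  def kind: String = if (writesShuffle) "ShuffleMapStage" else "ResultStage"

  // Map/Reduce roles the stage plays: writing shuffle output is the "map"
  // side, reading shuffle output is the "reduce" side.
  def roles: Set[String] =
    (if (writesShuffle) Set("map") else Set.empty[String]) ++
      (if (readsShuffle) Set("reduce") else Set.empty[String])
}

object TwoShuffleJob {
  // Assumed lineage: textFile -> map -> reduceByKey -> map (swap) -> groupByKey -> count
  // (the swapping map drops the partitioner, so groupByKey really shuffles).
  val stages: List[StageIO] = List(
    StageIO(0, readsShuffle = false, writesShuffle = true),  // textFile, map
    StageIO(1, readsShuffle = true,  writesShuffle = true),  // reduceByKey, map (swap)
    StageIO(2, readsShuffle = true,  writesShuffle = false)  // groupByKey, count
  )
}
```

Stage 1 comes out as a ShuffleMapStage with both the "map" and "reduce" roles, which is exactly the case where thinking in plain Map/Reduce terms gets confusing.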




On Fri, Nov 6, 2015 at 4:15 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Just to make sure that what I see in the code and think I understand
> is indeed correct...
>
> When a job is submitted to DAGScheduler, it creates a new ResultStage
> that in turn queries for the parent stages of itself given the RDD
> (using `getParentStagesAndId` in `newResultStage`).
>
> Are a ResultStage's parent stages only ShuffleMapStages?
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>


-- 
Best Regards

Jeff Zhang
