[jira] [Commented] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage

Mars (Jira) Thu, 17 Nov 2022 03:50:35 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-37313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635314#comment-17635314
 ]


Mars commented on SPARK-37313:
------------------------------

As suggested in this comment: 
[https://github.com/apache/spark/pull/34461#issuecomment-964557253]
I'm working on this issue and trying to implement this functionality. [~minyang] 
[~mridul] 

> Child stage using merged output or not should be based on the availability of 
> merged output from parent stage
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37313
>                 URL: https://issues.apache.org/jira/browse/SPARK-37313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Minchu Yang
>            Priority: Minor
>
> As discussed in the 
> [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494]
>  in SPARK-37023, during a stage retry, if the parent stage has already generated 
> merged output in a previous attempt, the child stage is currently 
> not able to fetch that merged output, because this is controlled by 
> dependency.shuffleMergeEnabled (see current implementation 
> [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136])
>  during the stage retry.
> Instead of using a single variable to control behavior on both the mapper side 
> (push side) and the reducer side (using merged output), whether the child stage 
> uses merged output must be based only on whether merged output is available 
> for it to use (as discussed 
> [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
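
To make the distinction in the description concrete, here is a minimal sketch of the two decision rules. It is not Spark's code: ParentShuffle, pushEnabledForThisAttempt, mergedPartitions and both helper methods are hypothetical names invented for illustration. The sketch assumes the push flag is off during the retry attempt and that availability can be checked per reduce partition; the issue does not specify either detail.

```scala
// Hypothetical, simplified model. None of these types exist in Spark.
final case class ParentShuffle(
    shuffleId: Int,
    // Mapper-side (push) decision for the current stage attempt.
    pushEnabledForThisAttempt: Boolean,
    // Reduce partitions for which merged output from an earlier attempt exists (assumed granularity).
    mergedPartitions: Set[Int])

object MergedOutputSketch {
  // Behavior described in the issue: one flag gates both push and fetch,
  // so existing merged output is ignored whenever the flag is off.
  def useMergedBySingleFlag(parent: ParentShuffle): Boolean =
    parent.pushEnabledForThisAttempt

  // Proposed direction: the reducer side only asks whether merged output is available.
  def useMergedByAvailability(parent: ParentShuffle, reduceId: Int): Boolean =
    parent.mergedPartitions.contains(reduceId)

  def main(args: Array[String]): Unit = {
    // A retried parent stage: push is off now, but partitions 0 and 1 were merged earlier.
    val retried = ParentShuffle(shuffleId = 0, pushEnabledForThisAttempt = false, mergedPartitions = Set(0, 1))
    println(s"single flag:  ${useMergedBySingleFlag(retried)}")
    println(s"availability: ${useMergedByAvailability(retried, reduceId = 1)}")
  }
}
```

In this toy model the single-flag rule skips merged output that already exists, while the availability rule picks it up. How availability would actually be tracked and exposed to the reducer is left open here.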



