[ https://issues.apache.org/jira/browse/SPARK-37313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635314#comment-17635314 ]

Mars commented on SPARK-37313:
------------------------------

As discussed in this comment [https://github.com/apache/spark/pull/34461#issuecomment-964557253], I'm working on this issue and trying to implement this functionality.

[~minyang] [~mridul]

> Child stage using merged output or not should be based on the availability of merged output from parent stage
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37313
>                 URL: https://issues.apache.org/jira/browse/SPARK-37313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Minchu Yang
>            Priority: Minor
>
> As discussed in the [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494] in SPARK-37023, during a stage retry, if the parent stage has already generated merged output in the previous attempt, then with the current behavior the child stage would not be able to fetch that merged output, because this is controlled by dependency.shuffleMergeEnabled (see the current implementation [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136]) during the stage retry.
>
> Instead of using a single variable to control behavior on both the mapper side (push side) and the reducer side (using merged output), whether the child stage uses merged output must be based only on whether merged output is available for it to use (as discussed [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
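
The decoupling proposed in the description can be illustrated with a small toy model. This is only a sketch, not Spark code: ShuffleDependencyModel, merged_output_available and both helper functions are hypothetical names made up for illustration, and the sketch assumes (as the description implies) that the single flag is off during the stage retry.

```python
from dataclasses import dataclass


@dataclass
class ShuffleDependencyModel:
    """Toy stand-in for a shuffle dependency; NOT Spark's ShuffleDependency."""
    # Models dependency.shuffleMergeEnabled: should mappers push blocks in this attempt?
    shuffle_merge_enabled: bool
    # Hypothetical field: did an earlier attempt already produce merged output?
    merged_output_available: bool


def use_merged_output_current(dep: ShuffleDependencyModel) -> bool:
    """Behavior described in the issue: one flag drives both push and fetch."""
    return dep.shuffle_merge_enabled


def use_merged_output_proposed(dep: ShuffleDependencyModel) -> bool:
    """Proposed behavior: the reducer side looks only at availability."""
    return dep.merged_output_available


# Stage retry: the previous attempt produced merged output,
# but the push-side flag is assumed to be off for the retry.
retry = ShuffleDependencyModel(shuffle_merge_enabled=False,
                               merged_output_available=True)

print(use_merged_output_current(retry))   # False: merged output is ignored
print(use_merged_output_proposed(retry))  # True: merged output is used
```

In this model the mapper side would keep consulting shuffle_merge_enabled to decide whether to push, while the reducer side no longer depends on it.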