[ https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415803#comment-17415803 ]
Mridul Muralidharan edited comment on SPARK-36772 at 9/15/21, 11:55 PM:
------------------------------------------------------------------------

+CC [~Gengliang.Wang], [~Ngone51]

was (Author: mridulm80):
+CC [~Gengliang.Wang]

> FinalizeShuffleMerge fails with an exception due to attempt id not matching
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36772
>                 URL: https://issues.apache.org/jira/browse/SPARK-36772
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.2.0
>            Reporter: Mridul Muralidharan
>            Priority: Blocker
>
> As part of its request to the external shuffle services (ESS) to finalize
> the merge, the driver also passes its
> [application attempt id|https://github.com/apache/spark/blob/3f09093a21306b0fbcb132d4c9f285e56ac6b43c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java#L180]
> so that the ESS can validate that the request is from the correct attempt.
> This attempt id is fetched from the TransportConf passed in when creating the
> [ExternalBlockStoreClient|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkEnv.scala#L352],
> and the transport conf uses a
> [cloned copy|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala#L47]
> of the SparkConf passed to it.
> The application attempt id is set as part of SparkContext
> [initialization|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L586].
> But this happens after the driver SparkEnv has
> [already been created|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L460].
> Hence the attempt id that ExternalBlockStoreClient uses will always be -1,
> which will not match the attempt id at the ESS (which is based on
> spark.app.attempt.id). As a result, merge finalization always fails
> ("java.lang.IllegalArgumentException: The attempt id -1 in this
> FinalizeShuffleMerge message does not match with the current attempt id 1
> stored in shuffle service for application ...")
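
The ordering problem described above can be illustrated with a small, self-contained Java sketch. This is not Spark code: `Conf`, `Client`, `validate`, and `attemptIdSeenByClient` are hypothetical stand-ins invented for illustration, and the only details taken from the report are the `spark.app.attempt.id` key, the -1 default, and the wording of the exception message. The sketch assumes the client reads the attempt id from a snapshot copy of the configuration, so a value set on the original after the copy is taken is never seen.

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptIdOrdering {
    // Hypothetical stand-in for a key/value configuration (not Spark's SparkConf).
    static final class Conf {
        private final Map<String, String> settings = new HashMap<>();

        Conf set(String key, String value) {
            settings.put(key, value);
            return this;
        }

        int getInt(String key, int defaultValue) {
            String value = settings.get(key);
            return value == null ? defaultValue : Integer.parseInt(value);
        }

        // Snapshot copy: later writes to the original are not visible in the copy.
        Conf copy() {
            Conf c = new Conf();
            c.settings.putAll(settings);
            return c;
        }
    }

    // Hypothetical stand-in for the client: it only ever sees its own snapshot.
    static final class Client {
        private final Conf conf;

        Client(Conf conf) {
            this.conf = conf;
        }

        int attemptId() {
            return conf.getInt("spark.app.attempt.id", -1);
        }
    }

    // Hypothetical server-side check, modeled on the message quoted in the report.
    static void validate(int messageAttemptId, int storedAttemptId) {
        if (messageAttemptId != storedAttemptId) {
            throw new IllegalArgumentException(
                "The attempt id " + messageAttemptId
                + " in this FinalizeShuffleMerge message does not match with the"
                + " current attempt id " + storedAttemptId
                + " stored in shuffle service");
        }
    }

    // Builds the client either after or before the attempt id is set on the
    // driver-side conf and returns the attempt id the client ends up using.
    static int attemptIdSeenByClient(boolean setBeforeCopy) {
        Conf driverConf = new Conf();
        if (setBeforeCopy) {
            driverConf.set("spark.app.attempt.id", "1");
        }
        Client client = new Client(driverConf.copy());
        if (!setBeforeCopy) {
            // Too late: the client already holds its own snapshot.
            driverConf.set("spark.app.attempt.id", "1");
        }
        return client.attemptId();
    }

    public static void main(String[] args) {
        int lateId = attemptIdSeenByClient(false);
        System.out.println("attempt id when set after the copy: " + lateId);
        try {
            validate(lateId, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("attempt id when set before the copy: "
            + attemptIdSeenByClient(true));
    }
}
```

In this model the ids only agree when the attempt id is written before the snapshot is taken. Whether the actual fix reorders initialization or passes the attempt id to the client some other way is not stated in this message.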