[ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves resolved SPARK-19812.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0
                   2.2.0

> YARN shuffle service fails to relocate recovery DB across NFS directories
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19812
>                 URL: https://issues.apache.org/jira/browse/SPARK-19812
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.1
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 2.2.0, 2.3.0
>
>
> The YARN shuffle service tries to switch from the YARN local directories to
> the real recovery directory, but it can fail to move the existing recovery DBs.
> It fails because Files.move will not move a directory that has contents
> across file systems.
>
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move recovery file sparkShuffleRecovery.ldb to the path /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
>         at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
>         at org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
>         at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
>
> This used to use f.renameTo. We switched it in the PR due to review
> comments, and it looks like we didn't do a final real test. The tests use
> files rather than directories, so they didn't catch this. We need to fix
> the tests as well.
>
> History:
> https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440
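
For illustration only, here is a minimal sketch of one way to cope with the failure in the trace above: try Files.move first, and if it refuses the non-empty directory, fall back to a recursive copy followed by a delete of the source. This is not the change that was committed for this issue. The class name RecoveryDbMover and its method names are hypothetical, and the sketch assumes the target directory's parent already exists and that the target itself does not.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Spark or Hadoop.
public final class RecoveryDbMover {
    private RecoveryDbMover() {}

    /** Moves src to dst, falling back to copy-and-delete if a plain move is refused. */
    public static void moveDirectory(Path src, Path dst) throws IOException {
        try {
            // Succeeds when the move is a simple rename on the same file system.
            Files.move(src, dst);
        } catch (DirectoryNotEmptyException e) {
            // Thrown for a non-empty source directory when a rename is not possible.
            copyThenDelete(src, dst);
        }
    }

    /** Recursively copies src into dst, then deletes src bottom-up. */
    public static void copyThenDelete(Path src, Path dst) throws IOException {
        Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate each directory under dst before its entries are copied.
                Files.createDirectories(dst.resolve(src.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, dst.resolve(src.relativize(file)));
                return FileVisitResult.CONTINUE;
            }
        });

        // Only remove the source once the whole tree has been copied.
        Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

A test for this kind of code should use a directory with at least one file inside it as the source, since moving a single regular file does not exercise the failing path.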