Jeff Field created SPARK-15371:
----------------------------------

             Summary: YARNShuffleService doesn't get current local-dirs from NodeManager
                 Key: SPARK-15371
                 URL: https://issues.apache.org/jira/browse/SPARK-15371
             Project: Spark
          Issue Type: Bug
          Components: Shuffle, YARN
    Affects Versions: 1.6.1, 1.6.0, 1.5.2, 1.5.1, 1.5.0, 1.6.2, 2.0.0
            Reporter: Jeff Field
            Priority: Minor


In YarnShuffleService.java, YarnShuffleService reads the list of local 
directories from the YARN configuration. If it does not find an existing 
LevelDB file (used for recovery) in any of them, it creates one in the first 
directory in the list. Because it uses the configured list rather than asking 
YARN for the current list of healthy local-dirs, it keeps trying to use the 
first directory even when the NodeManager already knows that location is bad.

Removing the bad directory from the config works around this, but Spark should 
get the current list from YARN instead of relying on the list in the config. 
There are examples of this in 
https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
 but I'm not sure of the right way for Spark to implement that.
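
To make the behavior above concrete, here is a minimal, self-contained sketch of 
the lookup as described in this report, next to one possible local mitigation. 
It is not the actual YarnShuffleService code and does not call any YARN API: the 
class name RecoveryFileLocator, both method names, and the fileName parameter are 
hypothetical, and the "usable" check (directory exists and is writable) is only 
an assumed stand-in for the NodeManager's own health check.

```java
import java.io.File;
import java.util.List;

/** Hypothetical sketch; names and checks are assumptions, not Spark's code. */
public class RecoveryFileLocator {

  /**
   * Lookup as described in the report: reuse an existing recovery file if any
   * configured dir has one, otherwise fall back to the first configured dir.
   */
  public static File findAsDescribed(List<String> localDirs, String fileName) {
    for (String dir : localDirs) {
      File candidate = new File(dir, fileName);
      if (candidate.exists()) {
        return candidate;
      }
    }
    // No existing file: the first configured dir is chosen, healthy or not.
    return new File(localDirs.get(0), fileName);
  }

  /**
   * Assumed mitigation: same lookup, but when creating a new file, pick the
   * first dir that exists and is writable instead of blindly taking index 0.
   */
  public static File findSkippingUnusable(List<String> localDirs, String fileName) {
    for (String dir : localDirs) {
      File candidate = new File(dir, fileName);
      if (candidate.exists()) {
        return candidate;
      }
    }
    for (String dir : localDirs) {
      File d = new File(dir);
      // Local stand-in for "healthy"; the NodeManager's check may differ.
      if (d.isDirectory() && d.canWrite()) {
        return new File(d, fileName);
      }
    }
    throw new IllegalStateException("No usable local dir among " + localDirs);
  }
}
```

With a list whose first entry is a missing directory, findAsDescribed still 
returns a path under that entry, while findSkippingUnusable moves on to the next 
usable one. A local check like this only approximates what the report asks for, 
which is to use the NodeManager's current list of healthy local-dirs.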



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
