[ 
https://issues.apache.org/jira/browse/YARN-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204901#comment-14204901
 ] 

Steve Loughran commented on YARN-2839:
--------------------------------------

We don't see a stack trace; we see ERRORs in the logs:

{code}
2014-11-10 03:02:18,431 [Thread-2] INFO  nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:logDiskStatus(339)) - Disk(s) failed: 1/1 local-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-localDir-nm-0_0; 1/1 log-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-logDir-nm-0_0
2014-11-10 03:02:18,432 [Thread-2] ERROR nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:updateDirsAfterTest(332)) - Most of the disks failed. 1/1 local-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-localDir-nm-0_0; 1/1 log-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-logDir-nm-0_0
2014-11-10 03:02:18,433 [Thread-2] INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:validateConf(216)) - per directory file limit = 8192
{code}

> YARN minicluster doesn't bail out if all the NM disks are dead
> --------------------------------------------------------------
>
>                 Key: YARN-2839
>                 URL: https://issues.apache.org/jira/browse/YARN-2839
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>
> Some Jenkins tests of mine have been failing deep in the resource 
> localization process. If all the disks of the NMs are considered bad, the NMs 
> refuse to work, but the YARN minicluster doesn't fail itself.
> YARN-90 assumes that the NM disks will come back. This isn't likely to hold 
> in a short-lived minicluster; better to have it probe the NMs and fail if 
> they aren't healthy.
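
The probe-and-fail idea in the description could look roughly like the sketch below. It is plain Java with no Hadoop dependency: `MiniClusterHealthCheck`, `unhealthyNodes` and `failIfUnhealthy` are hypothetical names, and each probe is assumed to be a `BooleanSupplier` that reports whether one NM currently considers itself healthy. How a real minicluster would obtain that status is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; none of these names exist in Hadoop. */
public class MiniClusterHealthCheck {

    /** Returns the names of the nodes whose probe reports "not healthy". */
    public static List<String> unhealthyNodes(Map<String, BooleanSupplier> probes) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : probes.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }

    /** Throws if any node is unhealthy, so a test fails at startup instead of later. */
    public static void failIfUnhealthy(Map<String, BooleanSupplier> probes) {
        List<String> bad = unhealthyNodes(probes);
        if (!bad.isEmpty()) {
            throw new IllegalStateException("Unhealthy NodeManagers: " + bad);
        }
    }
}
```

A test harness might call `failIfUnhealthy` once right after the cluster reports that it has started, so a run with dead local-dirs or log-dirs stops there rather than deep inside resource localization.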



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
