[ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646855#comment-16646855
 ] 

Haibo Chen commented on YARN-8775:
----------------------------------

I see.  That is indeed an issue.

Thinking about this a bit, a sub-optimal way to disable the periodical disk 
check would be to set DISK_HEALTH_CHECK_INTERVAL  to a sufficiently large 
number (say 1 day). We would still enable disk checker, but the periodical 
check is postponed beyond the test lifetime, hence never executed.

we can then make checkDirs() public and call it explicit from the unit tests.

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File 
> modifications
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8775
>                 URL: https://issues.apache.org/jira/browse/YARN-8775
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test, yarn
>    Affects Versions: 3.0.0
>            Reporter: Antal Bálint Steinbach
>            Assignee: Antal Bálint Steinbach
>            Priority: Major
>         Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test can fail sometimes when file operations were done during the check 
> done by the thread in _LocalDirsHandlerService._
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>       at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>       at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
>  to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] 
> nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - 
> Directory 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  error, Not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1,
>  removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] 
> localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not 
> initialize log dir 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:317)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:452)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> 2018-09-13 08:21:59,824 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:verifyDisksHealth(237)) - ExpectedDirs=
> 2018-09-13 08:21:59,825 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:verifyDisksHealth(238)) - 
> SeenDirs=/tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to