[ https://issues.apache.org/jira/browse/YARN-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336259#comment-15336259 ]
Junping Du commented on YARN-5214:
----------------------------------

Thanks [~leftnoteasy] for the review and comments!

bq. even after R/W lock changes, when anything bad happens on disks, DirectoryCollection will be stuck under write locks, so NodeStatusUpdater will be blocked as well.

Not really. From the jstack above, you can see that the operation pending on busy IO, shown below, now happens outside of any lock:
{noformat}
Map<String, DiskErrorInformation> dirsFailedCheck = testDirs(allLocalDirs, preCheckGoodDirs);
{noformat}
So NodeStatusUpdater won't get blocked while testDirs is stuck on a mkdir operation.

bq. 1) In short term, errorDirs/fullDirs/localDirs are copy-on-write list, so we don't need to acquire lock getGoodDirs/getFailedDirs/getFailedDirs. This could lead to inconsistency data in rare cases, but I think in general this is safe and inconsistency data will be updated in next heartbeat.

In general, a read/write lock is more flexible and more consistent here, since we have several resources under race conditions. A copy-on-write list only guarantees that no concurrent-modification exception occurs between a read and a write on the same list; it cannot provide consistent semantics across multiple lists. I would therefore prefer to keep the read/write lock, in which case CopyOnWriteArrayList can be replaced with a plain ArrayList. Does that sound reasonable?

bq. 2) In longer term, we may need to consider a DirectoryCollection stuck under busy IO is unhealthy state, NodeStatusUpdater should be able to report such status to RM, so RM will avoid allocating any new containers to such nodes.

I agree we should provide better IO control on each node of a YARN cluster. We can report an unhealthy status when IO gets stuck, or, even better, count IO load as a resource for smarter scheduling. However, how best to react to very busy IO is a different topic from the problem this JIRA tries to resolve. In any case, the NM heartbeat is not supposed to be cut off unless the daemon crashes.
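The locking discipline argued for above can be sketched as follows. This is a hypothetical illustration, not the actual YARN-5214 patch: the class and method names (DirListHolder, probe) are invented, and the real DirectoryCollection does more work. The point is that the slow disk probe runs outside any lock, while the write lock is held only long enough to swap both lists in together, which is the cross-list consistency a CopyOnWriteArrayList cannot provide.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the read/write-lock approach discussed above.
class DirListHolder {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private List<String> goodDirs = new ArrayList<>();
  private List<String> failedDirs = new ArrayList<>();

  // Called periodically by a monitor thread (like DiskHealthMonitor-Timer).
  public void checkDirs(List<String> allDirs) {
    // Slow part: probe each dir with mkdir-style IO checks, NO lock held,
    // so readers are never blocked behind busy disks.
    List<String> good = new ArrayList<>();
    List<String> failed = new ArrayList<>();
    for (String dir : allDirs) {
      if (probe(dir)) {
        good.add(dir);
      } else {
        failed.add(dir);
      }
    }
    // Fast part: swap in the results under the write lock so readers
    // always see the two lists change together.
    lock.writeLock().lock();
    try {
      goodDirs = good;
      failedDirs = failed;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Called from the heartbeat path; only briefly takes the read lock.
  public List<String> getFailedDirs() {
    lock.readLock().lock();
    try {
      return Collections.unmodifiableList(new ArrayList<>(failedDirs));
    } finally {
      lock.readLock().unlock();
    }
  }

  // Stand-in for DiskChecker.checkDir/verifyDirUsingMkdir.
  private boolean probe(String dir) {
    File f = new File(dir);
    return f.isDirectory() || f.mkdirs();
  }
}
```

With this shape, a heartbeat thread calling getFailedDirs() can be delayed at most by the brief list swap, never by the IO itself.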
> Pending on synchronized method DirectoryCollection#checkDirs can hang NM's
> NodeStatusUpdater
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-5214
>                 URL: https://issues.apache.org/jira/browse/YARN-5214
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-5214.patch
>
>
> In one cluster, we noticed that the NM's heartbeat to the RM suddenly stopped; after a while the node was marked LOST by the RM. From the log, the NM daemon was still running, but jstack showed the NM's NodeStatusUpdater thread was blocked:
> 1. The Node Status Updater thread is blocked on 0x000000008065eae8:
> {noformat}
> "Node Status Updater" #191 prio=5 os_prio=0 tid=0x00007f0354194000 nid=0x26fa waiting for monitor entry [0x00007f035945a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
> 	- waiting to lock <0x000000008065eae8> (a org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> 	at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:287)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:389)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:83)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:643)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. The actual holder of this lock is DiskHealthMonitor:
> {noformat}
> "DiskHealthMonitor-Timer" #132 daemon prio=5 os_prio=0 tid=0x00007f0397393000 nid=0x26bd runnable [0x00007f035e511000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.UnixFileSystem.createDirectory(Native Method)
> 	at java.io.File.mkdir(File.java:1316)
> 	at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
> 	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
> 	at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:340)
> 	at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:312)
> 	at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:231)
> 	- locked <0x000000008065eae8> (a org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> 	at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:389)
> 	at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:50)
> 	at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:122)
> 	at java.util.TimerThread.mainLoop(Timer.java:555)
> 	at java.util.TimerThread.run(Timer.java:505)
> {noformat}
> This disk operation can take much longer than expected, especially under high IO throughput, so we should have fine-grained locking for the related operations here.
> The same issue was raised and fixed on HDFS in HDFS-7489, and we should probably have a similar fix here.
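The failure mode shown in the two stack traces can be reproduced with a minimal sketch. This is an illustration, not DirectoryCollection itself; the class name and sleep are stand-ins (the sleep plays the role of verifyDirUsingMkdir stalling on a busy disk). Because both methods synchronize on the same monitor, a reader such as the heartbeat thread is blocked for the entire duration of the slow IO.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the coarse-grained locking that caused the hang.
class CoarseLockedCollection {
  private final List<String> failedDirs = new ArrayList<>();

  // Called by the DiskHealthMonitor-Timer thread.
  public synchronized void checkDirs() throws InterruptedException {
    // Stand-in for File.mkdir() stalling on busy disks: the object
    // monitor is held for the entire slow IO operation.
    Thread.sleep(2000);
  }

  // Called by the heartbeat path; blocks here ("waiting to lock <0x...>")
  // until the monitor thread leaves checkDirs().
  public synchronized List<String> getFailedDirs() {
    return new ArrayList<>(failedDirs);
  }
}
```

If the disk stays busy longer than the RM's liveness timeout, the heartbeat never goes out and the node is marked LOST even though the daemon is alive, which matches the behavior described in this issue.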