[ https://issues.apache.org/jira/browse/YARN-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336259#comment-15336259 ]

Junping Du commented on YARN-5214:
----------------------------------

Thanks [~leftnoteasy] for review and comments!
bq. even after R/W lock changes, when anything bad happens on disks, 
DirectoryCollection will be stuck under write locks, so NodeStatusUpdater will 
be blocked as well.
Not really. From the jstack above, you can see that the operation pending on 
busy IO, shown below, is now outside of any lock.
{noformat}
Map<String, DiskErrorInformation> dirsFailedCheck = testDirs(allLocalDirs,
         preCheckGoodDirs);
{noformat}
So NodeStatusUpdater won't get blocked while testDirs is pending on a mkdir 
operation.
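To make the intended structure concrete, here is a rough sketch of how checkDirs 
looks with the read/write lock in place (simplified, not the exact patch; the 
rwLock field and the updateDirsAfterTest helper are placeholders I made up to 
illustrate the split):
{noformat}
// Sketch only: simplified from DirectoryCollection with a read/write lock;
// rwLock and updateDirsAfterTest() are placeholder names.
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

boolean checkDirs() {
  List<String> allLocalDirs;
  List<String> preCheckGoodDirs;
  rwLock.readLock().lock();
  try {
    // Snapshot the current dir lists under the read lock.
    allLocalDirs = new ArrayList<>(localDirs);
    allLocalDirs.addAll(errorDirs);
    allLocalDirs.addAll(fullDirs);
    preCheckGoodDirs = new ArrayList<>(localDirs);
  } finally {
    rwLock.readLock().unlock();
  }

  // Slow disk IO (mkdir, permission probes) runs with no lock held, so
  // getGoodDirs()/getFailedDirs() callers such as NodeStatusUpdater are
  // never blocked behind it.
  Map<String, DiskErrorInformation> dirsFailedCheck =
      testDirs(allLocalDirs, preCheckGoodDirs);

  rwLock.writeLock().lock();
  try {
    // Apply the results; this section is short and purely in-memory.
    return updateDirsAfterTest(dirsFailedCheck);  // placeholder helper
  } finally {
    rwLock.writeLock().unlock();
  }
}
{noformat}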

bq. 1) In short term, errorDirs/fullDirs/localDirs are copy-on-write list, so 
we don't need to acquire lock getGoodDirs/getFailedDirs/getFailedDirs. This 
could lead to inconsistency data in rare cases, but I think in general this is 
safe and inconsistency data will be updated in next heartbeat.
In general, a read/write lock is more flexible and more consistent here, since 
we have several resources under race conditions. A copy-on-write list can only 
guarantee that no modification exception happens between a read and a write on 
the same list; it cannot provide consistent semantics across lists. Thus, I 
would prefer to use the read/write lock here, and CopyOnWriteArrayList can be 
replaced with a plain ArrayList. Does that sound reasonable?
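To illustrate why I think a single read/write lock gives the cross-list 
consistency that CopyOnWriteArrayList cannot, here is a toy example (not YARN 
code, just the pattern, with made-up names):
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy example: one ReadWriteLock covers all three lists, so a reader
// always sees good/full/error dirs from the same update.
class DirLists {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> goodDirs = new ArrayList<>();
  private final List<String> errorDirs = new ArrayList<>();
  private final List<String> fullDirs = new ArrayList<>();

  List<String> getGoodDirs() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(goodDirs);   // copy out under the read lock
    } finally {
      lock.readLock().unlock();
    }
  }

  void markFull(String dir) {
    lock.writeLock().lock();
    try {
      // Both lists change atomically with respect to readers. With two
      // independent CopyOnWriteArrayLists, a reader could see the dir in
      // neither or both lists between the two updates.
      goodDirs.remove(dir);
      fullDirs.add(dir);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{noformat}
With this, a reader either sees the directory as good or as full, never both 
and never neither.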

bq. 2) In longer term, we may need to consider a DirectoryCollection stuck 
under busy IO is unhealthy state, NodeStatusUpdater should be able to report 
such status to RM, so RM will avoid allocating any new containers to such nodes.
I agree we should provide better IO control on each node of a YARN cluster. We 
could report an unhealthy status when IO gets stuck, or even better, count IO 
load as a resource for smarter scheduling. However, how to better react to a 
very-busy-IO situation is a separate topic from the problem this JIRA tries to 
resolve. In any case, the NM heartbeat is not supposed to be cut off unless the 
daemon crashes.

> Pending on synchronized method DirectoryCollection#checkDirs can hang NM's 
> NodeStatusUpdater
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-5214
>                 URL: https://issues.apache.org/jira/browse/YARN-5214
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-5214.patch
>
>
> In one cluster, we noticed that the NM's heartbeat to the RM suddenly stopped 
> and, after a while, the node was marked LOST by the RM. From the log, the NM 
> daemon was still running, but jstack hints that the NM's NodeStatusUpdater 
> thread got blocked:
> 1. The Node Status Updater thread is blocked waiting on lock 0x000000008065eae8:
> {noformat}
> "Node Status Updater" #191 prio=5 os_prio=0 tid=0x00007f0354194000 nid=0x26fa 
> waiting for monitor entry [0x00007f035945a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
>         - waiting to lock <0x000000008065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:287)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:389)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:83)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:643)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. The actual holder of this lock is DiskHealthMonitor:
> {noformat}
> "DiskHealthMonitor-Timer" #132 daemon prio=5 os_prio=0 tid=0x00007f0397393000 
> nid=0x26bd runnable [0x00007f035e511000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1316)
>         at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:340)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:312)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:231)
>         - locked <0x000000008065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:389)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:50)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:122)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
> {noformat}
> This disk operation can take longer than expected, especially under high IO 
> throughput, and we should have fine-grained locking for the related 
> operations here. 
> The same issue was raised and fixed on HDFS in HDFS-7489, and we probably 
> should have a similar fix here.


