[ https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yunyao Zhang updated YARN-9480:
-------------------------------
    Attachment:     (was: YARN-9480.001.patch)

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9480
>                 URL: https://issues.apache.org/jira/browse/YARN-9480
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: liyakun
>            Assignee: Yunyao Zhang
>            Priority: Major
>         Attachments: YARN-9480.001.patch
>
>
> At present, when startContainers() is called and the NM does not yet know
> the application, it enters the INIT_APPLICATION step. During application
> init, createAppDir() is executed, and it is a blocking operation.
> createAppDir() has to interact with an external file system, so it is
> affected by the SLA of that file system. Whenever the external file system
> has high latency, the NM dispatcher thread of ContainerManagerImpl gets
> stuck. (In fact, I have seen an NM stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the point
> where logs are actually uploaded (in other threads). Also, depending on the
> logRetentionPolicy, many containers may never reach this step, which would
> save a lot of interactions with the external file system.
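
The proposal in the description can be illustrated with a minimal, self-contained sketch. This is not the attached patch and does not use the real NodeManager classes: DeferredAppDirSketch, RemoteFs, initApp(), finishApp() and the shouldUpload flag are all hypothetical names made up for illustration. The idea shown is only that the dispatcher-side init does bookkeeping, while the directory is created lazily on an upload thread, and only for applications whose logs are actually uploaded.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeferredAppDirSketch {

    /** Hypothetical stand-in for the external file system; the call may be slow. */
    interface RemoteFs {
        void createAppDir(String appId) throws InterruptedException;
    }

    private final RemoteFs fs;
    // Separate thread for uploads, so remote latency never reaches the caller of initApp().
    private final ExecutorService uploadPool = Executors.newSingleThreadExecutor();
    private final Set<String> knownApps = ConcurrentHashMap.newKeySet();

    DeferredAppDirSketch(RemoteFs fs) {
        this.fs = fs;
    }

    /** Would run on the dispatcher thread: bookkeeping only, no remote I/O. */
    void initApp(String appId) {
        knownApps.add(appId);
    }

    /**
     * Would run when the application finishes. shouldUpload is an assumed
     * stand-in for the retention-policy decision; if it is false, the
     * external file system is never contacted for this application.
     */
    Future<Boolean> finishApp(String appId, boolean shouldUpload) {
        return uploadPool.submit(() -> {
            boolean known = knownApps.remove(appId);
            if (!known || !shouldUpload) {
                return false;
            }
            // Create the directory lazily, on the upload thread.
            fs.createAppDir(appId);
            // ... the actual log upload would follow here ...
            return true;
        });
    }

    void shutdown() {
        uploadPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger remoteCalls = new AtomicInteger();
        RemoteFs slowFs = appId -> {
            Thread.sleep(200); // simulate a high-latency file system
            remoteCalls.incrementAndGet();
        };
        DeferredAppDirSketch svc = new DeferredAppDirSketch(slowFs);

        long start = System.nanoTime();
        svc.initApp("app_1");
        svc.initApp("app_2");
        long initMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        Future<Boolean> uploaded = svc.finishApp("app_1", true);
        Future<Boolean> skipped = svc.finishApp("app_2", false);
        System.out.println("init took ms: " + initMillis);
        System.out.println("app_1 uploaded: " + uploaded.get());
        System.out.println("app_2 uploaded: " + skipped.get());
        System.out.println("remote calls: " + remoteCalls.get());
        svc.shutdown();
    }
}
```

In this sketch, initApp() returns without waiting on the simulated file system, and only app_1 triggers a createAppDir() call; app_2 is skipped entirely, which mirrors the saving the description expects from the retention policy.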