[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengchenyu updated YARN-6728:
------------------------------
    Description: 
In our cluster, I found that many map tasks stay in the "NEW" state for several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_000011 to application application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
{code}
I then searched the log between 18:21:23.068 and 18:23:08.715 and found that some events dispatched by the AsyncDispatcher were handled slowly because their handlers access the defaultFs. As our cluster has grown to 4k nodes, the load on the defaultFs has increased. (Note: log aggregation is enabled.)

  was:
In our cluster, I found that many map tasks stay in the "NEW" state for several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_000011 to application application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
{code}
I then searched the log between 18:21:23.068 and 18:23:08.715 and found that some events dispatched by the AsyncDispatcher were handled slowly because their handlers access the defaultFs. As our cluster has grown to 4k nodes, the load on the defaultFs has increased. (Note: we )


> Job will run slow when the performance of defaultFs degrades and the
> log-aggregation is enable.
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6728
>                 URL: https://issues.apache.org/jira/browse/YARN-6728
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1 hadoop-2.7.1
>            Reporter: zhengchenyu
>             Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
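To make the suspected mechanism concrete: in the container log in the description, the NEW to LOCALIZING transition of container_1495632926847_2459604_01_000011 is logged about 105.6 s after the container is added (18:21:23.068 to 18:23:08.715), and both lines come from the same "AsyncDispatcher event handler" thread. The Java sketch below is a hypothetical, simplified model, not YARN's AsyncDispatcher or its log-aggregation code: one dispatcher thread drains a FIFO queue, a handler that blocks (standing in for a slow defaultFs call made for log aggregation) runs first, and a cheap container event queued behind it must wait for the whole blocking call. The class name, the event names and the simulated latency are all assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a single-threaded event dispatcher.
// It is NOT YARN's AsyncDispatcher; it only illustrates head-of-line blocking.
public class DispatcherBlockingDemo {

    // A queued event: a name, the work its handler does, and when it was enqueued.
    static final class Event {
        final String name;
        final Runnable handler;
        final long enqueuedNanos;

        Event(String name, Runnable handler) {
            this.name = name;
            this.handler = handler;
            this.enqueuedNanos = System.nanoTime();
        }
    }

    // Enqueues a slow event followed by a cheap container event, drains both on a
    // single dispatcher thread, and returns how long (ms) the container event
    // waited between being enqueued and its handler starting.
    public static long run(long slowFsCallMillis) throws InterruptedException {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        long[] containerWaitMillis = new long[1];

        // Assumed stand-in for a log-aggregation handler that blocks on defaultFs.
        queue.add(new Event("LOG_AGGREGATION_APP_INIT", () -> sleepQuietly(slowFsCallMillis)));
        // A cheap handler, standing in for the container's NEW -> LOCALIZING step.
        queue.add(new Event("CONTAINER_INIT", () -> { }));

        Thread dispatcher = new Thread(() -> {
            Event e;
            // Events are handled strictly one at a time, in FIFO order.
            while ((e = queue.poll()) != null) {
                if (e.name.equals("CONTAINER_INIT")) {
                    containerWaitMillis[0] =
                        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - e.enqueuedNanos);
                }
                e.handler.run();
            }
        }, "dispatcher");
        dispatcher.start();
        dispatcher.join(); // join() makes the dispatcher thread's write visible here
        return containerWaitMillis[0];
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis); // simulated blocking filesystem call
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long waited = run(500);
        System.out.println("CONTAINER_INIT waited ~" + waited + " ms behind the slow handler");
    }
}
```

With a 500 ms simulated call, main reports that CONTAINER_INIT waited roughly 500 ms even though its own handler does no work. In this model the delay grows with the latency of the blocking call, which is consistent with the reported symptom of containers lingering in NEW while the defaultFs is under load.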