[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212095#comment-16212095 ]
Xiao Chen commented on YARN-7261:
---------------------------------

Thanks [~yufeigu] for creating the jira and providing a patch.

For context, Yufei and I have seen an intermittent issue where localization took very long. We suspect that copying from HDFS was slow, but the HDFS metrics and logs don't show any smoking gun. We'd like to use this jira to add more debugging information.

The log we collected currently looks like:
{noformat}
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e70_1505214525894_75227_01_000014
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/cached/pub/deviceDetailsQuery_1505472717000.xml, 1505472808731, FILE, null }
...
2017-09-15 10:58:38,760 DEBUG org.apache.hadoop.yarn.util.FSDownload: Changing permissions for path file:/var/hdfs/5/yarn/nm/filecache/7363_tmp/deviceDetailsQuery_1505472717000.xml to perm r-xr-xr-x
2017-09-15 10:58:38,775 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e70_1505214525894_75227_01_000014 transitioned from LOCALIZING to LOCALIZED
{noformat}
But there are no details on what happened in the nearly 3 minutes between the start of the download (10:55:50) and the permission change (10:58:38).

The patch LGTM. One question: do you think it would be helpful to add a debug message to {{ResourceLocalizationService#addResource}} to indicate when conditions 1 and 2 below are not met?
{code}
/*
 * Here multiple containers may request the same resource. So we need
 * to start downloading only when
 * 1) ResourceState == DOWNLOADING
 * 2) We are able to acquire non blocking semaphore lock.
 * If not we will skip this resource as either it is getting downloaded
 * or it FAILED / LOCALIZED.
 */
{code}

> Add debug message in class FSDownload for better download latency monitoring
> ----------------------------------------------------------------------------
>
>                 Key: YARN-7261
>                 URL: https://issues.apache.org/jira/browse/YARN-7261
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-7261.001.patch