[ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212095#comment-16212095
 ] 

Xiao Chen commented on YARN-7261:
---------------------------------

Thanks [~yufeigu] for creating the jira and providing a patch.

For context, Yufei and I have seen an intermittent issue where 
localization took very long. We suspect that copying from HDFS was slow, 
but the HDFS metrics and logs don't show any smoking gun. We'd like to use 
this jira to add more debugging information.

The log we collected currently looks like:
{noformat}
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e70_1505214525894_75227_01_000014
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/cached/pub/deviceDetailsQuery_1505472717000.xml, 1505472808731, FILE, null }
...
2017-09-15 10:58:38,760 DEBUG org.apache.hadoop.yarn.util.FSDownload: Changing permissions for path file:/var/hdfs/5/yarn/nm/filecache/7363_tmp/deviceDetailsQuery_1505472717000.xml to perm r-xr-xr-x
2017-09-15 10:58:38,775 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e70_1505214525894_75227_01_000014 transitioned from LOCALIZING to LOCALIZED
{noformat}
But there are no details on what happened during the roughly 3 minutes in 
between.
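
To illustrate the kind of information that would close this gap, here is a 
rough sketch of timing a copy and logging the elapsed time at debug level. 
It is not taken from the attached patch: the class {{TimedCopy}} and the 
method {{copyAndTime}} are hypothetical, a local {{Files.copy}} stands in for 
the HDFS download, and {{java.util.logging}} is used only to keep the example 
self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.logging.Logger;

// Hypothetical helper, not part of FSDownload or of the attached patch.
class TimedCopy {
  private static final Logger LOG = Logger.getLogger(TimedCopy.class.getName());

  // Copies src to dst and returns the elapsed wall-clock time in milliseconds.
  // A local file copy stands in for the remote download here.
  static long copyAndTime(Path src, Path dst) throws IOException {
    long start = System.nanoTime();
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
    // The supplier form builds the message only if FINE (debug) is enabled.
    LOG.fine(() -> "Copied " + src + " to " + dst + " in " + elapsedMs + " ms");
    return elapsedMs;
  }
}
```

With a message like this, a slow copy would show up directly in the log 
instead of having to be inferred from the timestamps of surrounding lines.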

The patch LGTM. One question:
Do you think it would be helpful to add a debug message to 
{{ResourceLocalizationService#addResource}} that indicates when either of the 
following conditions (1 or 2) is false?
{code}
      /*
       * Here multiple containers may request the same resource. So we need
       * to start downloading only when
       * 1) ResourceState == DOWNLOADING
       * 2) We are able to acquire non blocking semaphore lock.
       * If not we will skip this resource as either it is getting downloaded
       * or it FAILED / LOCALIZED.
       */
{code}
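
To make the question concrete, below is a minimal sketch of what such a 
message could look like. It assumes the check boils down to a state 
comparison plus a non-blocking {{Semaphore#tryAcquire}}, as the quoted comment 
describes. The class {{DownloadGate}}, its enum and its method names are 
hypothetical and are not the actual {{ResourceLocalizationService}} code.

```java
import java.util.concurrent.Semaphore;
import java.util.logging.Logger;

// Hypothetical stand-in for the check in addResource; all names are illustrative.
class DownloadGate {
  enum ResourceState { INIT, DOWNLOADING, LOCALIZED, FAILED }

  private static final Logger LOG = Logger.getLogger(DownloadGate.class.getName());

  // Returns null if the download may start (the permit is then held by the
  // caller), otherwise a message naming the condition that was false.
  static String skipReason(String resource, ResourceState state, Semaphore lock) {
    if (state != ResourceState.DOWNLOADING) {
      // Condition 1 is false: the resource is not in DOWNLOADING state.
      return "Skip downloading " + resource + ": state is " + state;
    }
    if (!lock.tryAcquire()) {
      // Condition 2 is false: another request already holds the lock.
      return "Skip downloading " + resource + ": lock is held by another request";
    }
    return null;
  }

  // Logs the reason at debug (FINE) level when the resource is skipped.
  static boolean tryStartDownload(String resource, ResourceState state, Semaphore lock) {
    String reason = skipReason(resource, state, lock);
    if (reason != null) {
      LOG.fine(reason);
      return false;
    }
    return true;
  }
}
```

The state is checked before {{tryAcquire}} so that a skipped resource never 
takes the permit, and the message says which of the two conditions failed.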

> Add debug message in class FSDownload for better download latency monitoring
> ----------------------------------------------------------------------------
>
>                 Key: YARN-7261
>                 URL: https://issues.apache.org/jira/browse/YARN-7261
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-7261.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
