[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427301#comment-15427301
 ] 

Jason Lowe commented on YARN-1529:
----------------------------------

bq. One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers.

That's a great point, and actually I'd be perfectly happy if this JIRA simply 
added the NM-level metric source and skipped the container API part for now.  
If we're moving towards doing this via the ATS anyway, we may not want/need the 
env variable API.  It might be worth splitting the patch so the less 
controversial NM-level metrics can go in earlier and we can discuss the 
per-container metrics API in another JIRA.  If the consensus is that this patch 
should include the per-container metrics API via the container env as well, then 
I'm OK with that too.  I also agree that hiding the implementation details of 
that API would be important, whether that's in this JIRA or another.

Either way, the patch needs an update, so please feel free to update it.

> Add Localization overhead metrics to NM
> ---------------------------------------
>
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Chris Trezzo
>         Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of localization caches, it is necessary to expose 
> the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * cached / (cached + missed)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition
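
For illustration only, the counters and the ratio formula in the description 
could be sketched roughly as below. This is a standalone example, not the code 
in the attached patches: the class name LocalizationStats and its methods are 
hypothetical, it does not use the Hadoop metrics2 API, and returning 0 when 
nothing has been localized yet is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, standalone sketch of the proposed localization counters.
 * NOT the YARN-1529 patch; names and behavior here are assumptions.
 */
final class LocalizationStats {
  private final AtomicLong filesMissed = new AtomicLong();
  private final AtomicLong filesCached = new AtomicLong();
  private final AtomicLong bytesMissed = new AtomicLong();
  private final AtomicLong bytesCached = new AtomicLong();
  private final AtomicLong downloadNanos = new AtomicLong();

  /** A resource was missing from the local cache and was downloaded. */
  void recordMiss(long bytes) {
    filesMissed.incrementAndGet();
    bytesMissed.addAndGet(bytes);
  }

  /** A localization request was served from the local cache. */
  void recordHit(long bytes) {
    filesCached.incrementAndGet();
    bytesCached.addAndGet(bytes);
  }

  /** Adds the elapsed localization time measured for one container. */
  void addLocalizationNanos(long elapsedNanos) {
    downloadNanos.addAndGet(elapsedNanos);
  }

  long filesCachedRatio() {
    return ratio(filesCached.get(), filesMissed.get());
  }

  long bytesCachedRatio() {
    return ratio(bytesCached.get(), bytesMissed.get());
  }

  long totalDownloadNanos() {
    return downloadNanos.get();
  }

  /**
   * ratio = 100 * cached / (cached + missed), using integer division.
   * Returns 0 when there have been no requests (an assumption).
   */
  static long ratio(long cached, long missed) {
    long total = cached + missed;
    return total == 0 ? 0 : (100 * cached) / total;
  }
}
```

With three cache hits of 100 bytes each and one miss of 300 bytes, this sketch 
reports a file ratio of 75 and a byte ratio of 50, which shows why the two 
ratios are worth tracking separately.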



