[jira] [Comment Edited] (YARN-1529) Add Localization overhead metrics to NM

2016-08-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427222#comment-15427222
 ] 

Chris Trezzo edited comment on YARN-1529 at 8/18/16 9:52 PM:
-

Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie 
these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers. This API is the serialized data we put into the 
environment variable (LOCALIZATION_COUNTERS) to communicate the localization 
statistics to the application-level process. Currently, if a YARN developer 
wants to leverage these metrics, they have to figure out how information is 
serialized into this env var and hope it doesn't change. What do you think 
about adding a small class/method that defines this a little more formally and 
contains the deserialization logic? That way if another application, let's say 
TEZ, wants to leverage this data, they can just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added 
class. Thanks!
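
To make the proposal concrete, here is a minimal sketch of what such a helper could look like. It is not taken from any posted patch. The class name, its fields, and above all the wire format (five comma-separated decimal longs in a fixed order) are assumptions made for illustration; only the environment variable name LOCALIZATION_COUNTERS comes from the comment above.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper: parses the value of the LOCALIZATION_COUNTERS
 * environment variable. ASSUMPTION: the value is five comma-separated
 * decimal longs in the order bytesMissed, bytesCached, filesMissed,
 * filesCached, downloadNanos. The real serialization may differ.
 */
final class LocalizationCounters {
  static final String ENV_NAME = "LOCALIZATION_COUNTERS";
  private static final int FIELD_COUNT = 5;

  final long bytesMissed;
  final long bytesCached;
  final long filesMissed;
  final long filesCached;
  final long downloadNanos;

  private LocalizationCounters(long[] v) {
    this.bytesMissed = v[0];
    this.bytesCached = v[1];
    this.filesMissed = v[2];
    this.filesCached = v[3];
    this.downloadNanos = v[4];
  }

  /** Parses the assumed format; throws IllegalArgumentException on bad input. */
  static LocalizationCounters deserialize(String serialized) {
    if (serialized == null) {
      throw new IllegalArgumentException("serialized counters must not be null");
    }
    // Limit -1 keeps trailing empty fields so "1,2,3,4,5," is rejected.
    String[] parts = serialized.trim().split(",", -1);
    if (parts.length != FIELD_COUNT) {
      throw new IllegalArgumentException(
          "expected " + FIELD_COUNT + " fields but got " + parts.length);
    }
    long[] values = new long[FIELD_COUNT];
    for (int i = 0; i < FIELD_COUNT; i++) {
      // NumberFormatException is a subclass of IllegalArgumentException.
      values[i] = Long.parseLong(parts[i].trim());
    }
    return new LocalizationCounters(values);
  }

  /** Returns the counters if the variable is present in the given environment. */
  static Optional<LocalizationCounters> fromEnvironment(Map<String, String> env) {
    return Optional.ofNullable(env.get(ENV_NAME))
        .map(LocalizationCounters::deserialize);
  }
}
```

An application such as TEZ could then call LocalizationCounters.fromEnvironment(System.getenv()) inside the container instead of parsing the variable by hand, and any format change would stay contained in this one class.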


was (Author: ctrezzo):
Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie 
these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers. This API is the serialized data we put into the 
environment variable (LOCALIZATION_COUNTERS) to communicate the localization 
statistics to the application-level container. Currently, if a YARN developer 
wants to leverage these metrics, they have to figure out how information is 
serialized into this env var and hope it doesn't change. What do you think 
about adding a small class/method that defines this a little more formally and 
contains the deserialization logic? That way if another application, let's say 
TEZ, wants to leverage this data, they can just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added 
class. Thanks!

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of localization caches, it is necessary to expose 
> the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS (cache 
> misses).
> LocalizedFilesCached: total localization requests that were served from local 
> caches (cache hits).
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition
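
The ratio formula in the description can be sketched as follows. The class and method names are hypothetical, and returning 0 when there have been no localization requests at all is an assumption, since the description does not say how that case should be reported.

```java
/** Hypothetical illustration of the Localized(Files|Bytes)CachedRatio formula. */
final class CachedRatio {
  private CachedRatio() {
  }

  /**
   * Percentage of localized files (or bytes) served out of cache:
   * ratio = 100 * caches / (caches + misses).
   * ASSUMPTION: returns 0 when both counters are zero to avoid dividing by zero.
   */
  static double cachedRatio(long cached, long missed) {
    long total = cached + missed;
    if (total == 0) {
      return 0.0;
    }
    // Multiply by 100.0 first so the division is done in floating point.
    return 100.0 * cached / total;
  }
}
```

For example, three cache hits and one miss give cachedRatio(3, 1), which is 75 percent. The same method applies to the byte counters.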



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-1529) Add Localization overhead metrics to NM

2016-07-22 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389841#comment-15389841
 ] 

Chris Trezzo edited comment on YARN-1529 at 7/22/16 5:09 PM:
-

[~mingma] that makes total sense. [~sjlee0] Is there anything that would 
prevent an application-level process running in a container from querying ATS 
for YARN-level metrics about the container itself while the container is 
running?

As a side note, one interesting thing about these particular metrics is that, 
as they stand now, they do not change once the container is up and running 
(i.e. all localization for the container is done).


was (Author: ctrezzo):
[~mingma] that makes total sense. [~sjlee0] Is there anything that would 
prevent an application-level process running in a container from querying ATS 
for framework-level metrics about the container itself while the container is 
running?

As a side note, one interesting thing about these particular metrics is that, 
as they stand now, they do not change once the container is up and running 
(i.e. all localization for the container is done).

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch
>
>


