[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238407#comment-15238407
 ] 

Sangjin Lee commented on YARN-3816:
-----------------------------------

Thanks [~gtCarrera9] for the quick update!

As for the new metric type (i.e. base type + "_" + contributing child entity 
type), I do see the rationale (or need) for distinguishing aggregations that 
come from different entities. We should still note that these metrics will 
look somewhat awkward when applications are read via queries: an aggregated 
metric would show up as "MEMORY_YARN_CONTAINER", for example. I'm not sure 
whether this causes any additional issues.
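
To make the naming concrete, here is a minimal sketch of how such an id would 
be composed. The class and method names are hypothetical and are not taken 
from the patch; only the "base type + "_" + child entity type" rule comes from 
the discussion above.

```java
// Illustrative only: AggregatedMetricNames and aggregatedId are hypothetical
// names, not part of the YARN-3816 patch.
final class AggregatedMetricNames {
  private AggregatedMetricNames() {
  }

  // Joins the base metric id and the contributing child entity type with "_".
  static String aggregatedId(String baseMetricId, String childEntityType) {
    return baseMetricId + "_" + childEntityType;
  }

  public static void main(String[] args) {
    // Prints MEMORY_YARN_CONTAINER
    System.out.println(aggregatedId("MEMORY", "YARN_CONTAINER"));
  }
}
```

A reader querying the application would see the composed id rather than the 
plain "MEMORY", which is the awkwardness noted above.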

Also, I think we should be really judicious in permitting the aggregation. The 
most important case should be YARN container-to-app. For per-framework metrics, 
AMs should handle internal aggregations themselves and simply add the results 
to the application, as they usually have the app-level metrics already anyway. 
That should be the main way to support them.

(TimelineMetric.java)
- l.244: “accumulated” -> “aggregated”?

(AppLevelTimelineCollector.java)
- l.126: typo: “teal-time” -> “real-time”

(TimelineCollector.java)
- l.83, 87: since these methods expose internals of the {{TimelineCollector}} 
class, I would make them {{protected}} to ensure only subclasses can use them
- l.171: I would suggest one more optimization in terms of memory footprint. 
If the given entity does not have metrics, then we can/should skip the entire 
aggregation status step.
- l.230: It should be {{putIfAbsent()}}. Otherwise, {{put()}} would simply 
overwrite the value even if the value exists, and it will result in an 
incorrect object being used.
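
The last two points (l.171 and l.230) can be sketched as follows. This is not 
the code in the patch: the class, the {{Status}} type and the method names are 
all hypothetical, and the table is assumed to be a {{ConcurrentMap}} keyed by 
entity id. The sketch only illustrates skipping entities that have no metrics 
and why {{putIfAbsent()}} keeps the object that is already registered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: all names here are hypothetical stand-ins.
final class AggregationStatusSketch {
  // Stand-in for a per-entity aggregation status object.
  static final class Status {
    final AtomicLong sum = new AtomicLong();
  }

  private final ConcurrentMap<String, Status> table =
      new ConcurrentHashMap<>();

  // Returns the status already registered for the entity, or registers a
  // new one. putIfAbsent() never replaces an existing value.
  Status getOrCreate(String entityId) {
    Status fresh = new Status();
    Status existing = table.putIfAbsent(entityId, fresh);
    return existing != null ? existing : fresh;
  }

  // The problematic variant: put() replaces whatever was registered, so
  // callers holding the earlier Status now update an orphaned object.
  Status getOrCreateWithPut(String entityId) {
    Status fresh = new Status();
    table.put(entityId, fresh);
    return fresh;
  }

  // Returns false when the entity has no metrics; in that case nothing is
  // allocated or stored for it.
  boolean update(String entityId, Map<String, Long> metrics) {
    if (metrics == null || metrics.isEmpty()) {
      return false;
    }
    Status status = getOrCreate(entityId);
    for (long value : metrics.values()) {
      status.sum.addAndGet(value);
    }
    return true;
  }

  int trackedEntities() {
    return table.size();
  }
}
```

With {{getOrCreate()}}, two calls for the same entity id return the same 
object, so updates accumulate in one place. With the {{put()}} variant, the 
second call silently discards the first object.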

(ApplicationColumnPrefix.java)
- l.214: per comments on the JIRA, this new {{store()}} method should be 
removed, right?

I would encourage others to take a closer look at this too. Thanks!

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less rows need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
