[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791374#comment-14791374
 ] 

Junping Du commented on YARN-3816:
----------------------------------

Reply to Varun's comments:

bq. In TimelineMetric#accumulateTo, can latestMetric be TIME_SERIES ? If 
not(seems to be the case as per current code), is the else part of the 
condition if (latestMetric.getType().equals(Type.SINGLE_VALUE)) { required ? 
Wont be handling TIME_SERIES then ?
I am not sure if I understand your comments correctly. But it definitely 
support TIME_SERIES type for latestMetrics and handle two types separately.

bq. One of the aggregateMetrics method in TimelineCollector is not required if 
you do not plan to use it elsewhere. Agree with Naga that a set should suffice 
here.
YARN-3817 should use this. Will replace with SET as above comments.

bq. Default value can be mentioned explicity for 
yarn.timeline-service.aggregation.accumulation.enabled in config file.
The same comments as above.

bq. REST API code can be removed I guess if not required for POC.
We use this REST API in our previous PoC. Any significant problem with these 
REST API code? If not, we can leave it here and enhance it later.

bq. BTW, the aggregate flag appended into column qualifer will be used by 
offline aggregator?
Yes. I think it can be used by offline aggregator in YARN-3817. Li, would you 
confirm this?


Reply to Li's comments:
bq. TimelineAggregationBasis.java, Shall we differentiate realtime and offline 
aggregations? IIUC, APPLICATION type represents realtime aggregation while the 
other two represents offline aggregation.
I think we can also add another enum of AggregationType to have {online, 
offline} which is orthogonal to aggregation level here. Until this JIRA, we 
haven't invovle offline aggregation concept, may be add the concept of offline 
aggregation in YARN-3817 sounds better?

bq. setToAggregate(), why do we need a final here?
Theoritically, all parameters in method should be added with final tag to mark 
as immutable except we want to change it inside of method. This is also true 
for local variable that we don't plan to reassign it after first assignment. 
Although our convention doesn't force any rules on this, it should be better 
and let's leave it here.

bq. So in conclusion, if t1 is null, we set delta to 0 since it's the first 
value to be aggregated. Or else, we aggregate the delta?
That's true. I will update the comments which is slightly confusing people.

bq. I'm fine to keep the else part of the 
latestMetric.getType().equals(Type.SINGLE_VALUE) for now. Maybe we'd like to 
update this part when implementing TODO.
TODO is for checking metric_id, not type. We do support latestMetric as 
TIME_SERIES instead of SINGLE_VALUE, but we pick up the last (latest time) 
value in our logic. The TODO is saying we should check the metric id (CPU, 
MEMORY, etc.) should be compatible as we don't want to aggregate CPU metric 
into Memory metric. Today, this is guranteed by method caller's logic. Someday 
later, we should also check inside of method as other callers could make wrong 
on this. In addition, we cannot easily check metric_id should be equal as we 
need to handle cases that we could do accumulation on different (but 
compatible) metric IDs, like: CPU metrics accumulated on CPU_AREA metrics, etc. 

bq. l.183, I noticed we're copying the whole list to rebuild the valueList, 
then only used one element there? Are we sure the values list is small enough 
all the time?
Do you mean the if case for "op.equals(Operation.REP)"? That's good point. Will 
fix it in next patch.

bq. TimelineCollector.java, so we put intermediate aggregation status into the 
collector as some hash maps. Will there be challenges to rebuild these status? 
We may need to face the situations like nm restart (we're an aux service inside 
nm).
These hashmaps should be easily rebuild with accessing the raw entity table for 
all existing container metrics during NM restart.

bq. We need a method similar to appendAggregatedMetricsToEntities in YARN-3817. 
Actually I believe this is a very helpful method in normal aggregation 
workflow. It would be great if we can find a more visible place to put it.
Ok. I can make it as static method somewhere to share.

bq. I have one question about doAccumulation: in offline aggregation we're 
using a two-staged aggregation: we perform a flow run level aggregation in the 
mapper, and then aggregate entities in the same flow again in the reducer for 
entities from different mappers. In this case, we need to set the 
doAccumulation parameter into false for the aggregation in the reducer, right?
I think so. Your reducer seems to be just sum up flow run metrics (aggregated 
and accumulated) together so you should set doAccumulation to false.

Will update patch according to above comments soon.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to