[ 
https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203115#comment-16203115
 ] 

Linlin Zhou edited comment on YARN-7109 at 10/13/17 6:42 AM:
-------------------------------------------------------------

Totally 7 aggregation functions are added in the patch, they are:
Min, Avg, Count, Median, TopKMin, TopKMax, MaxFreq.
Some comments and questions on the patch:
1. What's the expected functionality of the third parameter 'state'
in below method of TimelineMetricOperation?
abstract TimelineMetric exec(TimelineMetric incoming, TimelineMetric base,
   Map<Object, Object> state);

All the newly added functions except for Min are 'stateful',
e.g. Avg needs to record both the current average value
and the number of metrics which produced the value,
Median needs to store all the metrics fed to it.
But I don’t prefer to put the extra data structures into ‘state’ object.
In the patch, I only use 'state' to hold the k value of TopKMin and TopKMax and 
the capacity param of Median and MaxFreq.

2. Introduction of TimelineMetricAggregator
This is an abstract base class which encapsulatesthe data and operations needed 
by an aggregated function. It offers two methods: update(TimelineMetric 
incoming) and List<Number> getAggregatedValues().
Each aggregation function has a corresponding subclass, e.g. 
TimelineMetricAggregatorMedian maps to the Median function. 
When we need to add a new aggregation function, we add a new subclass of 
TimelineMetricAggregator. 

3. Does aggregation only apply to SINGEL_VALUE type? Do we need to support 
TIME_SERIES?
Implementation of ‘Max’ and ‘Sum’ function treat the incoming metric as 
SINGLE_VALUE.
In the modified version, I do not make such assumption. Each function will 
aggregate all the value in values(TreeMap), that is I take TIME_SERIES as 
multiple SINGLE_VALUE metrics and aggregate them one by one.

4. MaxFreq and Median store all the metrics feeded bounded with or without 
capacity limit?
To avoid out of memory, user is allowed to set a capacity to restrict the max 
number of metrics stored.  When metrics reach the capacity, the oldest metrics 
will be removed.
If user do not specified the capacity, we will store all the metrics.

5. The capacity and k parameters can only be set at the initial time.
For simplicity and elegance, currently the two parameters can only set once.

6. ‘Max’ and ‘Sum’ function will change the first incoming metric, is it ok?
Originally Max and Sum will take the first incoming metric as the aggregated 
metric, this metric will be changed after more aggregations. In the new 
version, the aggregated metric is always a new instance and won’t change the 
first incoming metric.

7. TimelineMetric offers getAggregatedValues() method.
I assume we only care value in an aggregated metric, timestamp can be omitted. 
TopKMax, TopKMin and MaxFreq may return multiple values, for convenience and 
meaningful I added the method getAggregatedValues(). For compatibility, we can 
still use getSingelDateValue and getValues() to get the aggregated results.

8. The result of TopK function does not guarantee any particular order.

9. Do I need to set aggregator as xmlElement?
“TimelineMetricAggregator aggregator;” is a new field in TimelineMetric, I see 
other fields like type, id, values all have annotation of ‘@XmlElement’, should 
aggregator do the same setting?

10. TimelineMetricCalculator do not support comparasion between Integer/Long 
and Float/Double, is this by design?

11. Do we really need the replace aggregation function?
Currently I do not modify it as ‘Max’ and ‘Sum’, and I can’t think of a 
scenario where this function is needed. 

11. Is there any other aggregation function need to add?


was (Author: littlestone00):
Totally 7 aggregation functions are added in the patch, they are:
Min, Avg, Count, Median, TopKMin, TopKMax, MaxFreq.
Some comments and questions on the patch:
1. What's the expected functionality of the third parameter 'state'
in below method of TimelineMetricOperation?
abstract TimelineMetric exec(TimelineMetric incoming, TimelineMetric base,
   Map<Object, Object> state);

All the newly added functions except for Min are 'stateful',
e.g. Avg needs to record both the current average value
and the number of metrics which produced the value,
Median needs to store all the metrics fed to it.
But I don’t prefer to put the extra data structures into ‘state’ object.
In the patch, I only use 'state' to hold the k value of TopKMin and TopKMax and 
the capacity param of Median and MaxFreq.

2. Introduction of TimelineMetricAggregator
This is an abstract base class which encapsulatesthe data and operations needed 
by an aggregated function. It offers two methods: update(TimelineMetric 
incoming) and List<Number> getAggregatedValues().
Each aggregation function has a corresponding subclass, e.g. 
TimelineMetricAggregatorMedian maps to the Median function. 
When we need to add a new aggregation function, we add a new subclass of 
TimelineMetricAggregator. 

3. Does aggregation only apply to SINGEL_VALUE type? Do we need to support 
TIME_SERIES?
Implementation of ‘Max’ and ‘Sum’ function treat the incoming metric as 
SINGLE_VALUE.
In the modified version, I do not make such assumption. Each function will 
aggregate all the value in values(TreeMap), that is I take TIME_SERIES as 
multiple SINGLE_VALUE metrics and aggregate them one by one.

4. MaxFreq and Median store all the metrics feeded bounded with or without 
capacity limit?
To avoid out of memory, user is allowed to set a capacity to restrict the max 
number of metrics stored.  When metrics reach the capacity, the oldest metrics 
will be removed.
If user do not specified the capacity, we will store all the metrics.

5. The capacity and k parameters can only be set at the initial time.
For simplicity and elegance, currently the two parameters can only set once.

6. ‘Max’ and ‘Sum’ function will change the first incoming metric, is it ok?
Originally Max and Sum will take the first incoming metric as the aggregated 
metric, this metric will be changed after more aggregations. In the new 
version, the aggregated metric is always a new instance and won’t change the 
first incoming metric.

7. TimelineMetric offers getAggregatedValues() method.
I assume we only care value in an aggregated metric, timestamp can be omitted. 
TopKMax, TopKMin and MaxFreq may return multiple values, for convenience and 
meaningful I added the method getAggregatedValues(). For compatibility, we can 
still use getSingelDateValue and getValues() to get the aggregated results.

8. The result of TopK function does not guarantee any particular order.

9. Do I need to set aggregator as xmlElement?
“TimelineMetricAggregator aggregator;” is a new field in TimelineMetric, I see 
other fields like type, id, values all have annotation of ‘@XmlElement’, should 
aggregator do the same setting?

10. TimelineMetricCalculator do not support comparasion between Integer/Long 
and Float/Double, is this by design?

11. Do we really need the replace aggregation function?
Currently I do not modify it as ‘Max’ and ‘Sum’, and I can’t think of a 
scenario where this function is needed. 

11. Is there any other aggregation function need to add?[^attachment-name.zip]

> Extend aggregation operation for new ATS design
> -----------------------------------------------
>
>                 Key: YARN-7109
>                 URL: https://issues.apache.org/jira/browse/YARN-7109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zian Chen
>            Assignee: Linlin Zhou
>              Labels: patch
>         Attachments: yarn-7109.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to