subject:"\[jira\] \[Commented\] \(YARN\-3901\) Populate flow run data in the flow

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2016-07-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369763#comment-15369763
 ] 

Hudson commented on YARN-3901:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3901. Populate flow run data in the flow_run & flow activity tables 
(sjlee: rev a68e3839218523403f42acd7bdd7ce1da59a5e60)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityColumn.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimestampGenerator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowActivityTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowColumn.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/application/ApplicationColumn.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunColumnFamily.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineStorage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/application/ApplicationColumnPrefix.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityColumnPrefix.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/TestHBaseStorageFlowRun.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowActivityColumnPrefix.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunColumnPrefix.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowActivityColumnFamily.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowActivityRowKey.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunCoprocessor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/ColumnHelper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/ColumnPrefix.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/Attribute.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineSchemaCreator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineWriterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/AggregationCompactionDimension.java
*

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791626#comment-14791626
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~jrottinghuis], it is quite exciting to work on hbase cell tags and 
coprocessors. Looking forward to the upcoming jiras! 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803137#comment-14803137
 ] 

Junping Du commented on YARN-3901:
--

Patch LGTM too. Looks like Jenkins doesn't be triggered against v10 patch for 
some reason and I just kick it off manually. 
Let's wait for Jenkins result.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803109#comment-14803109
 ] 

Sangjin Lee commented on YARN-3901:
---

The latest patch (v.10) LGTM. Thanks much [~vrushalic] for the update!

Please let me know if you have additional feedback. I'd like to commit this 
soon.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803213#comment-14803213
 ] 

Li Lu commented on YARN-3901:
-

Patch LGTM. Pending Jenkins. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803240#comment-14803240
 ] 

Sangjin Lee commented on YARN-3901:
---

Jenkins does kick in but for some reason, it cannot post the result to the 
JIRA. The result is the following:

-1 overall

| Vote |   Subsystem |  Runtime   | Comment

|  -1  |  pre-patch  |  16m 3s| Findbugs (version ) appears to be 
|  | || broken on YARN-2928.
|  +1  |@author  |  0m 0s | The patch does not contain any 
|  | || @author tags.
|  +1  | tests included  |  0m 0s | The patch appears to include 4 new 
|  | || or modified test files.
|  +1  |  javac  |  8m 26s| There were no new javac warning 
|  | || messages.
|  +1  |javadoc  |  10m 47s   | There were no new javadoc warning 
|  | || messages.
|  +1  |  release audit  |  0m 24s| The applied patch does not increase 
|  | || the total number of release audit
|  | || warnings.
|  +1  | checkstyle  |  0m 16s| There were no new checkstyle 
|  | || issues.
|  -1  | whitespace  |  0m 50s| The patch has 9 line(s) that end in 
|  | || whitespace. Use git apply
|  | || --whitespace=fix.
|  +1  |install  |  1m 39s| mvn install still works. 
|  +1  |eclipse:eclipse  |  0m 43s| The patch built with 
|  | || eclipse:eclipse.
|  +1  |   findbugs  |  0m 52s| The patch does not introduce any 
|  | || new Findbugs (version 3.0.0)
|  | || warnings.
|  +1  | yarn tests  |  2m 37s| Tests passed in 
|  | || hadoop-yarn-server-timelineservice.
|  | |  42m 43s   | 


|| Subsystem || Report/Notes ||

| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756422/YARN-3901-YARN-2928.10.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| whitespace | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-timelineservice test log | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9191/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |

I'll remove the whitespace as I commit it. +1?

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-16 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791359#comment-14791359
 ] 

Sangjin Lee commented on YARN-3901:
---

I did a quick review, and it looks real close now. Some minor comments (mostly 
javadoc comment nits).

(TimestampGenerator.java)
- l.57: nit: "unlikely" -> "Unlikely"
- l.60: nit: should have a period (".") at the end
- l.76, 79: the same

(FlowScanner.java)
- l.50: nit: should have a period (".") at the end
- There are other javadoc comments here that do not start with capital letters 
or end with periods. Could you scan them and fix them?

(TestFlowDataGenerator.java)
- it can be non-public, right?


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch, YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-16 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791446#comment-14791446
 ] 

Joep Rottinghuis commented on YARN-3901:


Thanks for patiently making updates as we figured out the details of the 
algorithm for readless aggregation and ran into the sharp edged of HBase 
coprocessors [~vrushalic]

Patch looks fine to be committed to me. I'll put further comments and tweaks in 
YARN-4062

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch, YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-16 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791334#comment-14791334
 ] 

Vrushali C commented on YARN-3901:
--

Thanks Sangjin, I corrected the patch to include those files now. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch, YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-16 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791330#comment-14791330
 ] 

Sangjin Lee commented on YARN-3901:
---

It appears the latest patch is missing some new files. I see 
{{TestHBaseStorageFlowRun}} instead of {{TestHBaseStorageFlowRunFlowActivity}} 
but it seems much shorter than before. I'm assuming we're missing 
{{TestHBaseStorageFlowActivity}}? Also, I think we're missing 
{{TestFlowDataGenerator}}.

Could you make sure all the new files are included in the patch? Thanks.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch, YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-15 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746516#comment-14746516
 ] 

Sangjin Lee commented on YARN-3901:
---

{quote}
Yes, the algorithm makes sense. I thought JVM may have some special 
optimization with the getAndAdd method on x86 since x86 has a specific 
fetch-and-add (LOCK:XADD). However I'm not sure if this is actually reflected 
in most current JVMs (https://bugs.openjdk.java.net/browse/JDK-6973482). I'm OK 
with both, since the advantage of either one of them in our use case is 
nontrivial (https://blogs.oracle.com/dave/entry/atomic_fetch_and_add_vs). If 
keeping up with current time is important, we need to stick to the CAS based 
solution and not to change it in future.
{quote}

I see. I missed the part where you were comparing CAS with fetch-and-add. Yes, 
I think keeping up with the current time is an important requirement here. 
Also, this may not be one of the bigger performance issues IMO.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-15 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746558#comment-14746558
 ] 

Sangjin Lee commented on YARN-3901:
---

[~vrushalic], it seems that {{package-info.java}} is not quite right, and that 
might be related with the javadoc errors. Your patch removed 
{{package-info.java}} at 
org/apache/hadoop/yarn/server/timelineservice/storage/common, but changes the 
package incorrectly to 
org.apache.hadoop.yarn.server.timelineservice.storage.common for 
{{package-info.java}} at org/apache/hadoop/yarn/server/timelineservice/storage.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-15 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746305#comment-14746305
 ] 

Sangjin Lee commented on YARN-3901:
---

I can answer and clarify some of [~gtCarrera9]'s questions.

bq. Any special considerations on not directly using getAndAdd 
(fetch-and-increment) here?

It's essentially using AtomicLong.compareAndSet(). The few lines around it are 
mostly to keep pace with the current time. I hope that makes sense.

bq. It's fine to directly read data out in the UT now. We may want to switch 
this to use readers sometime later?

I'm actually adding more unit tests that use the reader in YARN-4074.

bq. I noticed we never remove or disable anything in this table, so do we do 
this by setting the ttl of the table? Or we create different rows for the same 
application to differentiate the day an entity was posted? I think we're using 
the latter but would like to confirm. 

We're doing the latter. We create a new record in this table any time a new 
activity is done for a given day for a flow.

bq. Also, with the current design, when the reader tries to get latest 
activities on a cluster, it can craft a row key prefix 
cluster!inv(currTime-24h) and scan from the very beginning to this record, 
right? (I'm trying to connect the two pieces together. )

If you're interested in the latest only, then it's even simpler. The prefix is 
just {{cluster!}} and we can grab from the beginning. What you mention would 
get activities for today only, which is slightly different.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746352#comment-14746352
 ] 

Li Lu commented on YARN-3901:
-

Thanks [~sjlee0]! 

bq. It's essentially using AtomicLong.compareAndSet(). The few lines around it 
are mostly to keep pace with the current time. I hope that makes sense.
Yes, the algorithm makes sense. I thought JVM may have some special 
optimization with the getAndAdd method on x86 since x86 has a specific 
fetch-and-add (LOCK:XADD). However I'm not sure if this is actually reflected 
in most current JVMs (https://bugs.openjdk.java.net/browse/JDK-6973482). I'm OK 
with both, since the advantage of either one of them in our use case is 
nontrivial (https://blogs.oracle.com/dave/entry/atomic_fetch_and_add_vs). If 
keeping up with current time is important, we need to stick to the CAS based 
solution and _not_ to change it in future. 

bq.  We create a new record in this table any time a new activity is done for a 
given day for a flow.
Thanks. I missed the {{getTopOfTheDayTimestamp}} part. 

bq. The prefix is just cluster! and we can grab from the beginning. What you 
mention would get activities for today only, which is slightly different.
Right. Thanks for the clarification! I meant activities for the past 24 hours. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746239#comment-14746239
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], thanks for the work! I feel the current patch is much closer. 
Some of my review comments:

- Column.java
Let's make it clear in the comments if attributes can be null. (I believe it 
can be null but let's make it explicit. )

- TimelineWriterUtils.java
Quick question: when checking application finish event and its timestamp, we 
only search for the very last event event. Will there be some corner cases 
where the application finish event is not the last one? Such as, another event 
accidentally has the same timestamp as the finish event? I'm not sure if we 
have any strong guarantee on this (so that the chance is low). 

- TimestampGenerator.java
The lock-free algorithm to generate (locally-) unique timestamps looks good to 
me. Depend on the usage maybe we'd like to promote this algorithm to a more 
common place if other part of the code needs it as well. Any special 
considerations on not directly using getAndAdd (fetch-and-increment) here? 

- TimelineWriterUtils.java
TimelineWriterUtils#getTagFromAttribute is much better than the previous 
versions. It may be better to let the key of attribute entry to identify its 
own type? Right now we're requiring AggregationOperations and 
AggregationCompactionDimension to have disjoint enum values. This implicit 
requirement may be error pruning: if two attributes accidentally have DEFAULT, 
we're confusing them. Not sure if this is the top priority for now but we may 
need to fix it sometime later. 

- FlowScanner.java
Unify findMaxCell and findMinCell? 
I'm a little bit confused on SUM and SUM_FINAL: what's the difference (we treat 
them in the same way in collectCells)? 
Right now we treat all metric values to be Long. Let's decide that later. 

- FlowRunTable.java
We're mixing real "info" with metrics (info.m!x) together in the info 
column family. I think this is worth noting somewhere? 

- TestHBaseStorageFlowRunFlowActivity.java
Split into test flowRun and test flowActivity? Although added in the same JIRA, 
the function of the flowRun table and flowActivity tables are different (online 
aggregation and real-time activity info). 
It's fine to directly read data out in the UT now. We may want to switch this 
to use readers sometime later?
Do we need to test single data aggregation? The two {{getEntityMetricsApp}} 
methods only generates time series metrics. 
checkFlowRunTable(), shall we have some better ways to check the values 141 and 
57? Say, is it possible to check them dynamically rather than hard code them? 

- With regard to FlowActivityTable, maybe I'm still missing some points here, 
but how do we maintain the active flow list for the past 24 hrs? I noticed we 
never remove or disable anything in this table, so do we do this by setting the 
ttl of the table? Or we create different rows for the same application to 
differentiate the day an entity was posted? I think we're using the latter but 
would like to confirm. Also, with the current design, when the reader tries to 
get latest activities on a cluster, it can craft a row key prefix 
{{cluster!inv(currTime-24h)}} and scan from the very beginning to this record, 
right? (I'm trying to connect the two pieces together. )

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743869#comment-14743869
 ] 

Sangjin Lee commented on YARN-3901:
---

Thanks for updating the patch [~vrushalic]. It seems that the patch does not 
apply cleanly because {{TimelineSchemaCreator.java}} has been updated for 
YARN-4102. Could you please update the patch so it applies cleanly? Thanks.

https://builds.apache.org/job/PreCommit-YARN-Build/9119/console

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-14 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744270#comment-14744270
 ] 

Li Lu commented on YARN-3901:
-

OK, thanks for the info! 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-14 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744320#comment-14744320
 ] 

Joep Rottinghuis commented on YARN-3901:


[~sjlee0] 
bq.  I remember Joep Rottinghuis mentioning that an unset timestamp is 
equivalent to Cell.getTimestamp() returning Long.MAX_VALUE. Joep Rottinghuis?

Yes, if you see the org.apache.hadoop.hbase.client.Put#addColumn method without 
a timestamp, it uses the timestamp from the Put. There are two constructors for 
a Put, one with the timestamp, one without. The one without uses 
HConstants.LATEST_TIMESTAMP which is defined as:
{code}
  /**
   * Timestamp to use when we want to refer to the latest cell.
   * This is the timestamp sent by clients when no timestamp is specified on
   * commit.
   */
  public static final long LATEST_TIMESTAMP = Long.MAX_VALUE;
{code}

That is then used (indirectly through LATEST_TIMESTAMP_BYTES) in the KeyValue 
class in the #isLatestTimestamp method, which in turn is used in the 
KeyValue#updateLatestStamp that sets it to "now" on the server side.

I'm not 100% sure (we need to test this) but I'm assuming that the 
transformation of this isLatestTimestamp happens after coprocessors or never at 
all (the cells might just be written with the latest timestamp, and one might 
not have the ability to ask what the row looked like at any particular time at 
all). I thought this might be overwritten on the server side, but can't find 
that code now.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744789#comment-14744789
 ] 

Sangjin Lee commented on YARN-3901:
---

Again the jenkins result wasn't posted. Here it is:

-1 overall

| Vote |   Subsystem |  Runtime   | Comment

|  -1  |  pre-patch  |  16m 1s| Findbugs (version ) appears to be 
|  | || broken on YARN-2928.
|  +1  |@author  |  0m 0s | The patch does not contain any 
|  | || @author tags.
|  +1  | tests included  |  0m 0s | The patch appears to include 2 new 
|  | || or modified test files.
|  +1  |  javac  |  8m 4s | There were no new javac warning 
|  | || messages.
|  -1  |javadoc  |  7m 42s| The applied patch generated 166 
|  | || additional warning messages.
|  +1  |  release audit  |  0m 21s| The applied patch does not increase 
|  | || the total number of release audit
|  | || warnings.
|  +1  | checkstyle  |  0m 19s| There were no new checkstyle 
|  | || issues.
|  -1  | whitespace  |  0m 47s| The patch has 1 line(s) that end in 
|  | || whitespace. Use git apply
|  | || --whitespace=fix.
|  +1  |install  |  1m 33s| mvn install still works. 
|  +1  |eclipse:eclipse  |  0m 40s| The patch built with 
|  | || eclipse:eclipse.
|  +1  |   findbugs  |  0m 52s| The patch does not introduce any 
|  | || new Findbugs (version 3.0.0)
|  | || warnings.
|  +1  | yarn tests  |  1m 56s| Tests passed in 
|  | || hadoop-yarn-server-timelineservice.
|  | |  38m 22s   | 


|| Subsystem || Report/Notes ||

| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755862/YARN-3901-YARN-2928.8.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| javadoc | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/diffJavadocWarnings.txt
 |
| whitespace | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-timelineservice test log | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9134/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-11 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741507#comment-14741507
 ] 

Sangjin Lee commented on YARN-3901:
---

Thanks [~vrushalic] for updating the patch! I did a quick review (I know you'll 
be making bit more changes for findbugs, etc.), and wanted to share feedback.

(ColumnHelper.java)
- l.70: if the timestamp is null, the *current* timestamp (not server) is used, 
right? So we should update this comment?
- l.99,104: let's use primitive long over object Long
- l.99: does this need to be non-private?

(FlowRunCoprocessor.java)
- l.146: Since {{Cell.getTimestamp()}} returns a primitive long, it will never 
be a null Long object. I remember [~jrottinghuis] mentioning that an unset 
timestamp is equivalent to {{Cell.getTimestamp()}} returning Long.MAX_VALUE. 
[~jrottinghuis]?

(TimestampGenerator.java)
- If we're going to have {{ColumnHelper}} use this, I suggest moving this to 
the storage.common package.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-11 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741575#comment-14741575
 ] 

Vrushali C commented on YARN-3901:
--

Thanks Sangjin, I will make these updates. I am also looking at the patch with 
Joep and hope to have an updated patch shortly. 

bq. Since Cell.getTimestamp() returns a primitive long, it will never be a null 
Long object. I remember Joep Rottinghuis mentioning that an unset timestamp is 
equivalent to Cell.getTimestamp() returning Long.MAX_VALUE. 
Yes, I will update this. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-10 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739036#comment-14739036
 ] 

Vrushali C commented on YARN-3901:
--

Hi [~gtCarrera9]

The start and end times for a flow run can be evaluated if you have all the 
known start times and end times of applications in that flow run and min/max 
timestamp can be evaluated. Hence this can be determined from the flow run 
table. But in the flow activity table, the purpose is to note that a flow was 
"active" on that day, meaning an application in that flow either started, 
completed or was running on that day. So when Joep and I had reviewed my patch 
together we realized that calculating the min/max in the flow activity table 
wont work for apps that span day boundaries and so in his comment on Aug 29th, 
there is a note "No timestamp needed in FlowActivity table. Runs can start one 
day and end another. Probably start without, add later if needed." That meant 
we did not need the coprocessor to determine min or max in the flow activity 
table. Hence I removed it.

HTH
Vrushali



> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-09 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737890#comment-14737890
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], one quick question, in which discussion did we decide to 
remove flow activity coprocessor? I appreciate your pointer. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735452#comment-14735452
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~sjlee0] for the review!

I will correct the variable ordering for static and private members as well as 
making variables final.

bq. l.210: Strictly speaking, GenericObjectMapper will return an integer if the 
value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with Number instead of long
Comparisons are not allowed for Number datatype. 
{code} 
The operator < is undefined for the argument type(s) java.lang.Number, 
java.lang.Number
{code} 

So I would have to do something like {code} Number d = a.longValue() + 
b.longValue(); {code}  Do you think this is better? 

bq. l.52: Is the TimestampGenerator class going to be used outside 
FlowRunCoprocessor? If not, I would argue that we should make it an inner class 
of FlowRunCoprocessor. At least we should make it non-public to keep it within 
the package. If it would see general use outside this class, then it might be 
better to make it a true public class in the common package. I suspect a 
non-public class might be what we want here.
I am thinking I will need this when the flush/compaction scanner is added in. 
If you'd like, I can move it in as a non-public class for now and then move it 
out if needed. 

bq. It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.
I actually needed this in the unit test while checking the FlowActivityTable 
contents, if you want I can take it out and you can add that test case in when 
you add in the RowKey changes? 

bq. l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.
Yeah, I was thinking about that too. Right now, metrics will get their own 
timestamps. For other columns, we'd be using the nanoseconds. I am trying to 
see if we can just use milliseconds.

bq. it might be good to have short comments on what each method is testing
I did try to make the unit test names themselves descriptive like 
testFlowActivityTable or testWriteFlowRunMinMaxToHBase or 
testWriteFlowRunMetricsOneFlow or testWriteFlowActivityOneFlow but I agree some 
more comments in the unit test will surely help. 

Will upload a new patch shortly, thanks! 


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735384#comment-14735384
 ] 

Sangjin Lee commented on YARN-3901:
---

Thanks for the updated patch [~vrushalic]! I went over the new patch, and the 
following is the quick feedback. I'll also apply it with YARN-4074, and test it 
a little more.

(HBaseTimelineWriterImpl.java)
- l.141-155: the whole thing could be inside {{if (isApplication)...}}
- l.264: this null check is not needed

(FlowRunCoprocessor.java)
- l.52: Is the {{TimestampGenerator}} class going to be used outside 
{{FlowRunCoprocessor}}? If not, I would argue that we should make it an inner 
class of {{FlowRunCoprocessor}}. At least we should make it non-public to keep 
it within the package. If it would see general use outside this class, then it 
might be better to make it a true public class in the common package. I suspect 
a non-public class might be what we want here.
- l.52: let's make it final
- l.54: style nit: I think the common style is to place the static variables 
before instance variables
- Also, overall it seems we're using both the diamond operator (<>) and the old 
style generic declaration. It might be good to stick with one style (in which 
case the diamond operator might be better).
- l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.

(FlowScanner.java)
- variable ordering
- l.210: Strictly speaking, {{GenericObjectMapper}} will return an integer if 
the value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with {{Number}} instead of 
long.

(TimestampGenerator.java)
- l.29: make it final
- variable ordering
- see above for the public/non-public comment

(FlowActivityRowKey.java)
- It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.

(TestHBaseTimelineWriterImplFlowRun.java)
- it might be good to have short comments on what each method is testing


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735887#comment-14735887
 ] 

Joep Rottinghuis commented on YARN-3901:


Thanks [~vrushalic]. I'm going to dig through the details on the latest patch.
Separately [~sjlee0] and I further discussed the challenges of taking the 
timestamp on the coprocessor, buffering writes, app restarts, timestamp 
collisions and ordering of various writes that come on.

1) Given that we have timestamps in # millis, then multiplying by 1,000 should 
suffice. It is unlikely that we'd have > 1M writes for one column in one region 
server for one flow. If we multiply by 1M we get close to the total date range 
that can fit in a long (still years to come, but still).

2) If we do any shifting of time, we should do the same everywhere to keep 
things consistent, and to keep the ability to ask what a particular row 
(roughly) looked like at any particular time (like last night midnight, what 
was the state of this entire row).

3) We think in the column helper, if the ATS client supplies a timestamp, we 
should multiply by 1,000. If we read any timestamp from HBase, we'll divide by 
1,000.

4) If the ATS client doesn't supply the timestamp, we'll grab the timestamp in 
the ats writer the moment the write arrives (and before it is batched / 
buffered in the buffered mutator, HBase client, or RS queue). We then take this 
time and multiply by 1,000. Reads again divide by 1,000 to get back to millis 
in epoch as before.

5) For Agg operation SUM, MIN, and MAX we take the least significant 3 digits 
of the app_id and add this to the (timestamp*1000), so that we create a unique 
timestamp per app in an active flow-run. This should avoid any collisions.
This takes care of uniqueness (no collisions on a single ms), but also solves 
for older instances of a writer (in case of a second AM attempt for example) or 
any other kind of ordering issue. The write are timestamped when they arrive at 
the writer.

6) If some piece of client code doesn't set any timestamp (this should be an 
error) then we cannot effectively order the writes as per the previous point. 
We still need to ensure that we don't have collisions. If the client supplied 
timestamp if LONG.Maxvalue, then we can generate the timestamp in the 
coprocessor on the servers side, modulo the counter to ensure uniqueness. We 
should still multiply by 1K to make the same amount of space for the unique 
counter.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735900#comment-14735900
 ] 

Joep Rottinghuis commented on YARN-3901:


The one remaining issue we have to tackle is when there are two app attempts. 
The previous app attempt ends up buffering some writes, and the new app attempt 
ends up writing a final_value.
Now if the flush happens before the first attempt its write comes in, we no 
longer have the unaggregated value for that app_id in order to discard against 
(the timestamp should have taken care of this order).
We can deal with this issue in three ways:
1) Ignore (risky and very hard to debug if it ever happens)
2) Keep the final value around until it has aged a certain time. Upside is that 
the value is initially kept (for for example 1-2 days?) and then later 
discarded. Downside is that we won't collapse values as quickly on flush as we 
can. The collapse would probably happen when a compaction happens, possibly 
only when a major compaction happens. But previous unaggregated values may have 
been written to disk anyway, so not sure how much of an issue this really is.
3) keep a list of the last x app_ids (aggregation compaction dimension values) 
on the aggregated flow-level data. What we would then do in the aggregator is 
to go through all the values as we currently do. We'd collapse all the values 
to keep only the latest per flow. Before we sum an item for the flow, we'd 
compare if the app_id was in the list of most recent x (10) apps that were 
completed and collapsed. 
Pro is that with a lower app completion rate in a flow, we'd be guarded against 
stale writes for longer than a fixed time period. We'd still limit the size of 
extra storage in tags to a list of x (10?) items.
Downside is that if apps complete in very rapid succession, we would 
potentially be protected from stale writes from an app for a shorter period of 
time. Given that there is a correlation between an app completion and its 
previous run, this may not be a huge factor. It's not like random previous app 
attempts are launched. This is really to cover the case when a new app attempt 
is launched, but the previous writer had some buffered writes that somehow 
still got through.

I'm sort of tempted towards 2, since that is the most similar to the existing 
TTL functionality, and probably the easiest to code and understand. Simply 
compact only after a certain time period has passed.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736140#comment-14736140
 ] 

Sangjin Lee commented on YARN-3901:
---

Somehow the jenkins info didn't make it to the JIRA: 
https://builds.apache.org/job/PreCommit-YARN-Build/9044/

-1 overall

| Vote |   Subsystem |  Runtime   | Comment
|  -1  |  pre-patch  |  15m 55s   | Findbugs (version ) appears to be 
|  | || broken on YARN-2928.
|  +1  |@author  |  0m 1s | The patch does not contain any 
|  | || @author tags.
|  +1  | tests included  |  0m 0s | The patch appears to include 2 new 
|  | || or modified test files.
|  +1  |  javac  |  8m 6s | There were no new javac warning 
|  | || messages.
|  +1  |javadoc  |  10m 12s   | There were no new javadoc warning 
|  | || messages.
|  +1  |  release audit  |  0m 23s| The applied patch does not increase 
|  | || the total number of release audit
|  | || warnings.
|  +1  | checkstyle  |  0m 16s| There were no new checkstyle 
|  | || issues.
|  -1  | whitespace  |  0m 31s| The patch has 7 line(s) that end in 
|  | || whitespace. Use git apply
|  | || --whitespace=fix.
|  +1  |install  |  1m 34s| mvn install still works. 
|  +1  |eclipse:eclipse  |  0m 40s| The patch built with 
|  | || eclipse:eclipse.
|  -1  |   findbugs  |  0m 56s| The patch appears to introduce 7 
|  | || new Findbugs (version 3.0.0)
|  | || warnings.
|  +1  | yarn tests  |  1m 54s| Tests passed in 
|  | || hadoop-yarn-server-timelineservice.
|  | |  40m 33s   | 


Reason | Tests
 FindBugs  |  module:hadoop-yarn-server-timelineservice 


|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754731/YARN-3901-YARN-2928.5.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / e6afe26 |
| whitespace | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-server-timelineservice test log | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9044/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |



> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731056#comment-14731056
 ] 

Junping Du commented on YARN-3901:
--

Thanks Vrushali for updating the patch. This is a great/huge work which also 
means it may need more rounds of review and could receive much more criticize 
than the normal case. Thanks again for being patient.
After go through the code, I had a few comments (with omitted some duplicated 
ideas with Joep or Li's comments):

In HBaseTimelineWriterImpl.java,
For isApplicationFinished() and getApplicationFinishedTime(), the event list in 
TimelineEntity is a SortedSet, so we can use last() to retrieve the last event 
instead of for each every element? It shouldn't be other events after 
FINISHED_EVENT_TYPE. Isn't it? In addition, because the method is general 
enough. In addition, we can consider to move them to TimelineUtils class so can 
be reused by other classes. Also, need to fix some indentation issues in this 
class.

In ColumnHelper.java,
{code}
+  for (Attribute attribute : attributes) {
+if (attribute != null) {
+  p.setAttribute(attribute.getName(), attribute.getValue());
+}
+  }
{code}
Do we expect null element added to attributes? If not, we should complain with 
NPE or other exception instead of ignore it silently.

In ColumnPrefix.java, Indentation issue in Javadoc.

In TimelineWriterUtils.java,
I think getIncomingAttributes() tries to clone an array of attributes with 
appending an extra attribute in AggregationOperations. May be we should have a 
javadoc to describe it. The 3 if else cases sounds unnecessary and can be 
combined.

I didn't go to coprocessor classes quite deeply but I agree with Joep's above 
comments that it need more Javadoc to explain what are outstanding methods 
doing.
In FlowRunCoprocessor.java, getTagFromAttribute() sounds like we are using 
exception to differentiate normal case in matching string with enum elements. 
Can we improve it with using EnumUtils?

In AggregationCompactionDimension.java,
I think the only usage here is to provide a method getAttribute() which return 
an attribute object mixed with app_id (in byte array). If so, why we make this 
an enum class instead of a regular class as APPLICATION_ID is the only element? 
May be more straightforward way is to have a utility class to getAttribute() 
directly.

In AggregationOperations.java,
Indentation issues.

Haven't quite go through code around flow activity table, more comments should 
comes in my 2nd round review.

Some quick check on test code, for TestHBaseTimelineWriterImplFlowRun.java,
{code}
+  Result r1 = table1.get(g);
+  if (r1 != null && !r1.isEmpty()) {
+Map values = r1.getFamilyMap(FlowRunColumnFamily.INFO
+.getBytes());
+assertEquals(2, r1.size());
...
{code}
Do we accept r1 to be null or empty result? I don't think so, so may be we 
should check the size of r1 earlier so we are not ignore the real failure cases?


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731173#comment-14731173
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~djp] , appreciate the feedback! I too am figuring out how to make this 
more readable and sincerely appreciate everyone's time and efforts in reviewing 
it. 

Some responses:
bq. TestHBaseTimelineWriterImplFlowRun.java, Do we accept r1 to be null or 
empty result? I don't think so, so may be we should check the size of r1 
earlier so we are not ignore the real failure cases?

Right, I will add an assert not null here.

bq. In TimelineWriterUtils.java,
I think getIncomingAttributes() tries to clone an array of attributes with 
appending an extra attribute in AggregationOperations. May be we should have a 
javadoc to describe it. The 3 if else cases sounds unnecessary and can be 
combined.
I think I will update this method a bit more and add more comments so that it 
explains well what the code is doing.

bq. In ColumnHelper.java, Do we expect null element added to attributes? If 
not, we should complain with NPE or other exception instead of ignore it 
silently.
Hmm. So this method in ColumnHelper is called from several places and is nested 
between many calls from hbase writer till here. Since the list of attributes is 
a variable length list of parameters, it could be null or turn out to be empty 
if some function in between decides to remove an attribute, so this was more of 
a safety check. The list of Attributes can be modified at several places in the 
call stack, so it is not actually an error if it comes to this point as an 
empty list. But I will think over this a bit more. 

bq. For isApplicationFinished() and getApplicationFinishedTime(), the event 
list in TimelineEntity is a SortedSet, so we can use last() to retrieve the 
last event instead of for each every element? It shouldn't be other events 
after FINISHED_EVENT_TYPE. Isn't it?
Ah, I did not know that it was a sorted set, will update the code accordingly.

bq.  In addition, because the method is general enough. In addition, we can 
consider to move them to TimelineUtils class so can be reused by other classes. 
Sounds good, will refactor it.

Looks like the indentation is a bit off in some places, I will update the 
formatting as recommeded by [~gtCarrera9], [~jrottinghuis] and [~djp] 


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731650#comment-14731650
 ] 

Sangjin Lee commented on YARN-3901:
---

[~gtCarrera9], I think we're not far off actually. I've been testing using the 
v.3 patch, and with a few more changes that Vrushali will be doing, it should 
be pretty close to a reasonably complete state. At this point, it would be more 
efficient to bring this to completion. My 2 cents.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731639#comment-14731639
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], since part of this JIRA is blocking the web UI work, but we 
may still need sometime to reach a stable state on this JIRA, is it possible to 
separate the flow activity part? In this way we may unblock the flow activity 
table queries, as well as the web services. However, if it's too hard to 
separate the JIRA we can proceed all tasks in this JIRA. Thoughts? 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731648#comment-14731648
 ] 

Sangjin Lee commented on YARN-3901:
---

(1) FlowScanner cell order issue
When I added the reader code and started testing it against the flow run unit 
tests, I found that reading the END_TIME column on the flow run table didn't 
work. The flow run column read on END_TIME is essentially 
{{Result.getValue()}}. However, HBase was failing to find the END_TIME column 
although it clearly existed in the result. It was basically failing at the 
binary search:

{code}
  public Cell getColumnLatestCell(byte [] family, byte [] qualifier) {
Cell [] kvs = rawCells(); // side effect possibly.
if (kvs == null || kvs.length == 0) {
  return null;
}
int pos = binarySearch(kvs, family, qualifier);
if (pos == -1) {
  return null;
}
if (CellUtil.matchingColumn(kvs[pos], family, qualifier)) {
  return kvs[pos];
}
return null;
  }
{code}

The binary search was failing because the cells in the result were stored in 
the wrong order.

The cells were stored in the wrong order because it was being added by our 
co-processor (in FlowScanner.nextInternal()).

{code}
189 if (runningSum.size() > 0) {
190   for (Map.Entry newCellSum : runningSum.entrySet()) {
191 // create a new cell that represents the flow metric
192 Cell c = newCell(metricCell.get(newCellSum.getKey()),
193 newCellSum.getValue());
194 cells.add(c);
195   }
196 }
197 if (currentMinCell != null) {
198   cells.add(currentMinCell);
199 }
200 if (currentMaxCell != null) {
201   cells.add(currentMaxCell);
202 }
{code}

And this order is preserved all the way to the reader. The fix is to add the 
cells in the right order via KeyValueComparator. This fix is included in my 
patch on YARN-4074. This will be fixed in this JIRA.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731651#comment-14731651
 ] 

Li Lu commented on YARN-3901:
-

[~sjlee0] sure, then let's keep all the work here! 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-04 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731654#comment-14731654
 ] 

Sangjin Lee commented on YARN-3901:
---

(2) colliding puts in the co-processor
We found another issue with the write side of things via unit test. It fails 
only occasionally (and more often on some environments than others). It happens 
when 2 puts on the same column are coming in very closely (namely within 1 
millisecond). The code in question is {{FlowRunCoprocessor.prePut()}}:

{code}
  for (Map.Entry entry : put.getFamilyCellMap()
  .entrySet()) {
List newCells = new ArrayList<>(entry.getValue().size());
for (Cell cell : entry.getValue()) {
  // for each cell in the put add the tags
  // Assumption is that all the cells in
  // one put are the same operation
  newCells.add(CellUtil.createCell(CellUtil.cloneRow(cell),
  CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
  cell.getTimestamp(), KeyValue.Type.Put,
  CellUtil.cloneValue(cell), Tag.fromList(tags)));
}
newFamilyMap.put(entry.getKey(), newCells);
  } // for each entry
{code}

If 2 cells for example carry the same timestamp, then the later one ends up 
overwriting the previous one, effectively losing one put. This was triggered by 
one of the tests in {{TestHBaseTimelineWriterImplFlowRun.java}}.

It's an edge case which is rather unlikely to happen normally, but is an issue 
nonetheless. And how to solve this problem is pretty complicated. We'll soon 
post possible approaches for handling this.

But at any rate, I suspect we could isolate this issue into a separate JIRA, 
and tackle it post-UI-POC. I'd appreciate your feedback.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-03 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729511#comment-14729511
 ] 

Vrushali C commented on YARN-3901:
--


Thanks [~gtCarrera9] for the review. Let me try to give some explanations to 
your questions above. 

bq. Name of Attribute seems to be quite general. Maybe we want something more 
specific? From my understanding, Attribute acts as the "command" (as the 
meaning in design pattern) of the aggregation?

Yes, attribute indicates what action needs to be taken in the aggregation 
step/reading step. What name do you recommend? 

bq. TimelineSchemaCreator May conflict with YARN-4102. I'm fine with either 
order to put them in.
Yes, I thought about that, but the patch in YARN-4102 is not committed yet, so 
could not rebase. I am good with rebasing the YARN-3901 patch if YARN-4102 gets 
in.

bq. Are we assuming there will be at most two attributes for each column 
prefix? In FlowScanner we're only dealing with two attributes, one from 
compaction one from operations. But in FlowActivityColumnPrefix we're assuming 
there's a list of attributes?
No, there can be any number of attributes for a column prefix. Currently MIN, 
MAX and SUM happen to be exclusive in the sense, if you want a min for start 
time, it's unlikely that you want to be SUMing  up the start times. FlowScanner 
looks for application id from the aggregation compaction dimensions and for the 
MIN/MAX/SUM from the aggregation operations. This FlowScanner class is very 
different from FlowActivityColumnPrefix. The FlowActivityColumnPrefix or any 
class that generates a Put will deal with a list of attributes. 

bq. What is our plan on FlowActivityColumnPrefix#IN_PROGRESS_TIME? 
Yes, this is timestamp that needs to be put into the flow activity table for 
all (long) running applications. If an application in a flow starts on say Day1 
and runs through day 2, day 3 and ends on day4, then the flow activity table 
needs to have an entry for this flow for day2 and day3. This is the in progress 
time of that application, it is the TBD part being thought over in YARN-4069. 
We need to think if we want the RM to write it, or the App master or something 
else offline. 

bq. In FlowScanner, after aggregation (in nextInternal) we're simply adding 
aggregated data as a Cell. However I haven't found where we're guaranteeing the 
new node is not aggregate again (and we create another new cell for the 
aggregation result). Are we doing this deliberately or I'm missing anything 
here?
Hmm, not sure I got the question but let me try to explain what the FlowScanner 
should be doing. It will read each cell one by one. Say for start time column, 
it reads the cells. Now for a flow, we want that value which is the lowest for 
the start time of the flow. Hence these cells have a tag of MIN. So, the 
nextInternal will return one cell with the min value for the column start time. 
Similarly for max and for SUM, it sums up the cell values. Hope this helps.

Also, I will double check the formatting related comments and update as 
necessary. Appreciate the review! 



> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-03 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729886#comment-14729886
 ] 

Vrushali C commented on YARN-3901:
--


bq. From the code I can see we add the newly created cell to the list of cells 
(cells.add(someNewCell)). Is this operation only modifying the returned value 
of the next call, or it permanently writes the aggregated values back to HBase? 
This may sound silly but I'm not very familiar with the observer coprocessors.

Yes, in this code, we are only "reading" or returning back cells to the client 
(hbase client). But when we add in the compaction/flush coprocessors, they will 
write back to hbase as well. 

bq. Maybe a name that is more specific will help? How about something like 
AggregationPutAttribute?
Hmm. It's not really an attribute for the Put, it's more like the 
characteristic of the cell value itself. We could use these attributes outside 
of Aggregation as well, so don't want the agg prefix here. Still thinking what 
to call it. Perhaps FlowValueAttribute? 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-03 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729890#comment-14729890
 ] 

Li Lu commented on YARN-3901:
-

bq. Yes, in this code, we are only "reading" or returning back cells to the 
client (hbase client). But when we add in the compaction/flush coprocessors, 
they will write back to hbase as well.
Oh I see. This is the missing piece that confused me. Thanks for the 
clarification! 

bq. We could use these attributes outside of Aggregation as well, so don't want 
the agg prefix here. 
OK, if this is the plan, let's leave it here to unblock the critical path of 
the whole JIRA? We can clean this up later. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-03 Thread Joep Rottinghuis (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729910#comment-14729910
]

Joep Rottinghuis commented on YARN-3901:

Not sure if [~gtCarrera9] is asking what happens if we have consolidated a
"finished" app into the flow aggregate and then we get an older batched put
from the same app (from an older AM that wasn't properly killed or something).
That is currently an open problem that we can address by storing the list of x
most recently completed apps in a tag on the total sum. We can't quite store
all app IDs that have been collapsed into the flow sum, because that list could
potentially be really long. We could keep a list of the last x apps, to
radically reduce the likelihood that stale batched writes end up messing up the
aggregation if there were to be a race condition (however rare that might be).
I think we can add that guarding behavior in a separate jira and not further
complicate the first cut (that might suffer from this rare race condition).

As [~sjlee0] pointed out and we were discussing offline this morning
scan.setMaxResultSize(limit); doesn't limit the # rows that are returned, but
limits the size in bytes. Not sure we want to address that here, or if we'll
let him adjust that in his patch.
We should limit the number of rows we retrieve from scan and if needed (as
[~sjlee0] pointed out) add a PageFilter with a limit in addition to the limit
to further restrict what is prefetched in a buffer (gets more complicated when
scan spans region servers).

It is a little confusing to read the patch and see where ColumnPrefix has just
an additional argument in the store method for Attribute... attributes and when
the method with a String qualifier is changed to one with a byte[]. There seems
to be a discrepancy in how EntityTablePrefix and ApplicationTablePrefix are
handles. I'm not sure if is needed to have a getColumnQualifier with a long in
ColumnHelper, but we may have to review this together interactively behind two
laptops.

In HBaseTimelineWriterImpl. onApplicationFinished
You have an old comment:
281// clear out the hashmap in case the downstream store/coprocessor
282 // added any more attributes during the put for END_TIME
that no longer makes sense. Also, I'd simply do:
{code}
storeFlowMetrics(rowKey, metrics, attribute1,
AggregationOperations.SUM_FINAL.getAttribute());
{code}

I'd update the javadoc in AggregationOperations before the enum definitions.
They are the old style and no longer make sense in the more generic case.
SUM indicates that the values need to be summed up, MIN means that only the
minimum needs to be kept etc.

Initially I found the method name getIncomingAttributes and corresponding
member names somewhat confusing (from the method perspective it isn't the
incoming values, it is the outgoing values). Perhaps combinedAttribute and
combineAttributes(...) makes more sense, but the provided logic seems correct.

FlowActivityColumnPrefix.IN_PROGRESS_TIME needs a better javadoc description to
describe its use and meaning.

The coprocessor methods need a little more javadoc to explain what is going on.
To the casual reader this is total voodoo.
The preGetOp creates a new scanner (ok), then does a single next on it (why?)
and then bypasses the environment (huh?).
Similarly if in preScannerOpen we already set scan.setMaxVersions(); then why
is the same still needed in PreGetOp, but in PostScannerOpen we don't do it
anymore (presumably already done in the preOpen).

I like the more generic FlowRunCoprocessor (although it can have a name that is
not associated with a table, because behavior is generic, and arg names such as
frpa are probably artifact from previous version).
In getTagFromAttribute, is it possible to recognize a operation from an
AggregationCompactionDimension without relying on an exception and catching it?
For example, can you do AggregationOperation.isA(Attribute a) or something like
that?

The other thing I realize with the coprocessor is this. It nicely maps
attributes to tags, but we unnecessarily bloat every single put with the
operation.
We could get creative and use a different column prefix for min and max
columns. Then the coprocessor can pick that up during read/flush/compaction.
That makes queries (and filters) much harder. So for now we're probably stuck
with tagging each value. Perhaps not so bad for min and max given that after
flush and compact we store only one value.

For SUM we will always have an Aggregation dimension, so adding a SUM tag then
isn't needed. We assume an aggregation dimension w/o agg operation would
default to SUM. We do certainly need to tag values with SUM_FINAL.

Aside from that, in FlowRunProcessor.prePut do we have to keep doing
Tag.fromList(tags) for each cell, or can we create a Tag once and
re-use it?

When reading through the

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-03 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729821#comment-14729821
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], some of my commnets:

bq. Yes, attribute indicates what action needs to be taken in the aggregation 
step/reading step. What name do you recommend?
Maybe a name that is more specific will help? How about something like 
AggregationPutAttribute?

bq. I am good with rebasing the YARN-3901 patch if YARN-4102 gets in.
Given the timing of both patches, let's proceed both patches and rebase the 
later one. Should be trivial. 

bq. No, there can be any number of attributes for a column prefix.
Sure. So maybe we can clarify this somewhere that these AggregationOperations 
options are exclusive in FlowScanner (i.e. adding more than one of them will 
not generate two aggregated results)? 

bq. Yes, this is timestamp that needs to be put into the flow activity table 
for all (long) running applications. 
Ah I see. That makes sense. 

bq. It will read each cell one by one. Say for start time column, it reads the 
cells. Now for a flow, we want that value which is the lowest for the start 
time of the flow. Hence these cells have a tag of MIN. So, the nextInternal 
will return one cell with the min value for the column start time. Similarly 
for max and for SUM, it sums up the cell values. 
Thanks for the clarification. From the code I can see we add the newly created 
cell to the list of cells ({{cells.add(someNewCell)}}). Is this operation only 
modifying the returned value of the next call, or it permanently writes the 
aggregated values back to HBase? This may sound silly but I'm not very familiar 
with the observer coprocessors. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-02 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728367#comment-14728367
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], thanks for the work! I looked at the current patch and have 
the following comments/questions:
- Name of Attribute seems to be quite general. Maybe we want something more 
specific? From my understanding, Attribute acts as the "command" (as the 
meaning in design pattern) of the aggregation? 

HBaseTimelineWriterImpl
- storeInFlowActivityTable, why two levels of indirections? The te parameter is 
never used in the first wrapper. 
- Move the few static helper methods into TimelineEntity? 
- Since both AggregationCompactionDimension and AggregationOperations may 
generate an Attribute, maybe it's helpful to distinguish Attributes from these 
two ways from their names? The name like attribute1 does not look like helpful. 
:(

getIncomingAttributes in TimelineWriterUtils may need some more comments on its 
function. 

TimelineSchemaCreator
May conflict with YARN-4102. I'm fine with either order to put them in. 

Are we assuming there will be at most two attributes for each column prefix? In 
FlowScanner we're only dealing with two attributes, one from compaction one 
from operations. But in FlowActivityColumnPrefix we're assuming there's a list 
of attributes? 

Maybe I'm missing something, but why we're converting hbase attributes into 
tags in FlowRunCoprocessor, but not doing the same thing in 
FlowActivityCoprocessor? Or, what does FlowActivityCoprocessor aggregate on?

What is our plan on FlowActivityColumnPrefix#IN_PROGRESS_TIME? 

In FlowScanner, after aggregation (in nextInternal) we're simply adding 
aggregated data as a Cell. However I haven't found where we're guaranteeing the 
new node is not aggregate again (and we create another new cell for the 
aggregation result). Are we doing this deliberately or I'm missing anything 
here?

nits: 
- l.149, l310, HBaseTimelineWriterImpl
Indentation problems?
- There are some lines are longer than 80. 
- l.64 AggregationOperations, wrong indentation with tab. 
- FlowRunColumnPrefix, FlowActivityColumnPrefix (maybe somewhere else): in 
Hadoop a common practice is to only have one space before the name of member 
variables. We don't really need to make all of them start in the same column. 
- Just curious, how did you choose the numbers associated with different 
AggregationOperations? 

It's a big patch so I may find something more tomorrow. Sorry about that but I 
just want to not to block the whole review process. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want

41 matches

Mail list logo