[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369761#comment-15369761 ] Hudson commented on YARN-3049: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/]) YARN-3049. [Storage Implementation] Implement storage reader interface (sjlee: rev 9e5155be363c6610ccf41fe08b7f1394f353ea65) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityColumnPrefix.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowColumn.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineEntitySchemaConstants.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineSchemaCreator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/BaseTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineReaderImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineReaderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowRowKey.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/ColumnPrefix.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowColumnFamily.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FileSystemTimelineReaderImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityRowKey.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineWriterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityColumnFamily.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityColumn.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineHBaseSchemaConstants.java > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Fix For: YARN-2928 > > Attachments: YARN-3049-WIP.1.patch, YARN-
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662068#comment-14662068 ] Junping Du commented on YARN-3049: -- +1. Patch LGTM. [~sjlee0], please feel free to go ahead to check in latest patch. Thx! > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, > YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, > YARN-3049-YARN-2928.7.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662060#comment-14662060 ]
Sangjin Lee commented on YARN-3049:
Let me know if there are any additional comments. I'll wait for about an hour before committing this. Thanks.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661313#comment-14661313 ]
Vrushali C commented on YARN-3049:
Filed https://issues.apache.org/jira/browse/YARN-4025 to cover the timestamp/long/byte-to-string conversions, and to add other APIs and functions as needed to support those conversions and the argument passing.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661012#comment-14661012 ]
Sangjin Lee commented on YARN-3049:
Yes, +1 on proceeding with this patch and addressing the long conversion in another JIRA.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661007#comment-14661007 ]
Li Lu commented on YARN-3049:
I checked EntityRowKey.java, and it seems we never convert flowRunIds into Strings when forming a row key. I think we're fine, since we always treat row keys as byte arrays?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660991#comment-14660991 ]
Vrushali C commented on YARN-3049:
bq. I'm worried that Bytes.toString() doesn't make the long integer be stored as the way we want.
Yes, when we store a long value, we need to store it as Bytes.toBytes(Long), not as Bytes.toBytes(Long value as String). When it is stored as a long, it is sorted in numerical order. The same applies to the row key: we need to store a Long as Bytes.toBytes(Long) to ensure numerically sorted order.
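The sort-order point in this comment can be illustrated with a small self-contained sketch. It does not use HBase itself: `longToBytes` is a stand-in that assumes the 8-byte big-endian layout of `Bytes.toBytes(long)`, and `compareUnsigned` is a hypothetical helper that imitates the unsigned, byte-by-byte ordering HBase applies to row keys and column qualifiers. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SortOrderSketch {
  // Stand-in for Bytes.toBytes(long): 8 bytes, big-endian (assumed layout).
  static byte[] longToBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // The alternative being warned against: the decimal text of the long.
  static byte[] decimalStringToBytes(long value) {
    return Long.toString(value).getBytes(StandardCharsets.UTF_8);
  }

  // Unsigned lexicographic comparison of two byte arrays.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    // Fixed-width binary encoding: 9 sorts before 10, as expected.
    System.out.println(compareUnsigned(longToBytes(9L), longToBytes(10L)) < 0);
    // Decimal-string encoding: "9" sorts after "10", which breaks numeric order.
    System.out.println(
        compareUnsigned(decimalStringToBytes(9L), decimalStringToBytes(10L)) > 0);
  }
}
```

Note that this only holds for non-negative values: in an unsigned byte comparison, a negative long in two's complement sorts after every positive one.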
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660987#comment-14660987 ]
Vrushali C commented on YARN-3049:
Yes, I will take that JIRA up.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660952#comment-14660952 ]
Zhijie Shen commented on YARN-3049:
As the issue is not blocking the whole reader implementation, how about letting this patch in first, [~sjlee0]? Some more comments about the issue:
1. ColumnHelper needs to be updated as well to return a byte[] column name instead of a String one.
2. I'm worried that Bytes.toString() doesn't make the long integer be stored as the way we want. If it isn't stored as the 8 bytes, we may not guarantee the order of event columns.
3. FlowRunId in the row key should be fine, because the row key is never converted to String again. But it's good to double check.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660935#comment-14660935 ]
Li Lu commented on YARN-3049:
bq. Also, to Li Lu's point, we should provide an additional api for getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String, so that we can use either one as applicable.
+1 for this solution. We can address this in another JIRA so that we're not blocking the reader patch? Would you like to take this JIRA, [~vrushalic]? If not, I can do the fix.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660923#comment-14660923 ]
Vrushali C commented on YARN-3049:
It looks like the conversion back to String was done to avoid an additional API on store. But if this is causing issues with a long value being the column qualifier, I think we should modify/add to the store api to include one which accepts a byte array for the compoundColumnQualifier. Specifically, I think this code should be changed to avoid unnecessary conversions from longs to bytes to strings. I thought about changing this in my earlier patch but did not think it was causing issues, hence kept it the way it was.
{code}
byte[] compoundColumnQualifierBytes =
    Separator.VALUES.join(columnQualifierWithTsBytes,
        Bytes.toBytes(info.getKey()));
// convert back to string to avoid additional API on store.
String compoundColumnQualifier =
    Bytes.toString(compoundColumnQualifierBytes);
EntityColumnPrefix.EVENT.store(rowKey, entityTable,
    compoundColumnQualifier, null, info.getValue());
{code}
Also, to [~gtCarrera]'s point, we should provide an additional api for getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String, so that we can use either one as applicable.
What do you think, [~gtCarrera]?
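As a rough sketch of the proposal above, the two `getColumnQualifier` variants could sit side by side as overloads, with the String variant delegating to the byte[] one. This is only an illustration under assumptions: the class name, the signatures, and the single-byte `!` separator are hypothetical and are not taken from the timeline service code, and a real implementation would presumably also need to escape separator bytes that occur inside the qualifier.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ColumnQualifierSketch {
  // Hypothetical separator between the column prefix and the qualifier.
  private static final byte SEPARATOR = (byte) '!';

  // Existing-style variant: the qualifier arrives as a String and is encoded here.
  static byte[] getColumnQualifier(byte[] prefix, String qualifier) {
    return getColumnQualifier(prefix, qualifier.getBytes(StandardCharsets.UTF_8));
  }

  // Proposed additional variant: the qualifier is already encoded, so binary
  // content such as an 8-byte timestamp is copied through untouched.
  static byte[] getColumnQualifier(byte[] prefix, byte[] qualifier) {
    byte[] out = new byte[prefix.length + 1 + qualifier.length];
    System.arraycopy(prefix, 0, out, 0, prefix.length);
    out[prefix.length] = SEPARATOR;
    System.arraycopy(qualifier, 0, out, prefix.length + 1, qualifier.length);
    return out;
  }

  public static void main(String[] args) {
    byte[] prefix = "e".getBytes(StandardCharsets.UTF_8);
    byte[] timestamp = ByteBuffer.allocate(Long.BYTES).putLong(1234567890L).array();
    byte[] qualifier = getColumnQualifier(prefix, timestamp);
    // The trailing 8 bytes are exactly the encoded long.
    byte[] tail = Arrays.copyOfRange(qualifier, prefix.length + 1, qualifier.length);
    System.out.println(ByteBuffer.wrap(tail).getLong());
  }
}
```

Keeping the String overload as a thin wrapper means existing callers stay unchanged, while callers that build compound qualifiers from longs can skip the String detour entirely.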
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660902#comment-14660902 ]
Li Lu commented on YARN-3049:
A little more investigation shows that we're using Strings as the column qualifier type in our HBase interfaces. They are then encoded into byte arrays in the getColumnQualifier() helper function. Given that we may want to add timestamps to column qualifiers, we have at least the following two solutions:
# Have a getColumnQualifier() helper function that works on pre-encoded byte arrays?
# Change the interface of getColumnQualifier() into byte arrays?
Maybe we have some better options, but so far I'm leaning towards the first way, although this makes parsing one column family more tricky. Meanwhile, I think the problem is beyond the scope of this JIRA (it's more like a whole stack fix rather than the reader itself). Therefore I propose to address the problem in a separate JIRA and move forward with the current patch. Any comments, [~sjlee0] [~jrottinghuis] [~vrushalic]? Thanks!
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660877#comment-14660877 ]
Li Lu commented on YARN-3049:
Hi [~vrushalic], I think the conversion to string happens on the write code path, in YARN-3984, as:
{code}
+ byte[] compoundColumnQualifierBytes =
+     Separator.VALUES.join(columnQualifierWithTsBytes,
+     null);
+ String compoundColumnQualifier =
+     Bytes.toString(compoundColumnQualifierBytes);
+ EntityColumnPrefix.EVENT.store(rowKey, entityTable,
+     compoundColumnQualifier, null, TimelineWriterUtils.EMPTY_BYTES);
{code}
Are we sure {{compoundColumnQualifier}} is fine with the attached long values? Thanks!
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660867#comment-14660867 ]
Vrushali C commented on YARN-3049:
Hi [~zjshen],
In my experience, that kind of conversion from Long to Bytes to String to Bytes to Long does not work. When an object is serialized with Bytes.toBytes(Long), we cannot read it back with Bytes.toString(). It has to be read back with Bytes.toLong(). Is there any reason you need to use String to carry values across? Could you use byte[] instead and then convert them back as appropriate?
thanks
Vrushali
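To make the failure mode concrete without an HBase dependency, here is a minimal sketch that imitates the conversions with plain JDK calls. It assumes that `Bytes.toBytes(long)` produces 8 big-endian bytes and that `Bytes.toString(byte[])` and `Bytes.toBytes(String)` decode and encode UTF-8; the helper names below are hypothetical stand-ins, not the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LongRoundTripSketch {
  // Stand-in for Bytes.toBytes(long): 8 bytes, big-endian (assumed layout).
  static byte[] longToBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Stand-in for Bytes.toLong(byte[]): reads the first 8 bytes as a long.
  static long bytesToLong(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }

  // The detour under discussion: byte[] -> String -> byte[] via UTF-8.
  // Bytes that are not valid UTF-8 are replaced while decoding, so the
  // original content cannot be recovered afterwards.
  static byte[] throughString(byte[] bytes) {
    String asString = new String(bytes, StandardCharsets.UTF_8);
    return asString.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] original = longToBytes(1234567890L);
    byte[] viaString = throughString(original);
    // The String detour alters the bytes for this value.
    System.out.println("bytes survive String detour: "
        + Arrays.equals(original, viaString));
    // Reading the untouched bytes back as a long works as expected.
    System.out.println("direct round trip: " + bytesToLong(original));
  }
}
```

The value 1234567890L is 0x499602D2, and the bytes 0x96 and 0xD2 do not form valid UTF-8 on their own, which is why the decode step is lossy here. Small values whose bytes all happen to be valid single-byte UTF-8 would survive, so the bug is data dependent.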
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660853#comment-14660853 ]
Zhijie Shen commented on YARN-3049:
Here's a quick example:
{code}
@Test
public void test() {
  // imitate the process to write a long
  Long a = 1234567890L;
  byte[] b = Bytes.toBytes(a);
  String c = Bytes.toString(b);
  // imitate the process to read a long
  byte[] d = Bytes.toBytes(c);
  Long e = Bytes.toLong(d);
  assertEquals(a, e);
}
{code}
b and d end up as different bytes, so the assertion fails. Am I using Bytes the wrong way?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660444#comment-14660444 ]
Sangjin Lee commented on YARN-3049:
The latest patch (v.7) looks good to me. Which timestamp are you seeing the issue with? Or is it with any timestamp?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659413#comment-14659413 ]
Hadoop QA commented on YARN-3049:
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 11s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 17s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 11s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 43m 2s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748985/YARN-3049-YARN-2928.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 895ccfa |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8779/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8779/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8779/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8779/console |
This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659256#comment-14659256 ]
Sangjin Lee commented on YARN-3049:
The latest patch looks good to me overall. Just a couple of comments.
I concur with [~gtCarrera9] that it might be a good idea to create more abstract methods around it. Note that we may be writing to other tables at this point too. We can even create private helper methods that check whether the entity is an application and so on. It's not critical but could be helpful...
Also, in {{HBaseTimelineWriterImpl}}, I see that the app-to-flow table is not being flushed. We should either flush it at the end of the write or add it to the {{flush()}} method.
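The two suggestions in this comment (a private helper for the "is this an application?" check, and flushing the app-to-flow table along with the entity table) might look roughly like the sketch below. It is a toy model only: `BufferedTable`, `Writer`, and the String-based rows are hypothetical stand-ins and do not reflect the real `HBaseTimelineWriterImpl` or any HBase class. The name `updateAppToFlowTable` is borrowed from the review discussion on this issue.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushSketch {
  // Hypothetical stand-in for a table whose writes are buffered until flush().
  static final class BufferedTable {
    final List<String> pending = new ArrayList<>();
    final List<String> persisted = new ArrayList<>();

    void put(String row) {
      pending.add(row);
    }

    void flush() {
      persisted.addAll(pending);
      pending.clear();
    }
  }

  // Hypothetical writer that owns both tables.
  static final class Writer {
    final BufferedTable entityTable = new BufferedTable();
    final BufferedTable appToFlowTable = new BufferedTable();

    void write(String entityType, String entityId) {
      entityTable.put(entityId);
      if (isApplication(entityType)) {
        updateAppToFlowTable(entityId);
      }
    }

    // Private helper so the special case stands out at the call site.
    private static boolean isApplication(String entityType) {
      return "YARN_APPLICATION".equals(entityType);
    }

    private void updateAppToFlowTable(String appId) {
      appToFlowTable.put(appId);
    }

    // Flush every table the writer touches, not just the entity table.
    void flush() {
      entityTable.flush();
      appToFlowTable.flush();
    }
  }

  public static void main(String[] args) {
    Writer writer = new Writer();
    writer.write("YARN_APPLICATION", "application_1");
    writer.write("YARN_CONTAINER", "container_1");
    writer.flush();
    System.out.println(writer.entityTable.persisted);
    System.out.println(writer.appToFlowTable.persisted);
  }
}
```

Putting both tables behind a single `flush()` keeps callers from having to know how many tables a write touches, which matters if more specialized tables are added later.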
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659012#comment-14659012 ]
Li Lu commented on YARN-3049:
Hi [~zjshen], letting HBase implementation locally looks good to me. One minor comment on the latest patch: maybe we want to move logic like {{if (te.getType().equals(TimelineEntityType.YARN_APPLICATION.toString()))}} in HBaseWriterImpl into a separate private method? I think it would be much clearer to say something like:
{code}
if (te.getType().equals(TimelineEntityType.YARN_APPLICATION.toString())) {
  updateAppToFlowTable(te);
}
{code}
As [~sjlee0] mentioned above, we may have some other specialization within HBaseWriterImpl, so maybe it's helpful to let these special designs stand out?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654737#comment-14654737 ] Hadoop QA commented on YARN-3049: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 34s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 19s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 9s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 43m 44s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748758/YARN-3049-YARN-2928.6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / bf65663 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8767/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8767/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8767/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8767/console |

This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654341#comment-14654341 ] Sangjin Lee commented on YARN-3049: --- {quote} I'm trying to understand the discussion here. Yes, and what we worked quite hard to avoid is to identify the types of the incoming entities, in a writer, so that we can apply different write code paths. If this is the case, maybe we can refactor the write method so that it contains an expandable context object? We can easily encapsulate flags in a BitSet-like object, and we may add more if needed. The only problem I'm wondering about is, is it possible for the caller to easily generate a context with all required information (such as isNewApp or appFinish)? BTW, I believe we need to refactor the interface of the read and write methods to use some sort of context anyway. Our current argument lists are not expandable. So if this helps, maybe we can move forward by refactoring the write interfaces? {quote} Another place where {{HBaseTimelineWriterImpl}} would check for the entity type (being the application) is splitting the application table (YARN-3906). The current patch checks the type of the entity to be able to send writes to different tables. So that would need to be included in the discussion as well. I completely understand the desire to make writers as agnostic about entity types and data as possible. However, since a lot of things in the schema need to be based on the applications (flow context, the application table, flow run aggregation, etc.), the need to support that strongly is real. We can either go the route of having the writer recognize applications and some of their events strongly (at the expense of making the separation between entities and writers a little weaker), or try to create a context for this decision (as [~gtCarrera9] suggested) and have the writer act on it. As for the latter option, while it still shields the writer from knowing details about entities, it would still need to know similar attributes (e.g. "application created", "whether the entity is an application", etc.), only in a more passive manner. Thoughts?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652854#comment-14652854 ] Sangjin Lee commented on YARN-3049: --- {quote} Then, we uniformly process the entities no matter what their type is. What we discussed so far implies that we cannot only treat the entities so generally. For application entity, we may need to take an additional step to parse its start/finish event to write more records. {quote} I understand that we want to do that as much as possible. However, we made several calls in terms of schema that call out apps pretty explicitly, and to implement that some amount of special treatment of the application entities is required. For example, the app-to-flow table is already a special table for applications. Similarly, real-time aggregation takes values from application entities to the flow run level. I don't think it's as bad as it might sound.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652850#comment-14652850 ] Li Lu commented on YARN-3049: -
bq. What we discussed so far implies that we cannot only treat the entities so generally. For application entity, we may need to take an additional step to parse its start/finish event to write more records.
I'm trying to understand the discussion here. Yes, and what we worked quite hard to avoid is to identify the types of the incoming entities, in a writer, so that we can apply different write code paths. If this is the case, maybe we can refactor the write method so that it contains an expandable context object? We can easily encapsulate flags in a BitSet-like object, and we may add more if needed. The only problem I'm wondering about is, is it possible for the caller to easily generate a context with all required information (such as isNewApp or appFinish)? BTW, I believe we need to refactor the interface of the read and write methods to use some sort of context anyway. Our current argument lists are not expandable. So if this helps, maybe we can move forward by refactoring the write interfaces?
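The expandable context object proposed above could look roughly like the following. This is a minimal sketch, assuming an {{EnumSet}} as the "BitSet-like" holder; {{WriteFlag}}, {{WriteContext}}, and {{ContextAwareWriter}} are hypothetical names introduced for illustration and do not come from the patch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical flags a caller could attach to a write; names are illustrative. */
enum WriteFlag { NEW_APP, APP_FINISHED }

/** Expandable context: new flags can be added without changing write() signatures. */
final class WriteContext {
  private final Set<WriteFlag> flags;

  private WriteContext(Set<WriteFlag> flags) { this.flags = flags; }

  /** Builds a context carrying the given flags (possibly none). */
  static WriteContext of(WriteFlag... flags) {
    EnumSet<WriteFlag> set = EnumSet.noneOf(WriteFlag.class);
    for (WriteFlag flag : flags) {
      set.add(flag);
    }
    return new WriteContext(set);
  }

  boolean has(WriteFlag flag) { return flags.contains(flag); }
}

/** Sketch of a writer that acts on the context instead of inspecting entity types. */
class ContextAwareWriter {
  int entityWrites = 0;
  int appToFlowWrites = 0;

  void write(WriteContext context, String entityId) {
    entityWrites++;                       // the generic write always happens
    if (context.has(WriteFlag.NEW_APP)) { // the extra write only when the caller says so
      appToFlowWrites++;
    }
  }
}
```

The open question raised in the comment still applies to this sketch: the caller has to know enough to populate the flags correctly.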
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652823#comment-14652823 ] Hadoop QA commented on YARN-3049: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 18s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 10m 45s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 23s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 51s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 14s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 53m 3s | Tests failed in hadoop-yarn-server-resourcemanager. |
| {color:red}-1{color} | yarn tests | 0m 24s | Tests failed in hadoop-yarn-server-timelineservice. |
| | | 105m 56s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
| Failed build | hadoop-yarn-server-timelineservice |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748542/YARN-3049-YARN-2928.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8754/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8754/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8754/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8754/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8754/console |

This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652806#comment-14652806 ] Zhijie Shen commented on YARN-3049: --- Okay, what will the timestamp be used for? If too much context info is required, I agree it's not elegant to incrementally expose it to the backend. Taking a step back, I'm starting to understand that the real situation actually deviates from what I originally thought about the storage layer. When defining the data model, I defined a generic TimelineEntity and made the other first-class citizen entities extend it. Then, we uniformly process the entities no matter what their type is. What we discussed so far implies that we cannot only treat the entities so generally. For application entity, we may need to take an additional step to parse its start/finish event to write more records.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652709#comment-14652709 ] Sangjin Lee commented on YARN-3049: --- I like that approach better than the previous one. Thanks for the update. How would we be able to handle the "app finished" event? That needs to be supported too for other tables, and adding another flag to the context doesn't seem too appealing. Also, the timestamps of these events are important, as they need to be written to some secondary tables. How can we capture them? If {{HBaseTimelineWriterImpl}} needs to recognize and read the event timestamp, then we might as well just look for those events, right? Any thoughts on these?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652206#comment-14652206 ] Zhijie Shen commented on YARN-3049: --- Hi Sangjin, Thanks for your comments. The proposed method will work for now and can minimize the change we should make. In fact, I used to think of this method too. The reason why I abandoned it is that the method couples the business logic and the data storage. It potentially increases the risk that a change in the business logic will break the storage layer. For example, suppose we rename app_created to app_started. This may still be easy to fix, but the maintenance difficulty is likely to increase as the logic grows more complex. That's why I think we should let the app collector tell the backend that it's the first request. On the other side, I agree RM should be responsible for this too. Actually this is also what I did in the current patch. If you agree with my proposal of letting the app collector determine whether it is the first request, what we can do is extend the RM app collector and implement this logic there. Thanks, Zhijie
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652140#comment-14652140 ] Sangjin Lee commented on YARN-3049: --- When {{HBaseTimelineWriterImpl}} processes events for writes, it could have a rule for those couple of special events (identified by the entity type = "yarn application", event type = "application created" or "application finished"), and trigger those actions, right? I understand that it is a bit unnatural for {{HBaseTimelineWriterImpl}} to recognize those events explicitly, but that could make this self-contained, right? I think this is a rather important point because there are more tables that need to be written to on application creation and completion, and also more data than the flow context. For example, the schema proposal calls for writing the application start time and the application end time upon receiving those events, among others. We want to have a single point where all these are done. See https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf for more details. I also think that doing it via the RM timeline collector is probably the best for this. The RM timeline collector is the one that's writing these events to begin with, and it can do that without worrying about the *app* timeline collector starting up in time, etc. Thoughts?
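The rule described above (recognize the two special application events inside the writer and record their timestamps) can be sketched as follows. This is an illustration only: {{SketchEvent}}, {{SketchEntity}}, and {{EventDetectingWriter}} are hypothetical simplifications of the data model, the type and event identifiers are assumed, and in-memory maps stand in for the secondary tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a timeline event. */
class SketchEvent {
  final String id;
  final long timestamp;

  SketchEvent(String id, long timestamp) {
    this.id = id;
    this.timestamp = timestamp;
  }
}

/** Hypothetical, simplified stand-in for a timeline entity. */
class SketchEntity {
  final String type;
  final String id;
  final List<SketchEvent> events = new ArrayList<>();

  SketchEntity(String type, String id) {
    this.type = type;
    this.id = id;
  }
}

/** Writer that detects application lifecycle events on its own. */
class EventDetectingWriter {
  // Assumed identifiers; the real type and event names may differ.
  static final String APPLICATION_TYPE = "YARN_APPLICATION";
  static final String CREATED_EVENT = "APPLICATION_CREATED";
  static final String FINISHED_EVENT = "APPLICATION_FINISHED";

  // In-memory maps standing in for the secondary tables.
  final Map<String, Long> appStartTimes = new HashMap<>();
  final Map<String, Long> appEndTimes = new HashMap<>();

  void write(SketchEntity entity) {
    // The generic entity write would happen here for every entity type.
    if (!APPLICATION_TYPE.equals(entity.type)) {
      return; // only application entities trigger the extra writes
    }
    for (SketchEvent event : entity.events) {
      if (CREATED_EVENT.equals(event.id)) {
        appStartTimes.put(entity.id, event.timestamp);
      } else if (FINISHED_EVENT.equals(event.id)) {
        appEndTimes.put(entity.id, event.timestamp);
      }
    }
  }
}
```

Because the timestamps come from the events themselves, no extra argument has to be passed down from the caller; the cost is that the writer now depends on the event names.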
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650114#comment-14650114 ] Hadoop QA commented on YARN-3049: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 48s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 11m 50s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 12m 25s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 5s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 52s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 53m 17s | Tests failed in hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests | 1m 26s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 111m 15s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748249/YARN-3049-YARN-2928.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8741/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8741/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8741/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8741/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8741/console |

This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650106#comment-14650106 ] Zhijie Shen commented on YARN-3049: --- What I meant before is that HBaseTimelineWriterImpl is not aware of the life cycle/session of the application, so it's hard to detect the app creation event inside HBaseTimelineWriterImpl and make it transparent to the caller. Instead, the app collector can know whether it is the first put request for this app sent to the writer.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650075#comment-14650075 ] Sangjin Lee commented on YARN-3049: --- I thought that the application created event would be written by the RM (and its embedded collector), no? So I'm not sure if the writer (for the app timeline collector) being bound to the session of the application is an issue. Maybe I misunderstood your comment? In essence, the flow of control that I was thinking of is not really different than your v.3 patch. My point was more about the way we're passing that information. I think it should be possible from inside {{HBaseTimelineWriterImpl}} to detect that it received an application created event (likely originating from RM) and trigger writing to these tables. Also, note that we want to store the application created timestamp, and also the application finished event along with its timestamp. That's not for this table but for other tables that are mentioned in the schema proposal doc. To be able to do these as well, it would be most natural to do it based on seeing these events. Thoughts?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649915#comment-14649915 ] Zhijie Shen commented on YARN-3049: --- I uploaded a new patch to address Sangjin's comments, except those below:
bq. l.93: What does it mean to indicate newApp for a set of entities? What if the set of entities contains a bunch of different applications?
I don't worry about this, because the put requests to the app collector all relate to the same app.
bq. See comments above; rather than relying on the boolean flag in the arguments, can we detect the case of the application created event and do it?
See my comments above.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649770#comment-14649770 ] Zhijie Shen commented on YARN-3049: --- [~sjlee0], yeah, I agree it's not a decent solution to let the user code trigger writing the app-to-flow mapping. The reason why I did this before is that we can avoid a check-and-put for each individual entity put request, which would obviously slow down the write path. Detecting the application created event sounds like a reasonable option. However, I'm afraid we cannot hide it inside the writer as an implementation detail, because the writer is bound to the session of an application. One solution I can think of is tackling the session start in the app collector. Upon the first put request received by the app collector, we tell the writer to also write the app-to-flow mapping. What do you think?
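The "first put request" idea above can be sketched as follows, assuming one collector instance per application. {{FirstPutAwareWriter}} and {{SketchAppCollector}} are hypothetical names for illustration; the real collector and writer interfaces take different arguments.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical writer interface carrying the "new app" hint. */
interface FirstPutAwareWriter {
  void write(String entityId, boolean newApp);
}

/** Sketch of a per-application collector that flags only its first put. */
class SketchAppCollector {
  private final AtomicBoolean firstPutSeen = new AtomicBoolean(false);
  private final FirstPutAwareWriter writer;

  SketchAppCollector(FirstPutAwareWriter writer) {
    this.writer = writer;
  }

  void putEntity(String entityId) {
    // compareAndSet succeeds for exactly one caller, even under concurrent puts,
    // so the writer sees newApp == true once per collector instance.
    boolean newApp = firstPutSeen.compareAndSet(false, true);
    writer.write(entityId, newApp);
  }
}
```

This avoids a check-and-put against the backend on every write, but the flag lives in collector memory: a restarted collector would report a "first" put again, so the extra write would need to be idempotent.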
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649658#comment-14649658 ]

Sangjin Lee commented on YARN-3049:

Sorry [~zjshen], it took me a while to get to this. The patch looks pretty good actually. I have one high-level point I'd like to discuss with you, and several smaller comments.

I see that you added a new boolean argument in {{TimelineCollector.putEntity()}}, {{TimelineCollector.putEntities()}}, and {{TimelineWriter.write()}} to indicate we're dealing with a new app (and thus writing to the app-to-flow table). I'm not sure whether that is really what we want to do. Can we not detect and leverage the fact that we're dealing with an "application created" event and trigger those actions instead of having an explicit argument that gets passed down all the way from the clients?

First, in this approach we would be completely relying on the client code to specify this correctly. Secondly, I would argue that the fact that we need to detect that we're introducing a new application and write to these tables is somewhat of an "implementation detail" of the HBase writer. For example, other writers may not even care about that and have no need for it. The fact that this detail leaks all the way to the callers is awkward at best. My initial thinking was to do this inside {{HBaseTimelineWriterImpl}}: on detecting the application created event, trigger this action. What do you think?

(TimelineEntity.java)
- l.138: it might be better to use the type {{SortedSet}} or {{NavigableSet}} to make it explicit we want ordering

(TimelineCollector.java)
- l.93: What does it mean to indicate newApp for a set of entities? What if the set of entities contains a bunch of different applications?

(HBaseTimelineWriterImpl.java)
- See comments above; rather than relying on the boolean flag in the arguments, can we detect the case of the application created event and do it?

(ColumnPrefix.java)
- l.67: nit: I think the word "from" is needed there. It's just that the space was missing between "result" and "from".

(TimelineReaderUtils.java)
- l.33: nit: "both matches" -> "both match"

> [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, YARN-3049-YARN-2928.3.patch
>
> Implement existing ATS queries with the new ATS reader design.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
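To make the main point of the review above concrete, here is a minimal sketch of a writer that decides by itself when to update the app-to-flow index, instead of taking a boolean from the caller. Everything in it is a simplified stand-in: {{SketchEvent}}, {{SketchEntity}} and {{SketchHBaseWriter}} are hypothetical classes, and the two string constants are assumed values used only for illustration, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a timeline event; not the real TimelineEvent class. */
class SketchEvent {
  final String id;

  SketchEvent(String id) {
    this.id = id;
  }
}

/** Simplified stand-in for a timeline entity; not the real TimelineEntity class. */
class SketchEntity {
  final String type;
  final String id;
  final List<SketchEvent> events = new ArrayList<>();

  SketchEntity(String type, String id) {
    this.type = type;
    this.id = id;
  }
}

/** Hypothetical writer that detects new applications without a caller-supplied flag. */
class SketchHBaseWriter {
  // Assumed constant values, for illustration only.
  static final String APPLICATION_TYPE = "YARN_APPLICATION";
  static final String CREATED_EVENT = "YARN_APPLICATION_CREATED";

  // Records the app ids that would have been written to the app-to-flow table.
  final List<String> appToFlowWrites = new ArrayList<>();

  /** True only for an application entity that carries the created event. */
  static boolean isApplicationCreated(SketchEntity entity) {
    if (!APPLICATION_TYPE.equals(entity.type)) {
      return false;
    }
    for (SketchEvent event : entity.events) {
      if (CREATED_EVENT.equals(event.id)) {
        return true;
      }
    }
    return false;
  }

  void write(Iterable<SketchEntity> entities) {
    for (SketchEntity entity : entities) {
      // The regular entity-table write would happen here.
      if (isApplicationCreated(entity)) {
        appToFlowWrites.add(entity.id);
      }
    }
  }
}
```

Because the check runs per entity, a batch that mixes several applications needs no special handling, which also sidesteps the question raised for TimelineCollector.java l.93.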
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648336#comment-14648336 ]

Hadoop QA commented on YARN-3049:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 2s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 42s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 46s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 53m 2s | Tests passed in hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 97m 43s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748046/YARN-3049-YARN-2928.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8722/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8722/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8722/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8722/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8722/console |

This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648243#comment-14648243 ]

Li Lu commented on YARN-3049:

Hi [~zjshen]! Some of my comments:

bq. As I see a lot of arguments for the reader interface (as well as the writer one) and potential signature changes in the future (e.g., adding newApp in this patch), I've started to think about grouping the primitive arguments, wrapping them in category objects such as EntityContext, EntityFilters, Opts and so on, and using these as the arguments of the interface instead.

I agree. Actually I spent quite some time wondering if we really need to add the {{newApp}} argument in this patch. Encapsulating all related information into a category object appears to be a nice way to avoid future interface changes. +1.

bq. Given that it may be non-trivial work, can we get this patch in and follow up on the filter change in another jira?

Definitely. Let's consolidate the whole workflow first. Then we can start these improvements.

bq. In fact, it has been tested. I changed the write path by setting newApp = true, and checked that we can query the entity successfully without giving the flow/flowRun explicitly. However, I didn't add many assertions on the fields of the retrieved entities, because I'm considering deferring this work until we rewrite the whole HBase backend unit test.

Sounds good to me.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648218#comment-14648218 ]

Zhijie Shen commented on YARN-3049:

[~gtCarrera9], thanks for the review. I've addressed most of your comments in the new patch, except the following:

bq. However, I'm still inclined to proceed with the changes in this JIRA so that we can speed up consolidating our POC patches.

Exactly.

bq. Reader interface: use TimelineCollectorContext to package reader arguments?

Yeah, I can see the rationale behind it, but maybe it's not TimelineCollectorContext. As I see a lot of arguments for the reader interface (as well as the writer one) and potential signature changes in the future (e.g., adding newApp in this patch), I've started to think about grouping the primitive arguments, wrapping them in category objects such as EntityContext, EntityFilters, Opts and so on, and using these as the arguments of the interface instead. Therefore, if we want to add newApp here, we don't really need to change the method signature, but only add a getter/setter in Opts. Please let me know what you think about the idea. I can file another jira to deal with the issue.

bq. We're now performing filters by ourselves in memory. I'm wondering if it will be more efficient to translate some of our filter specifications into HBase filters?

That sounds like a good idea, which should potentially improve the read performance. Let me investigate how to map our filters into HBase filters and push them to the backend. Given that it may be non-trivial work, can we get this patch in and follow up on the filter change in another jira?

bq. Add a specific test in TestHBaseTimelineWriterImpl for App2FlowTable?

In fact, it has been tested. I changed the write path by setting newApp = true, and checked that we can query the entity successfully without giving the flow/flowRun explicitly. However, I didn't add many assertions on the fields of the retrieved entities, because I'm considering deferring this work until we rewrite the whole HBase backend unit test. The current tests are too preliminary to capture the potential bugs around DB operations.
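The grouping idea in the comment above could be sketched roughly as follows. This is only an illustration of the parameter-object pattern: the class names follow the ones floated in the comment, but their fields, the {{SketchTimelineWriter}} interface and its method signature are hypothetical and are not taken from any patch.

```java
/** Hypothetical holder for the identifiers that locate an entity. */
final class EntityContext {
  final String clusterId;
  final String userId;
  final String flowId;
  final Long flowRunId;
  final String appId;

  EntityContext(String clusterId, String userId, String flowId,
      Long flowRunId, String appId) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowId = flowId;
    this.flowRunId = flowRunId;
    this.appId = appId;
  }
}

/** Hypothetical options holder: a new option becomes a getter/setter pair. */
final class Opts {
  private boolean newApp;

  boolean isNewApp() {
    return newApp;
  }

  Opts setNewApp(boolean newApp) {
    this.newApp = newApp;
    return this;
  }
}

/** Hypothetical writer interface whose signature stays stable as options grow. */
interface SketchTimelineWriter {
  void write(EntityContext context, Opts opts, Iterable<String> entityIds);
}
```

With this shape, adding an option such as newApp touches only {{Opts}}; implementations that do not care about the option simply never read it.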
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646893#comment-14646893 ]

Li Lu commented on YARN-3049:

Hi [~zjshen], some of my comments:

- The addition of {{newApp}} is to indicate whether we need to update the app2flow index table. This change is an interface change and it's slightly more than I thought. However, I'm still inclined to proceed with the changes in this JIRA so that we can speed up consolidating our POC patches.

- FileSystemTimelineReaderImpl: in {{fillFields}}, maybe we can use EnumSet.allOf() to generate the universe of fields so that we can reuse the logic of the following for loop for Field.ALL?

- Reader interface: use TimelineCollectorContext to package reader arguments?

- HBaseTimelineReaderImpl:
l.160 (all line numbers are after patch)
{code}
byte[] row = result.getRow();
{code}
unused?
l.213: name of private method {{getEntity}}: I think we may want to distinguish that from the external {{getEntity}} API. How about parseEntity or getEntityFromResult?
We're now performing filters by ourselves in memory. I'm wondering if it will be more efficient to translate some of our filter specifications into HBase filters?
l.113, 136, 142: I'm a little bit worried about the {{0L}}s. Shall we have something like DEFAULT_TIME to make the argument list more readable?
I assume the problem raised in l.369 ("if the event come with no info, it will be missed") will be addressed after YARN-3984?

- HBaseTimelineWriterImpl:
l.121-122: The log message doesn't make it clear that the write happened on the App2Flow table. Also, we may want to keep this message at debug level?

- TimelineSchemaCreator:
Why are we not adding {{a2f}} as an option, similar to what we did in l.94-102 for {{e}} and {{m}}?

- App2FlowColumn:
l.51: {{private}} appears to be redundant in enums. Similarly in l.42 of App2FlowColumnFamily.

nits:
- Name of App2FlowTable, AppToFlowTable? Saving one character every time is not quite helpful...
- l. 248, 263, 336: I'm confused by the name readConnections...
- Add a specific test in TestHBaseTimelineWriterImpl for App2FlowTable?
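As an illustration of the EnumSet.allOf() suggestion for {{fillFields}} above, here is a minimal sketch. {{SketchField}} is a hypothetical stand-in for the reader's Field enum (its constants are assumed), and {{FieldExpansion.expand}} is an invented helper; the point is only that expanding ALL into the concrete fields up front lets one loop serve both cases.

```java
import java.util.EnumSet;

/** Hypothetical stand-in for the reader's Field enum; the constants are assumed. */
enum SketchField { ALL, EVENTS, INFO, METRICS, CONFIGS, RELATES_TO, IS_RELATED_TO }

class FieldExpansion {
  /**
   * Expands ALL into every concrete field so that the caller can run a single
   * loop over the result instead of special-casing ALL.
   */
  static EnumSet<SketchField> expand(EnumSet<SketchField> requested) {
    if (requested.contains(SketchField.ALL)) {
      EnumSet<SketchField> universe = EnumSet.allOf(SketchField.class);
      universe.remove(SketchField.ALL); // ALL is a marker, not a real field
      return universe;
    }
    return EnumSet.copyOf(requested);
  }
}
```

A caller would then iterate over {{FieldExpansion.expand(fieldsToRetrieve)}} with one switch per concrete field.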
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646465#comment-14646465 ]

Li Lu commented on YARN-3049:

Thanks [~zjshen]! For now I think it's fine to include the changes on app2flow table. I'll take a look at your latest patch.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646339#comment-14646339 ]

Zhijie Shen commented on YARN-3049:

TestApplicationPriority.testApplicationPriorityAllocation seems to have a race condition issue. I cannot reproduce it locally, either on trunk or on YARN-2928 with this patch. Anyway, it does not seem to be related to this jira. I will file a separate Jira to track the test failure.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645444#comment-14645444 ]

Hadoop QA commented on YARN-3049:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 18m 5s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. |
| {color:red}-1{color} | javadoc | 10m 10s | The applied patch generated 5 additional warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 44s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 11s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 50s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 53m 13s | Tests failed in hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests | 1m 26s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 99m 47s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747693/YARN-3049-YARN-2928.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8701/artifact/patchprocess/diffJavadocWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8701/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8701/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8701/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8701/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8701/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8701/console |

This message was automatically generated.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645196#comment-14645196 ]

Li Lu commented on YARN-3049:

Given the progress on YARN-3949, shall we focus back onto this JIRA now? IIUC we can also build offline readers on top of this JIRA.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634352#comment-14634352 ]

Zhijie Shen commented on YARN-3049:

[~sjlee0], yeah, for POC purposes, I temporarily flush upon each put. I suspect it will significantly impact the write performance. We may need to sync on this issue.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634326#comment-14634326 ]

Sangjin Lee commented on YARN-3049:

I do see that you're adding a call to {{BufferedMutator.flush()}} here, as well as part of the fix that went into YARN-3908 (writing to the event column prefix as opposed to the incorrect metric column prefix). I'll go over WIP patch v.2 soon...
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630867#comment-14630867 ]

Varun Saxena commented on YARN-3049:

[~zjshen], should the cluster ID be mandatory in the REST URL? If it's not supplied by the client, we can assume it belongs to the same cluster where this timeline reader is running and take it from config. That's how I did it in YARN-3814.
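The fallback described above might look like the following minimal sketch. It is not the YARN-3814 code: {{ClusterIdResolver}} is a hypothetical helper, a plain {{Map}} stands in for the configuration object, and both the config key and the last-resort default are assumptions made for the example.

```java
import java.util.Map;

class ClusterIdResolver {
  // Assumed config key and last-resort default, for illustration only.
  static final String CLUSTER_ID_KEY = "yarn.resourcemanager.cluster-id";
  static final String FALLBACK_CLUSTER_ID = "default-cluster";

  /**
   * Uses the cluster ID from the request when present; otherwise falls back
   * to the reader's own configuration.
   */
  static String resolve(String fromRequest, Map<String, String> conf) {
    if (fromRequest != null && !fromRequest.trim().isEmpty()) {
      return fromRequest.trim();
    }
    return conf.getOrDefault(CLUSTER_ID_KEY, FALLBACK_CLUSTER_ID);
  }
}
```

With this shape the cluster ID can stay optional in the REST path, and the reader code downstream always receives a non-empty value.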
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627349#comment-14627349 ]

Li Lu commented on YARN-3049:

Hi [~zjshen], I have a concern similar to [~sjlee0]'s, on reading timeline metrics:
{code}
+ // Simply assume that if the value set contains more than 1 elements, the
+ // metric is a TIME_SERIES metric, otherwise, it's a TIME_SERIES metric
+ metric.setType(metricResult.getValue().size() > 1 ?
+ TimelineMetric.Type.TIME_SERIES : TimelineMetric.Type.TIME_SERIES);
{code}
I thought you meant to say: if the size of the value set is greater than one, set the type to TIME_SERIES; otherwise, set it to SINGLE_VALUE? As written, both branches yield TIME_SERIES, so we can never read a SINGLE_VALUE metric out...
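For clarity, the intended branch from the comment above is sketched here in isolation. {{SketchMetricType}} and {{MetricTypeInference}} are hypothetical stand-ins, not the real TimelineMetric classes; only the shape of the conditional is the point.

```java
import java.util.Map;

/** Hypothetical stand-in for the metric type enum. */
enum SketchMetricType { SINGLE_VALUE, TIME_SERIES }

class MetricTypeInference {
  /**
   * Infers the type from the number of stored values: more than one value is
   * treated as a time series, anything else as a single value.
   */
  static SketchMetricType infer(Map<Long, Number> valuesByTimestamp) {
    return valuesByTimestamp.size() > 1
        ? SketchMetricType.TIME_SERIES
        : SketchMetricType.SINGLE_VALUE;
  }
}
```

Note that this heuristic still cannot tell a single-value metric from a time series that happens to contain only one point so far.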
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627080#comment-14627080 ]

Sangjin Lee commented on YARN-3049:

Thanks [~zjshen] for your WIP patch! I skimmed through it, and I generally agree with the approach you're taking in this patch. Some early comments and thoughts:

- Later we could work on the filtering code to make it more expressive, etc. I see you have defined a number of {{match*()}} methods, and that's a good start in that direction.
- {{lookupFlowContext()}}: I suspect we might want to cache the flow context for better performance. Ideally it would need to be limited by size (LRU).
- Maybe a nit, but instead of setting something and clearing it later on if it is not supposed to be retrieved, how about setting it only if it is supposed to be retrieved? I'm talking about code that fetches contents such as relatesTo, info, config, events, ...
- {{getEntities()}}: just break instead of pollLast()?
- {{readMetrics()}}: SINGLE_VALUE vs. TIME_SERIES?
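One common way to get the size-limited (LRU) cache mentioned for {{lookupFlowContext()}} is a {{LinkedHashMap}} in access order with {{removeEldestEntry}} overridden. The sketch below shows that technique only; the class name {{LruCache}} and the choice of key and value types are assumptions, and nothing here is taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal size-bounded LRU cache built on LinkedHashMap's access order. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int maxEntries;

  LruCache(int maxEntries) {
    // accessOrder = true moves an entry to the end on every get/put.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each insertion; evicts the least recently used entry.
    return size() > maxEntries;
  }
}
```

A reader could key such a cache by cluster and application ID and store the looked-up flow context as the value. {{LinkedHashMap}} is not thread-safe, so concurrent readers would need to wrap it (for example with {{Collections.synchronizedMap}}) or use a dedicated caching library.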
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619347#comment-14619347 ]

Zhijie Shen commented on YARN-3049:

Updated the title accordingly to describe the scope of this jira more accurately.