[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369808#comment-15369808 ] Hudson commented on YARN-3411: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/]) YARN-3411. [Storage implementation] explore the native HBase write (sjlee: rev 5a4278ccbd22b50ea1e80d28c3eea1777ffc2875) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/Range.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/EntityColumnDetails.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineWriterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/EntityColumnFamily.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineWriterUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineSchemaCreator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineEntitySchemaConstants.java > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571607#comment-14571607 ] Vrushali C commented on YARN-3411: -- After evaluating both approaches of backend storage implementations in terms of their performance, scalability, usability, maintenance as given by YARN-3134 (Phoenix based HBase schema) and YARN-3411 (hybrid HBase schema - vanilla HBase tables in the direct write path and phoenix based tables for reporting), conclusion is to use vanilla hbase tables in the direct write path. Attached to YARN-2928 is a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561741#comment-14561741 ] Varun Saxena commented on YARN-3411: [~zjshen], I was actually talking about store insertion time and not the entity start time. If you look at {{LevelDbTimelineStore#checkStartTimeInDb}}, you would find that there is a store insert time(which is taken as current system time) also added in addition to entity start time. Pls note that store insert time and entity start time are not same. In ATSv1, we could specify a timestamp in query which is used to ignore entities that were inserted into the store after it. This is done by matching against the store insert time(which is not same as entity start time). So for backward compatibility sake, do we need to support it ? If yes, I dont see it being captured as part of writer implementations, as of now. If there is no use case for it though, we can drop it in ATSv2. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561739#comment-14561739 ] Varun Saxena commented on YARN-3411: [~zjshen], I was actually talking about store insertion time and not the entity start time. If you look at {{LevelDbTimelineStore#checkStartTimeInDb}}, you would find that there is a store insert time(which is taken as current system time) also added in addition to entity start time. Pls note that store insert time and entity start time are not same. In ATSv1, we could specify a timestamp in query which is used to ignore entities that were inserted into the store after it. This is done by matching against the store insert time(which is not same as entity start time). So for backward compatibility sake, do we need to support it ? If yes, I dont see it being captured as part of writer implementations, as of now. If there is no use case for it though, we can drop it in ATSv2. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561682#comment-14561682 ] Zhijie Shen commented on YARN-3411: --- Yeah, in v1, there's a starttime for entity, which is used to indicate when the entity starts to exist. This value is used in multiple places. For example, when we query entities, the matched entities are sorted according to timestamp to be returned. Also, in v1 the retention granularity is at entity level. We check if the starttime of an entity is out of TTL, and then decide to discard it and its events. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561659#comment-14561659 ] Varun Saxena commented on YARN-3411: In ATSv1, we consider the timestamp when entity is added to backend store in addition to entity creation time. This is used while filtering out entities during querying. I cannot see this being captured specifically in this patch. It can be easily added to Column Family info. [~zjshen], [~sjlee0], do we need to add this info ? Zhijie, for this, any specific use case you know of in ATSv1 ? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Fix For: YARN-2928 > > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555133#comment-14555133 ] Junping Du commented on YARN-3411: -- Excellent work! Thanks [~vrushalic], [~sjlee0] and all for the contribution! > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555056#comment-14555056 ] Sangjin Lee commented on YARN-3411: --- Just committed the patch. Thanks much [~vrushalic] for working on the patch, and everyone for very helpful reviews! > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554968#comment-14554968 ] Sangjin Lee commented on YARN-3411: --- Any other review feedback? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554959#comment-14554959 ] Vrushali C commented on YARN-3411: -- Thanks [~djp], filed YARN-3699 for the flow version discussion. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554877#comment-14554877 ] Junping Du commented on YARN-3411: -- Address 2nd one in YARN-3696 also sounds good. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554868#comment-14554868 ] Vrushali C commented on YARN-3411: -- Sounds good, done > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554855#comment-14554855 ] Junping Du commented on YARN-3411: -- Thanks [~jrottinghuis] for comments! bq. I think it is reasonable that two implementations can differ in their backing schema as long as they both can write the data and retrieve the data with the same key information. Phoenix may need to add somethings to the rowkey in order to work properly, it may have to add some things, and ditto for the raw HBase implementation, some additional secondary lookups may be needed etc. That is part of the performance comparison to see. I would agree with this in overall: in long term, each implementation should get optimized to the best which could have different strategies on schema design. But in short term performance test plan, the ideal case is two schema of implementations should get as much closed as it can so the result can get rid of noises caused by schema different. However, I agree that the current schema different shouldn't hint too much performance different. It should be fine if we can tolerant these noises. bq. Junping Du with respect to adding the flow version in the key, I think the problem with that is that you now require the caller to know what the version is in order to query back. I don't think that is a natural requirement. I know that I ran the "ComputeUniqueUsers" flow on the cluster, so I have user cluster and flowname, but I don't need to know the version to just query the last few runs right? If you do have the version (for reducer estimation and you want the last runs of the same flow back) then it should be possible to query by flow and by version, but I don't think it should be mandatory. Therefore I don't think that flow version must perse be a rowkey in all implementations. Thanks for sharing the use case here. The case for query against cluster and flowname is pretty solid. However, the query against specific flow version also reasonable as user may have interest to understand the differences between different versions of flow on execution time, resource consumption, task failures/exceptions, etc? In addition, I have a quick question here: does making flow_version a key means it is mandatory in query? Just like we have flow_run as key, but we still treat it optional in query (or search). Isn't it? bq. I think we'll find that with certain schema choices some things will be more performant while others will be somewhat slower. That's true. We should keep schema design discussion open even after the patch here get in. I would give existing patch a +1 with: 1. open a JIRA for discussion on decision of flow version to key or column, then either Phoenix or HBase implementation could update with conclusion we made. 2. open a cleanup JIRA with addressing minor issues mentioned above. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554820#comment-14554820 ] Sangjin Lee commented on YARN-3411: --- [~vrushalic], how about making YARN-3696 a little broader to address these clean-up items, such as removing the extra constructor and other minor review feedback? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554803#comment-14554803 ] Junping Du commented on YARN-3411: -- Sounds like a plan. I think we need a separated JIRA to follow up with clean. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554794#comment-14554794 ] Junping Du commented on YARN-3411: -- bq. While I was discussing the patches with Joep Rottinghuis, we thought it might be better to call it something more generic so that going forward if we need to add more columns in other column families, we can reuse some of this structure. Ok. I didn't aware of this discussion before. We can leave a TODO later either to make it more generic or sounds more precisely. About toLowerCase() which is a pretty trivial comment, feel free to address it or not. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554787#comment-14554787 ] Junping Du commented on YARN-3411: -- I see. Thanks for sharing this background. I agree that it shouldn't too much different from the performance prospective, so we can decide it later. Prefer to open a separated JIRA to discuss on this as a follow up as we either need to change for HBase side or Phoenix side. What do you think? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554786#comment-14554786 ] Junping Du commented on YARN-3411: -- I see. Thanks for sharing this background. I agree that it shouldn't too much different from the performance prospective, so we can decide it later. Prefer to open a separated JIRA to discuss on this as a follow up as we either need to change for HBase side or Phoenix side. What do you think? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554690#comment-14554690 ] Li Lu commented on YARN-3411: - Sorry about the delayed reply. Firstly, I thought flowVersion was a part of the ID of a timeline entity. If this is wrong, then we may probably change the Phoenix implementation rather than fix it in HBase. I can make this quick fix. Secondly, I'm not sure the impact of this on the performance evaluation planned tomorrow. In the test I think we mainly care about the raw writing performance to both systems. If we're sending the same amount of data and the flowVersion fields are all the same, then I expect the impact of this schema difference will not be the dominant factor. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554606#comment-14554606 ] Joep Rottinghuis commented on YARN-3411: With respect to discussion on auto-creating tables I strongly agree with Junping that schema creation should be a separate and discrete step. I do think that this should be as simple to do as possible, but auto-creating tables and simply creating them on the fly if they don't exist might seem like a good thing to do to avoid initial friction, but will lead to very surprising results if any user ever has a problem with the classpath setup and/or configurations rolled to a cluster. We operate a dozen or so production, ad-hoc and test clusters, some specific with only HBase, others without. The odds of passing a wrong config, or getting a config mixed up is significant. With auto-creation one could simply connect to the wrong HBase instance and then data would start flowing to the wrong cluster. I think it'd be better to see explicit failures in that case. The error message should probably be crystal clear and read something like: ATS (class so-and-so) is trying to write to HBase cluster (so-and-so) and is missing required table (so-and-so). In an earlier comment I suggested to have a config key to have a prefix for all table names so that people can easily switch all tables from one schema (test, experimentation) to another (staging, prod). > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554599#comment-14554599 ] Sangjin Lee commented on YARN-3411: --- This constructor ({{HBaseTimelineWriterImpl(Configuration)}}) will not be used by non-test code as it is instantiated via the no-arg constructor. I think we should simply remove this. I'm fine with cleaning this up after the performance test. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554595#comment-14554595 ] Joep Rottinghuis commented on YARN-3411: I think it is reasonable that two implementations can differ in their backing schema as long as they both can write the data and retrieve the data with the same key information. Phoenix may need to add somethings to the rowkey in order to work properly, it may have to add some things, and ditto for the raw HBase implementation, some additional secondary lookups may be needed etc. That is part of the performance comparison to see. [~djp] with respect to adding the flow version in the key, I think the problem with that is that you now require the caller to know what the version is in order to query back. I don't think that is a natural requirement. I know that I ran the "ComputeUniqueUsers" flow on the cluster, so I have user cluster and flowname, but I don't need to know the version to just query the last few runs right? If you do have the version (for reducer estimation and you want the last runs of the same flow back) then it should be possible to query by flow _and_ by version, but I don't think it should be mandatory. Therefore I don't think that flow version must perse be a rowkey in all implementations. I think we'll find that with certain schema choices some things will be more performant while others will be somewhat slower. It will be a mater of finding those schema choices that will give good enough write performance to handle scale and give good read performance for the most common use cases, while maintaining reasonable performance for other queries. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554586#comment-14554586 ] Sangjin Lee commented on YARN-3411: --- [~djp], I understand your concern on this. That said, as Vrushali mentioned, the schemas are sufficiently different between phoenix and hbase so that I doubt that the presence of the flow version in the row key will make a significant difference. Also, in the specific test we will run, the flow version will be trivial ("1"). For background, I chimed in on a few JIRAs that the flow version does not need to be part of the row key (only the flow id and the run id should). I cannot find those comments easily, but IMO the flow version does not constitute a primary key. Rather, it is an attribute of the flow, and can be stored off the primary key. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554544#comment-14554544 ] Vrushali C commented on YARN-3411: -- Yes, I in fact filed a jira https://issues.apache.org/jira/browse/YARN-3696 > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554532#comment-14554532 ] Vrushali C commented on YARN-3411: -- HI [~djp] Thanks, appreciate your review very much! bq. Given all columns here belongs to INFO C_F, we can omit the parameter of EntityColumnFamily.INFO. Also, rename EntityColumnDetails to EntityInfoColumnFamilyDetails could sound more clear. I actually had it named as EntityInfoColumnDetails in the previous patches. While I was discussing the patches with [~jrottinghuis], we thought it might be better to call it something more generic so that going forward if we need to add more columns in other column families, we can reuse some of this structure. Also, [~jrottinghuis] had some really good ideas on how to make things very generic across tables but in the interest of time, this was not done in this patch. bq. This is a private constructor and all callers are under control to make sure value is already lower case. So toLowerCase() is not necessary (the same for EntityColumnFamily). Yes, the values being initialized right now are lower case, but going forward, if some adds in an uppercase value, or a sentence case, we may not realize it. While storing in hbase, we need to careful about the column names, since we need to query for that exact name, hence the explicit lower casing in the constructor. bq. Do we try to add a new configuration "yarn.application.id" here? It doesn't sounds right and HBaseTimelineWriterImpl.class.getName() doesn't sounds like a default value for this configuration. Am I missing anything? So, I didn't have a good string to pass in, so I picked a value. But I am open to initializing it to something more appropriate. Is there a general recommendation around this... I can change it to that. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554521#comment-14554521 ] Vrushali C commented on YARN-3411: -- Hi [~djp] I understand your point. The schema between the two approaches is quite different, not just from the row key perspective. [~gtCarrera9] I am wondering why is flow version part of the row key? Does it need to be or it can be a column in Phoenix? thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554268#comment-14554268 ] Junping Du commented on YARN-3411: -- Beside the major comment above, some minor comments over the latest (007) patch - some of them are NITs (optional to address here or a follow up JIRA): {code} public HBaseTimelineWriterImpl(Configuration conf) throws IOException { super(conf.get("yarn.application.id", HBaseTimelineWriterImpl.class.getName())); } {code} I don't quite understand this. Do we try to add a new configuration "yarn.application.id" here? It doesn't sounds right and HBaseTimelineWriterImpl.class.getName() doesn't sounds like a default value for this configuration. Am I missing anything? {code} +enum EntityColumnDetails { + ID(EntityColumnFamily.INFO, "id"), + TYPE(EntityColumnFamily.INFO, "type"), + CREATED_TIME(EntityColumnFamily.INFO, "created_time"), + MODIFIED_TIME(EntityColumnFamily.INFO, "modified_time"), + FLOW_VERSION(EntityColumnFamily.INFO, "flow_version"), + PREFIX_IS_RELATED_TO(EntityColumnFamily.INFO, "r"), + PREFIX_RELATES_TO(EntityColumnFamily.INFO, "s"), + PREFIX_EVENTS(EntityColumnFamily.INFO, "e"); {code} Given all columns here belongs to INFO C_F, we can omit the parameter of EntityColumnFamily.INFO. Also, rename EntityColumnDetails to EntityInfoColumnFamilyDetails could sound more clear. {code} + private EntityColumnDetails(EntityColumnFamily columnFamily, + String value) { +this.columnFamily = columnFamily; +this.value = value; +this.inBytes = Bytes.toBytes(this.value.toLowerCase()); + } {code} This is a private constructor and all callers are under control to make sure value is already lower case. So toLowerCase() is not necessary (the same for EntityColumnFamily). Other looks fine. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554142#comment-14554142 ] Junping Du commented on YARN-3411: -- Hi [~vrushalic], I had the same concern with [~zjshen]'s previous comments that we are putting flow_version in column instead of keys which is different with current schema for Phoenix patch. Putting a field as key or column could have different performance hint for writing as only row key involve indexing but column is not (we don't have secondary index for now). To make our performance test/comparison as apple-to-apple, may be we should keep consistent between the two schema? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554076#comment-14554076 ] Junping Du commented on YARN-3411: -- Automation is a good point. However, the other point is about wrong operation. The significant ops like creation/drop the table could be better to left user to handle and decisions. Just like HDFS, user need to do format manually before starting the namenode for the first time. It could be better to follow the same practice with providing commondline for admin user to format (initialize) table. Thoughts? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554063#comment-14554063 ] Junping Du commented on YARN-3411: -- Sure. The plan sounds good to me. Just make sure we keep our eyes on it. Add a TODO to the patch could be better. :) > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553369#comment-14553369 ] Li Lu commented on YARN-3411: - I looked at the latest patch and think it's in a good shape for performance benchmarks. I've also ran it with a local single node hbase cluster, and it worked fine with our performance benchmark application as well as the PI sample application. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553370#comment-14553370 ] Sangjin Lee commented on YARN-3411: --- The latest patch LGTM. I'm fine with having a follow-up JIRA to address Junping's comment (and other minor issues if any). Once everyone chimes in and gives it +1, I'd be happy to commit this patch. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553352#comment-14553352 ] Vrushali C commented on YARN-3411: -- bq. Or we cannot diff value with null and real 0. Hi Junping, So, currently, you are right, we can't differentiate between nulls and real 0. In hRaven, we use 0 in case of nulls for long or int values. But for things like timestamps, we need stricter checks. After the performance test, I will file a jira to ensure we handle this more carefully and return null (Long object) in case it's actually null. Hope that is fine. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553353#comment-14553353 ] Vrushali C commented on YARN-3411: -- bq. Or we cannot diff value with null and real 0. Hi Junping, So, currently, you are right, we can't differentiate between nulls and real 0. In hRaven, we use 0 in case of nulls for long or int values. But for things like timestamps, we need stricter checks. After the performance test, I will file a jira to ensure we handle this more carefully and return null (Long object) in case it's actually null. Hope that is fine. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553302#comment-14553302 ] Hadoop QA commented on YARN-3411: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 4s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734244/YARN-3411-YARN-2928.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8033/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8033/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8033/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553186#comment-14553186 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sure, don't worry about the test code clean up for now. I'll try it locally. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551251#comment-14551251 ] Vrushali C commented on YARN-3411: -- Thanks [~gtCarrera9]! Will make the access modifier change and look into the TestTimelineWriterImpl point. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551238#comment-14551238 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], thanks for working on this! I just got back and looked at the latest (v6) patch. In general I think it's quite close. I just have a few quick comments. # I notice you made {{EntityColumnFamilyDetails}} and {{EntityInfoColumnDetails}} to be public. Even though we're adding @Private tags to them, we may still want to use the strictest access modifier possible. If they're fine to be visible to package only, maybe we can use default, rather than public for them? (I tried with my local version and seems like they worked fine, but not sure if I missed anything. ) # I share the same concern as [~zjshen] on schema creation. Having a separate step to set up the HBase data schema appears to add some workload to Hadoop deployments. So, although it's totally fine to have this step for our performance benchmark, it will be great to keep this in mind when we consolidate the storage part. # Maybe it's helpful to move the entity creation steps of TestHBaseTimelineWriterImpl to TestTimelineWriterImpl? In this way we can improve test coverage for both writer implementations after this work. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551018#comment-14551018 ] Zhijie Shen commented on YARN-3411: --- bq. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). I think in Phoenix impl, we have treated version as part of identifier of a unique flow. bq. Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production). I believe writers will come to life and cease to exist with each yarn application lifecycle but cluster is more or less eternal, so adding to this step to the lifecycle of a Writer Impl object seems somewhat out of place to me. Fair point. And this is another place different from Phoenix impl, which creates table if they don't exist. My perspective is more about automation, and it's better to leave fewer steps for users to setup the service. Perhaps we can find somewhere else to invoke the table initialization once if the service is setup for YARN cluster, and HBase/Phoenix is used as the backend. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550858#comment-14550858 ] Vrushali C commented on YARN-3411: -- Thanks [~djp] ! I will make these changes.. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550733#comment-14550733 ] Junping Du commented on YARN-3411: -- bq. Regarding the public/private annotations, I often got beat it to the head that in principle the private annotation is opt-in; i.e. if there is no visibility annotation it's implicitly assumed to be up for use. I've gotten review comments that said we should mark them explicitly as private even if they are clearly YARN-internal classes. That's just my experience on this. In my understanding, it depends on if this class is within api package that under hadoop-yarn-api or hadoop-yarn-common modules. If so, we may need to explicitly mark it as Private if we don't like to share it outside of hadoop projects. For other classes (like case here for TimelineWriterUtils which is in server side), we don't have to mark any annotation. [~vinodkv], can you confirm this? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550717#comment-14550717 ] Karthik Kambatla commented on YARN-3411: By classes I meant outer classes. Inner classes and other members inherit the annotation of the outer class. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550714#comment-14550714 ] Karthik Kambatla commented on YARN-3411: My recommendation regarding visibility annotations is to always specify them irrespective of whether they are Private or Public. My rationale - at the time of implementing something, it is good to actively think about intended usage. That said, our compat guidelines explicitly say that classes not annotated are implicitly Private. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550688#comment-14550688 ] Junping Du commented on YARN-3411: -- bq. I was thinking it is not necessary since the entity information would come in a more streaming fashion, one update at a time anyways. If say, one column is written and other is not, the callee can retry again, hbase put will simply over-write existing value. It sounds reasonable. Let's keep it so far and check if necessary to change for some corner cases (like we could update event and metrics at the same time for some job's final counter) in future. Thanks for addressing my previous comments. Latest patch looks much closer! Some additional comments (besides Zhijie's comments above) on latest (006) patch: In TimelineCollectorManager.java, {code} + @Override + protected void serviceStop() throws Exception { +super.serviceStop(); +if (writer != null) { + writer.close(); +} + } {code} We should stop running collectors before stopping the shared writer. Also, put super.serviceStop(); to the last one to stop as conforming the common practice. In EntityColumnFamilyDetails.java, {code} + * @param rowKey + * @param entityTable + * @param inputValue + * @throws IOException + */ + public void store(byte[] rowKey, BufferedMutator entityTable, String key, + String inputValue) throws IOException { {code} Lacking a param of key in javadoc. In HBaseTimelineWriterImpl.java, For write() which is synchronized semantics so far (we haven't implemented async yet), we put each kind of entity to the table by calling entityTable.mutate(...) which cache these entities locally and flush it later under some conditions. Do we need to call entityTable.flush() for semantics of strict synchronized writing? If not, at least, we should flush it at serviceStop() as close() directly could lose some cached data. {code} @Override protected void serviceStop() throws Exception { super.serviceStop(); if (entityTable != null) { LOG.info("closing entity table"); entityTable.close(); } if (conn != null) { LOG.info("closing the hbase Connection"); conn.close(); } } {code} Also, as comments above, put super.serviceStop() as the last one to stop. In Range.java, {code} +@InterfaceAudience.Private +@InterfaceStability.Unstable +public class Range { {code} For class marked as private, we don't necessary to put InterfaceStability annotation there. In TimelineWriterUtils.java, {code} +for (byte[] comp : components) { + finalSize += comp.length; +} {code} Null check for comp. {code} + System.arraycopy(components[i], 0, buf, offset, components[i].length); + offset += components[i].length; {code} Null check for components[i]. {code} + * @param source + * @param separator + * @return byte[][] after splitting the input source + */ + public static byte[][] split(byte[] source, byte[] separator, int limit) { {code} Missing param of limit in javadoc. It sounds we didn't do any null check for separator in splitRanges(), but we do null check in join(). It should keep consistent at least in one class. {code} + public static long getValueAsLong(final byte[] key, + final Map taskValues) throws IOException { +byte[] value = taskValues.get(key); +if (value != null) { + Number val = (Number) GenericObjectMapper.read(value); + return val.longValue(); +} else { + return 0L; +} + } {code} Shall we use Long instead of long? Or we cannot diff value with null and real 0. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usab
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550686#comment-14550686 ] Sangjin Lee commented on YARN-3411: --- Regarding the public/private annotations, I often got beat it to the head that in principle the private annotation is opt-in; i.e. if there is no visibility annotation it's implicitly assumed to be up for use. I've gotten review comments that said we should mark them explicitly as private even if they are clearly YARN-internal classes. That's just my experience on this. What is the official recommendation on this? cc [~vinodkv], [~kasha] > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550660#comment-14550660 ] Vrushali C commented on YARN-3411: -- Hi [~zjshen] Thanks for the review! bq I saw in HBase implementation flow version is not included as part of row key. This is a bit different from primary key design of Phoenix implementation. Would you mind elaborating your rationale a bit? Yes, I think the flow version need not be part of the primary key. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). Given a run id, we can determine the version. For production jobs, the version does not change, so we would be repeating it across runs. I haven’t looked into the Phoenix schema to understand why it is needed on the Phoenix side. cc [~gtCarrera9] bq Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop convention? We can keep them in this class now. Once we decide to move on with HBase impl, we should move (some of) them into YarnConfiguration as API. Yes, I did not add them to YarnConfiguration as API since I figured it may be cleaner to keep this code contained within timelineservice.storage to help remove it if needed. But will rename them as per Hadoop convention. bq. In fact, you can leave these classes not annotated. I see, I had added the annotations for these classes after some of the review suggestions, I think from @sjlee. bq. According to TimelineSchemaCreator, we need to run command line to create the table when we setup the backend, right? Can we include creating the table into the lifecycle of HBaseTimelineWriterImpl? Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production). I believe writers will come to life and cease to exist with each yarn application lifecycle but cluster is more or less eternal, so adding to this step to the lifecycle of a Writer Impl object seems somewhat out of place to me. Appreciate the review! thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549848#comment-14549848 ] Zhijie Shen commented on YARN-3411: --- [~vrushalic], thanks for working on the patch. Some comment from my side: 1. I saw in HBase implementation flow version is not included as part of row key. This is a bit different from primary key design of Phoenix implementation. Would you mind elaborating your rationale a bit? 2. Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop convention? We can keep them in this class now. Once we decide to move on with HBase impl, we should move (some of) them into YarnConfiguration as API. 3. I saw the classes are marked \@Public, but they're the backend classes and not accessible by the user directly. In fact, you can leave these classes not annotated. 4. According to TimelineSchemaCreator, we need to run command line to create the table when we setup the backend, right? Can we include creating the table into the lifecycle of HBaseTimelineWriterImpl? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549776#comment-14549776 ] Sangjin Lee commented on YARN-3411: --- Or, better: {code} Configuration hbaseConf = HBaseConfiguration.create(conf); {code} > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549725#comment-14549725 ] Hadoop QA commented on YARN-3411: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 15s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 15s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733693/YARN-3411-YARN-2928.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7990/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7990/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7990/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549692#comment-14549692 ] Sangjin Lee commented on YARN-3411: --- It seems like the line that adds the hbase configuration was removed from HBaseTimelineWriterImpl. Is that intentional? How would it be able to use and load hbase configuration then? Come to think of it, I think we may need to add both hbase-site.xml and hbase-default.xml? {code} conf.addResource("hbase-default.xml"); conf.addResource("hbase-site.xml"); {code} > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549664#comment-14549664 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 57s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 40s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 49s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 43m 43s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl | | | hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733677/YARN-3411-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7989/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7989/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7989/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7989/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549631#comment-14549631 ] Vrushali C commented on YARN-3411: -- Updating the code now to fix the javadoc warning and [~sjlee0]'s review suggestion. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549601#comment-14549601 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 2s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 46s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 19s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733677/YARN-3411-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7986/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7986/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549563#comment-14549563 ] Sangjin Lee commented on YARN-3411: --- Thanks for updating the patch [~vrushalic]! A couple of quick comments: (TimelineCollectorManager.java) - It's a good catch! While we're at it, could we also add an override for serviceStart() where it calls writer.start()? (HBaseTimelineWriterImpl.java) - l.71: We need to remove this constructor since the no-arg constructor is the real constructor. Keeping this would be error-prone. For example, the addResource("hbase-site.xml") call is not being done via the no-arg constructor route. We should move any operations that require the configuration to the serviceInit() method. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548383#comment-14548383 ] Vrushali C commented on YARN-3411: -- The patch has an overall -1 due to a couple of javadoc warnings. Will fix those in the next patch. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548379#comment-14548379 ] Vrushali C commented on YARN-3411: -- Hi [~djp] Thanks for the initial quick feedback! Some responses below: bq. Do we need to make sure data in each row get updated consistently? I was thinking it is not necessary since the entity information would come in a more streaming fashion, one update at a time anyways. If say, one column is written and other is not, the callee can retry again, hbase put will simply over-write existing value. bq. We shouldn't swallow exception in updating data to HBase, just log.error() may not be enough. Okay, let me look through and modify that. bq. We need to check null in writing TimelineEntity to HBase, as TimelineEntity could include null events/configurations/metrics, that could make foreach later throw NPE exception I have added some null checks, I will go over the code again and update it to ensure I have null checks for entity class members like configurations, metrics etc. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548335#comment-14548335 ] Junping Du commented on YARN-3411: -- bq. No, we will never drop the last value. MIN_VERSIONS and TTL are set such that last value is always retained. I am setting the MAX_VERSIONS for now to 200, but we can revisit this when we determine how exactly the timeseries data is going to be handled. And of course it can be made configurable. I meant the earliest (oldest) value not the latest. Agree that we can revisit the value later for other cases that I mentioned above, but just want to double check we don't have other options, i.e. making time series data as different rows or columns rather than different timestamps/versions here. bq. Wondering why we would aggregate data in one timeseries for one metric over time? That's because the interested interval (present to enduser) is not always the same interval for gathering timeline metrics data. Let's saying we received container metrics data from NodeManager every second, but the aggregated data user interested is per minutes, then we need to aggregate 60 seconds data for one single metrics. Make sense? Thanks for updating the patch. Just quickly check latest patch, a few comments so far: 1. Sounds like we don't leverage single row transaction of HBase feature, as we are updating different column families (events, configurations, metrics, etc.) separately. Do we need to make sure data in each row get updated consistently? 2. We shouldn't swallow exception in updating data to HBase, just log.error() may not be enough. 3. We need to check null in writing TimelineEntity to HBase, as TimelineEntity could include null events/configurations/metrics, that could make foreach later throw NPE exception. Comments with more details could come later. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547959#comment-14547959 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 23s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 10m 1s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 16s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733531/YARN-3411-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7966/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7966/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547730#comment-14547730 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 5s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 3m 18s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733480/YARN-3411-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7965/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546176#comment-14546176 ] Vrushali C commented on YARN-3411: -- Hi [~jrottinghuis] Thanks for the review! My response below. bq. On class range, the start and end are final right? yes bq. Wouln't it be better to do this in somewhat more of a streaming manner? Agreed, modifying the code accordingly right now. Also we discussed today about checking out what the hbase batching/flushing policy and accordingly considering emitting puts as we see them. bq. More general question: for TimelineWriter and its implementations, is there an expectation set around concurrency? Good point, I have filed YARN-3650 to discuss that bq. it will be super-usefull to be able to have one single config key to prefix all tables. Agreed. I have filed YARN-3649 to work on that bq. getRowKeyPrefix should cleanse input arguments to strip EntityTableDetails.ROW_KEY_SEPARATOR_BYTES yes, updating the code now. bq. Overall, we should fill in the javadoc, describe inputs, explain what we expect etc. Yes, will update the documentation as much as I can along the lines of what we discussed today. bq. EntityColumnDetails probably needs a public static method that will take bytes, iterates over the enum and returns an enum from those bytes. Yes, I will add that in later if/when we decide to go forward with this method. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546166#comment-14546166 ] Vrushali C commented on YARN-3411: -- Hi [~djp] Thanks for the review! bq. In our case, the metrics data could be updated much more frequently and we cannot drop the earliest data No, we will never drop the last value. MIN_VERSIONS and TTL are set such that last value is always retained. I am setting the MAX_VERSIONS for now to 200, but we can revisit this when we determine how exactly the timeseries data is going to be handled. And of course it can be made configurable. bq. It sounds like it may not be enough for cases, With hbase 1.x , we can set per cell level TTLs so we could go more or less depending on the application/ requirement. bq. Do we have an estimation for performance cost for aggregation metrics data over cell versions vs. rows? Not sure if I understand aggregation for metrics timeseries data. Wondering why we would aggregate data in one timeseries for one metric over time? But we can discuss more about timeseries later I think, based on [~gtCarrera9]'s reply below. thanks! Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545857#comment-14545857 ] Li Lu commented on YARN-3411: - Hi [~djp]! I totally agree with your concerns. However, for now the Phoenix storage system still doesn't have the support to timeline series data. This said, maybe we can firstly focus on the storage of "single data", and then address the problems for timeline series in a separate JIRA? I'd appreciate your suggestions on this. Thanks! > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545788#comment-14545788 ] Junping Du commented on YARN-3411: -- Hi [~vrushalic] and all, there is one question I had for a long time: Looks like we are updating time series metrics data as versions of metrics cell instead of inserting of a row (with time serve as a column). Do we have an estimation for performance cost for aggregation metrics data over cell versions vs. rows? The original usecase for cell versions coming from BigTable is to store web pages which could only be limited number of versions (too stale versions could be dropped). In our case, the metrics data could be updated much more frequently and we cannot drop the earliest data. I see we are setting max version to be 200, in "cf3.setMaxVersions(EntityTableDetails.ENTITY_TABLE_METRICS_MAX_VERSIONS_DEFAULT);". It sounds like it may not be enough for cases, like: long running services, container reuse, etc., this even sounds challenge for normal container (assume appCollector doesn't cache metrics data locally, the interval default is: 1 sec). Thoughts? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544756#comment-14544756 ] Vrushali C commented on YARN-3411: -- Thanks [~sjlee0] and [~jrottinghuis] for the feedback! I will make as many changes as possible for now. For some of Joep's questions/comments I might spin off a jira or two to work on those later. Also, I will respond to some of the points brought up after thinking over it for a while. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544692#comment-14544692 ] Joep Rottinghuis commented on YARN-3411: Ok, admitted that I have not yet read Sangjin's comments from just minutes ago. I also need to spend some more time to fully read and comprehend the key structure. Here is an initial set of comments that I have, please take them for that they are worth. I'll have to spend more time to dig into more details. Overall, we should fill in the javadoc, describe inputs, explain what we expect etc. EntityColumnDetails probably needs a public static method public static EntityColumnDetails createEntityColumnDetails(byte[] bytes) that will take bytes, iterates over the enum and returns an enum from those bytes. You'll need to decide if you want a default, or throw an exception if bytes does not map to any enum. EntityTableDetails.java getRowKeyPrefix should cleanse input arguments to strip EntityTableDetails.ROW_KEY_SEPARATOR_BYTES (at least the string representative of that, may be good to split that definition into two). The reverse method that takes a rowkey and splits into components should probably not reverse this. For example, if we use a ! for separator and translate that to underscore, do we translate all underscores back to bangs? In hRaven we did not, but stripping the separators is essential to avoid parsing errors during read. Instead of just using DEFAULT_ENTITY_TABLE_NAME or overriding the tablename itself, it will be super-usefull to be able to have one single config key to prefix all tables. This way you can easily run a staging, a test, a production and whatever setup in the same HBase instance / w/o having to override every single table in the config. One could simply overwrite the default prefix and you're off and running. For prefix I'm thinking "tst" "prod" "exp" etc. Once can then still override one tablename if needed, but managing one whole setup will be easier. For column prefixes, are you simply using a one character prefix, and the first character is always a prefix? Or do we need to do cleansing on the columns written to ensure that nobody else will write with this column prefix? Also, if there is a prefix, why not make this an enum? More general question: for TimelineWriter and its implementations, is there an expectation set around concurrency? Is any synchronization expected / needed to ensure visibility when calls happen from different threads? How about entities, are they expected to be immutable once passed to the write method? Similarly for the constructor, we're assuming that the configuration object will not be modified while we're constructing a TimelineWriter? For the write method in HBaseTimelineWriterImpl, we're collecting all the puts first, then we're creating a big list of entity puts, and then we're writing them out. For large entities (with large configurations) we end up creating two (if not three I'd have to read more carefully) copies of the entire structure. Wouln't it be better to do this in somewhat more of a streaming manner? For example, first create info puts, then stream those out to the entityTable, then drop the infoPuts Put, then move on to eventPut, configPut etc. Even method like getConfigPut could conceivable by constructed in such a way that we don't have to create one massive copy of the config in a single Put. Perhaps we can pass in the table and create smaller puts, perhaps even one per config key and stream those out. Ditto when iterating over the timeseries.keySet() in getMetricsPut. For serviceStop we should probably handle things a littlebit better than just calling close. Are we expecting to just close the HBase connection, still keep writing and just let the write exceptions shoot up to the top of the writing process? Should we occasionally ( as we're iterating) check the stop method and just bail out in a more elegant way? On class range, the start and end are final right? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544659#comment-14544659 ] Sangjin Lee commented on YARN-3411: --- The latest patch looks pretty good. I think we're almost there. I went over it one more time bit more closely, and have mostly minor comments. Some are performance-related so you might want to consider them. Hopefully it won't be too difficult to address these... (EntityColumnDetails.java) - let's add Private and Unstable to this (EntityTableDetails.java) - let's add Private and Unstable to this - l.27: this also should be non-public? (HBaseTimelineWriterImpl.java) - l.88: (nit) you must be sick of me saying it, but let's declare row inside the for loop, and also use a more standard declaration (byte[] row = ...) - l.165: it might be a minor optimization, but we might want to reuse bytes for te.getType() and te.getId() for other calls that need them (e.g. getInfoPut) - l.234: we should get the bytes for the key *once* outside the inner for loop and reuse it - l.274: we should get the bytes for the id *once* outside the inner for loop and reuse it (Range.java) - let's add Private and Unstable to this (TimelineSchemaCreator.java) - l.137: I think we should use LOG.error() as we have the logger - l.144-147: I'm unsure why we're using the log4j logger when we're using the Apache Commons logger; is this for HBase debug logging? (TimelineWriterUtil.java) - l.53: do we need to handle the case where a component byte array is empty? - l.53: it's not critical but it could be good to handle the case where components.length = 1 (just return it) - l.181,197: what are "taskValues"? (TestHBaseTimelineWriterImpl.java) - l.56-60: the javadoc should be before the class name - l.56: (nit) remove PoC - l.62: (nit) it should be util (not upper case) - l.141: (nit) it should call init() (best practice) - l.147: (nit) should be long, not Long - l.212: (nit) it should call stop() (best practice) > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544612#comment-14544612 ] Hadoop QA commented on YARN-3411: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 13s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 14s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 37s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732954/YARN-3411-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7942/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7942/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7942/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544190#comment-14544190 ] Sangjin Lee commented on YARN-3411: --- Vrushali and I spoke offline on the findbugs issues and others. Here is some more feedback on the existing patch. I would try to limit the public surface area as good hygiene (making classes non-public by default unless/until necessary). It applies to EntityColumnDetails, EntityTableDetails, and Range. (HBaseTimelineWriterImpl.java) - l.142: let's declare value inside the loop as well - l.200: Object value => String value? - l.228: let's declare key inside the loop - l.267: let's declare id inside the outer for loop - l.272: let's declare key inside the inner for loop (Range.java) - l.21-22: both can/should be final (TimelineWriterUtils.java) - let's add Private and Unstable annotations > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543353#comment-14543353 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 36s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 40s | The applied patch generated 11 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 14s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 42s | The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 7m 50s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 44m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityColumnDetails.getInBytes() may expose internal representation by returning EntityColumnDetails.inBytes At EntityColumnDetails.java:by returning EntityColumnDetails.inBytes At EntityColumnDetails.java:[line 48] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_CONFIG_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 48] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_INFO_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 34] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_METRICS_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 41] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_EVENTS_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 55] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_IS_RELATED_TO_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 69] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_RELATES_TO_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 62] | | | org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.ROW_KEY_SEPARATOR_BYTES should be package protected At EntityTableDetails.java: At EntityTableDetails.java:[line 79] | | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl | | | hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732791/YARN-3411-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b059dd4 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/diffJavadocWarnings.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7932/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7932/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-ta
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542980#comment-14542980 ] Sangjin Lee commented on YARN-3411: --- Here is some quick initial feedback: (HBaseTimelineWriterImpl.java) - l.73: as a rule, we should call super.serviceInit(conf) - l.98-99: lines longer than 80 characters - l.203-204: nit: again I would prefer putting the declarations inside the for loop (the same for other loops) - l.227: since we're accessing both keys and values, it would be slightly more efficient to use entrySet() rather than keySet() - l.238: use LOG.error(message, throwable) - l.286: as a rule, we should call super.serviceStop() (Range.java) - indentation didn't come out right > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542950#comment-14542950 ] Sangjin Lee commented on YARN-3411: --- [~vrushalic], could you ensure the patch applies cleanly? You can try it by checking out YARN-2928 first and then {{git apply --check}}. Also, you'll need to rename the patch to {{YARN-3411-YARN-2928.001.patch}}. Otherwise, jenkins will try it on the trunk, and it won't work. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542919#comment-14542919 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732709/YARN-3411.poc.7.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 281d47a | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7927/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541730#comment-14541730 ] Junping Du commented on YARN-3411: -- Sure. Cancel the patch until we have new version. Thx! > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540383#comment-14540383 ] Vrushali C commented on YARN-3411: -- Sounds good, thanks [~djp]! But, would like to request you to wait before reviewing again, I am planning to upload a new patch later today. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540373#comment-14540373 ] Junping Du commented on YARN-3411: -- Sure. I start this rounds of review start several days ago, but cannot get finished mostly until today. Sorry for missing the new version of patch, will check it quite soon. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540303#comment-14540303 ] Vrushali C commented on YARN-3411: -- bq. Also comments on latest (v5) patch Thanks [~djp] for the review but the latest patch is v6. So some of the code lines are no longer there but I will make the other changes like class name rename and making the table name consistent with Phoenix writer table names and the code changes for creating the connection etc. I am working on updating the patch with your review suggestions and the patch in YARN-3529 which will help me build. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540297#comment-14540297 ] Vrushali C commented on YARN-3411: -- Ah, I haven't added that patch, thanks [~gtCarrera9] let me give that a try. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540220#comment-14540220 ] Junping Du commented on YARN-3411: -- Also comments on latest (v5) patch: {code} +public class CreateSchema { {code} Can we rename it to a more concrete name, something like: TimelineSchemaCreator? {code} + private static int createTimelineEntityTable() { +try { + Configuration config = HBaseConfiguration.create(); + // add the hbase configuration details from classpath + config.addResource("hbase-site.xml"); + Connection conn = ConnectionFactory.createConnection(config); + Admin admin = conn.getAdmin(); ... {code} All of these code should be reused by create other tables. May be we should move it out of createTimelineEntityTable() and make it as static part of Class? {code} + if (admin.tableExists(table)) { +// do not disable / delete existing table +// similar to the approach taken by map-reduce jobs when +// output directory exists +LOG.error("Table " + table.getNameAsString() + " already exists."); +return 1; + } {code} We would like to throw exception here so user can get notified the failed reason immediately? {code} + // TTL is 30 days, need to make it configurable perhaps + cf3.setTimeToLive(2592000); {code} We shouldn't have a hard code value here. At least, add a "TODO" in comment to fix it later. In HBaseTimelineWriterImpl.java, {code} +// TODO right now using a default table name +// change later to use a config driven table name +entityTableName = TableName +.valueOf(EntityTableDetails.DEFAULT_ENTITY_TABLE_NAME); {code} Shall we consistent with table name of Pheonix writer if haven't make it configurable? Or we intent to do so for some reasons? {code} + if (entityPuts.size() > 0) { +LOG.info("Storing " + entityPuts.size() + " to " ++ this.entityTableName.getNameAsString()); +entityTable.put(entityPuts); + } else { +LOG.warn("empty entity object?"); + } {code} The first log should be DEBUG level and wrap with if block of "LOG.isDebugEnabled()" which help performance. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540211#comment-14540211 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sorry to hear the trouble. Have you tried to apply the latest patch in YARN-3529? In that JIRA I'm trying to make the Phoenix writer works with the snapshot version of Phoenix, which lives happily with HBase 1.0.1. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540078#comment-14540078 ] Junping Du commented on YARN-3411: -- PHOENIX-1757 should fix this problem, but the target release is 4.4.0 which sounds like still in voting process (start in May.2). Can we downgrade HBase version to 0.98 here for short term POC and upgrade to 1.0.1 together with PHOENIX 4.4 later? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540035#comment-14540035 ] Junping Du commented on YARN-3411: -- Sorry. I mean I met a similar problem before which is caused by JDK version. Here it sounds like phoenix 4.3.0 depends on hbase version of 0.98 and we are depends on 1.0.1? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539552#comment-14539552 ] Junping Du commented on YARN-3411: -- bq. I get "Some Enforcer rules have failed" errors. I am working on resolving those. Any more eyes on this build error would help! Hi [~vrushalic], I meet the same problem and I thought it's my JDK version issue before but actually not. Will also work on resolving this. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539425#comment-14539425 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732170/YARN-3411.poc.6.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 987abc9 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7879/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539303#comment-14539303 ] Vrushali C commented on YARN-3411: -- Yes, great to have the Phoenix patch in! I am updating the code with the review suggestions and will upload a new patch very soon. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536225#comment-14536225 ] Sangjin Lee commented on YARN-3411: --- Thanks [~vrushalic] for the latest patch. I took a quick look at it, and have a few comments. I think we should fill in those a couple of to-do's and get a patch ready. (TimelineServicePerformanceV2.java) - you might need to update the patch to be based on the latest from the branch; e.g addTImeSeries() -> addValues() (pom.xml) - it might be good to follow the standard practice (AFAIK) and specify the hbase dependency version in hadoop-project/pom.xml (CreateSchema.java) - l.57: it's not necessary to add hbase-site.xml manually if you're creating an explicit HBaseConfiguration. It does it by default - l.58: the connection should be closed at the end - We don't have to do this right now, but it might be good to expand this tool to be able to drop the schema and recreate it (drop and create) (HBaseTimelineWriterImpl.java) - l.71: let's make this a debug logging statement (also with a isDebugEnabled() call) - l.75: no need to override init() (the base implementation calls serviceInit() anyway) - l.146: since you're getting both key and value, using entrySet() vs. keySet() is a little more efficient (the same for other iterations) - l.225: style nit: you can simply define String key and Object value inside the loop (there is no real savings for reusing the variables): {code} for (Map.Entry entry : configs.entrySet()) { String key = entry.getKey(); Object value = entry.getValue(); ... } {code} - l.244: you want to check the type of the metric (single value v. time series) first and handle them differently, right? (TimelineWriterUtils.java) - l.123: it seems like Range should be a top level class rather than an inner class for EntityTableDetails; it's shared by several classes already > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533604#comment-14533604 ] Junping Du commented on YARN-3411: -- Thank you, [~vrushalic]! :) > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533033#comment-14533033 ] Vrushali C commented on YARN-3411: -- bq. Given there is only very limited Ruby code in current patch, shouldn't be very painful to replace it with Java. Isn't it? Okay, let me rework it to java. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Labels: BB2015-05-TBR > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532992#comment-14532992 ] Junping Du commented on YARN-3411: -- Thanks [~vrushalic] for updating the patch. I will review and comment based on your updated patch in plan. bq. Sure, I can change it to java but I think, as such there should not be any problem running ruby code with hbase, it does not add to dependencies. In fact, the HBase Shell, which is a command line interpreter for HBase is written in Ruby. I see. Thanks for sharing this prospective. However, accepting code with a new language into hadoop community is not only mvn dependency issue but also a maintenance headache according to issues like: setting up new code convention for new language, etc. which could take more time to reach into a consensus. Given there is only very limited Ruby code in current patch, shouldn't be very painful to replace it with Java. Isn't it? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Labels: BB2015-05-TBR > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530990#comment-14530990 ] Vrushali C commented on YARN-3411: -- update: I will generate another patch, thanks [~gtCarrera9] for checking the patch and observing that flowVersion is not being stored. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Labels: BB2015-05-TBR > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522569#comment-14522569 ] Sangjin Lee commented on YARN-3411: --- +1 on 1.0.1. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522556#comment-14522556 ] Li Lu commented on YARN-3411: - Awesome, thanks! > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522539#comment-14522539 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], one quick thing: do you have a rough estimation on how hard it will be to bump the version of hbase to 1.0.1? Phoenix 4.4.0-SNAPSHOT may rely on 1.0.1 now. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520291#comment-14520291 ] Li Lu commented on YARN-3411: - Hi [~vrushalic] [~zjshen], just a quick thing to confirm that we want to use byte arrays for config and info fields in both of our storage. I'll convert the type for config and info in the Phoenix implementation to VARBINARY to be consistent with this design. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520115#comment-14520115 ] Li Lu commented on YARN-3411: - Thanks [~stack] for the quick info! Yes let's go with HBase 1. We can figure out a solution for Phoenix later. On the worst case, we can rely on the snapshot version of Phoenix, which already works with HBase 1. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519858#comment-14519858 ] stack commented on YARN-3411: - bq. So there are some major changes between hbase 0.98 and hbase like the client facing APIs (HTableInterface, etc) have been deprecated and replaced with new interfaces. It would be a pity if you fellas were stuck on the 0.98 APIs. Phoenix is shaping up to do an RC that will work w/ hbase 1.x. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517838#comment-14517838 ] Zhijie Shen commented on YARN-3411: --- bq. Hmm. So, hbase stores values in bytes. Reading back a value written as a Long using Bytes.toInteger does not work. Hence, there needs to be a way to know what the data type is going to be. Maybe we don't need to take care of object to bytes ser/des ourselves. Please check {{GenericObjectMapper}}, which wraps over jackson object reader and writer. We used to use it to store bytes of the object into leveldb and read bytes fro leveldb and convert them to the object. I think HBase backend implementation can make use of it too, and we can make ser/des at the frontend interface and at the backend persistency consistent. I suggest Phoenix using the same approach too. /cc [~gtCarrera9]. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517347#comment-14517347 ] Vrushali C commented on YARN-3411: -- Hi [~gtCarrera9] Thanks for the feedback! bq. About null checks: so far we do not have a fixed standard on if and where we need to do null checks. I noticed you assumed info, config, event, and other similar fields are not null. Maybe we'd like to explicitly decide when all those fields can be null or empty. Yes, I think it is a very good idea to always do null checks in the writer. Let me update my patch with null checks, thanks! bq. Maybe we'd like to change TimelineWriterUtils to default access modifier? I think it would be sufficient to make it visible in package? Okay, will change the modifier in the next patch. bq. One thing I'd like to open a discussion is on deciding the way to store and process metrics. Currently, in the hbase patch, startTime and endTime are not used. In the Phoenix patch, I store time series as a flattened, non-queryable strings. I think this part also requires some hint from the time-based aggregation Hmm, so I was actually confused about the startTime and endtTime members in that class but let me see if I can catch you offline to understand what they stand for. I would like to ensure we have all the information represented in the backend. bq. Another thing I'd like to discuss here is if and how we'd like to set up a separate "fast path" for metric only updates. On the storage layer, I'd strongly +1 for a separate fast path such that we can only touch the (frequently updated) metrics table. Any proposals everyone? Yes, for the fast path, how about this (I think we discussed it briefly but haven't noted it): the caller will create a small timeline entity object with all other fields as null/empty. In that case, no other writes except the metrics update need to be executed. Also, the metrics object will be lightweight in the sense it will have only that latest timestamp and value in it. How does that sound? > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517334#comment-14517334 ] Vrushali C commented on YARN-3411: -- Hi [~djp] Thanks for the feedback! bq. Do each put in flow table sounds a little expensive especially when putting activity is very frequently. May be we should do some batch mode here? Yes I think we do need batching, but I think that would be at the CollectorManager level. This writer would always do a complete entity write as a synchronous operation to the backend by design. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517331#comment-14517331 ] Vrushali C commented on YARN-3411: -- Hi [~zjshen] >> For metrics, shall we be more generalized to support all kinds of numeric >> value, boolean and so on? Hmm. So, hbase stores values in bytes. Reading back a value written as a Long using Bytes.toInteger does not work. Hence, there needs to be a way to know what the data type is going to be. If metrics will be numbers and booleans, could we say we will always write them as longs? Boolean values can be represented as 0 and 1 for false and true and all numbers can be written out as longs, do you think would that be fine? I have to check what [~gtCarrera9]'s patch writes them as. For the config, I went with the background of hadoop configs where in everything is like a string in the xml, even a number or a ip address etc. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517323#comment-14517323 ] Vrushali C commented on YARN-3411: -- Hi Sangjin, Thank you for the comments! bq. I’m sure you're aware, but please don't forget to add the license to the new files later. Ah, did I miss adding the license, let me ensure I add the licenses in the next patch. bq. l.77: So do we need to depend on hbase-server? That's bit unexpected? Let me try to work this out again, I had added it in as I was working through compilation and runtime errors, but will look into this again to confirm/modify. bq. l.7: I think we're now moving away from the acronym "ats". Perhaps we should simply use "timeline.entity"? Yes, will correct the literal value in the constant. bq. (HBaseTimelineWriterImpl.java) l.40: Just curious, does this mean a single writer instance has only on HBase client connection? We should be able to have multiple connections? What is your thought on this? So, I have been thinking about this too. We would right now have one instance per NodeManager in this version, right? So that is already a lot of connections for cluster with thousands of nodes. I am wondering if we may not even want those many. Later on, when we move to having a timeline service writer per application, even then, say an app that does many writes would simply flood the writes to the region servers if it had a pool of connections at it’s disposal. Perhaps having one per app would help to rate limit it somewhat? I am still thinking out loud about this, haven’t decided which is better or worse. bq. Also, the initialization operations inside the constructor should probably belong in serviceInit() or serviceStart()? Okay, will move it. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)