[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710638#comment-14710638 ]
Varun Saxena commented on YARN-4053:
------------------------------------

There was a suggestion that we support only longs. Would supporting only longs affect potential users of ATS? Longs should cover most metrics (as of now I can’t think of any where decimals would be of great importance). If we do this, I think the TimelineMetric object should be changed to accept only java.lang.Long and not java.lang.Number. Looping in [~vinodkv] to get his opinion on this as well.

Then again, is it unfair to ask clients to send values of a consistent type? Can’t we document this and enforce the restriction? If a client does not comply, it cannot expect consistent results. This can be the contract between ATS and its clients. The major concern here, though, is that it won’t be possible to enforce this restriction programmatically, either on the client side or on the server side.

*Possible Solution:*

There is one possible solution if enforcing this restriction is not viable. The real problem in both solutions arises when applying metric filters to inconsistent data. To handle this, we can use approach 2 (include the type in the column qualifier) and then insert OR filters covering both column qualifiers for the same metric.

I will elaborate with an example. Let us say we have a metric called JOB_ELAPSED_TIME and clients can report both integral and floating point values for it. With approach 2, we will have 2 column qualifiers for this metric, i.e. “JOB_ELAPSED_TIME=L” (for longs) and “JOB_ELAPSED_TIME=D” (for doubles). Now, when a query comes with a metric filter value in integer format, something like JOB_ELAPSED_TIME > 40, it can be transformed to a corresponding HBase filter of the form (“JOB_ELAPSED_TIME=L” > 40 OR “JOB_ELAPSED_TIME=D” > 40.0). Similarly, a filter list of the form (“m1” > 10 AND “m2” < 5 AND “m3”=4) would be transformed to ((“m1=L” > 10 OR “m1=D” > 10.0) AND (“m2=L” < 5 OR “m2=D” < 5.0) AND (“m3=L” = 4 OR “m3=D” = 4.0)).
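
To make the rewrite above concrete, here is a minimal sketch of the expansion for integer-valued filters. It only builds the expression as a string; the class MetricFilterExpander and its methods expand and and are hypothetical names for illustration and are not part of ATS or HBase. The "=L" and "=D" suffixes are the ones from approach 2.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the ATS or HBase APIs. */
final class MetricFilterExpander {

  // Column qualifier suffixes from approach 2: one qualifier per value type.
  static final String LONG_SUFFIX = "=L";
  static final String DOUBLE_SUFFIX = "=D";

  private MetricFilterExpander() {}

  /**
   * Expands one filter with an integral value into an OR over both
   * column qualifiers, so rows match whichever type the client sent.
   */
  static String expand(String metric, String op, long value) {
    // Casting to double renders 40 as "40.0" for the "=D" qualifier.
    return "(\"" + metric + LONG_SUFFIX + "\" " + op + " " + value
        + " OR \"" + metric + DOUBLE_SUFFIX + "\" " + op + " " + (double) value + ")";
  }

  /** Joins already expanded filters with AND, as in the filter list example. */
  static String and(List<String> expandedFilters) {
    return "(" + String.join(" AND ", expandedFilters) + ")";
  }
}
```

Calling MetricFilterExpander.expand("JOB_ELAPSED_TIME", ">", 40) yields the two-qualifier OR expression shown above. An actual implementation would presumably build HBase filter objects rather than strings, and this sketch does not cover decimal filter values.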

If the filter value is in decimal format, we will have to make additional changes. A filter like JOB_ELAPSED_TIME > 40.75 will have to be converted to (“JOB_ELAPSED_TIME=L” >= 41 OR “JOB_ELAPSED_TIME=D” > 40.75). As you can see, to match a double value against the column qualifier storing longs, I would need to round the value up to the next integer and change the filter to >=. Similar changes will be required for the less than (<) and equal to (=) comparisons as well.

However, I am not sure whether adding this many filters will cause a performance issue for HBase, because with this solution we will in essence be doubling the size of the metric filters.

One thing we need to note is that if we do adopt approach 2 (including the type in the column qualifier), regex comparison might become an issue. Regular expressions can in theory become quite complex, so programmatically interpreting a regex and transforming it so that it covers both the long and the double column qualifiers might introduce bugs. Maybe we can support only wildcard match (\*), or make do with prefix and substring filters. Thoughts?

However, we may want to match against only the latest version of the value for a metric. In that case, the solution suggested above won’t work.

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store
> values in backend HBase storage. This converts everything into a string
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for
> metrics.
> So we need to decide how are we going to encode and decode metric values and
> store them in HBase.
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)