[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710638#comment-14710638
 ] 

Varun Saxena commented on YARN-4053:
------------------------------------

There was a suggestion that we support only longs. Would supporting only longs 
have any impact on potential users of ATS?
Longs should, however, cover most metrics (as of now I can’t think of any where 
decimals would be of great importance).
If we do this, I think the TimelineMetric object should be changed to accept 
only java.lang.Long rather than java.lang.Number.
Looping in [~vinodkv] to get his opinion on this as well.
 
That said, would it be unfair to ask clients to send values consistently?
Can’t we document and enforce this restriction? If a client does not comply, it 
cannot expect consistent results. This can be the contract between ATS and its 
clients.
The major concern here, though, is that it won’t be possible to enforce this 
restriction programmatically, on either the client side or the server side.
 
*Possible Solution:*
There is one possible solution, though, if enforcing this restriction is not 
viable. With either approach, the real problem arises when applying metric 
filters to inconsistent data.
For this, we can use approach 2 (include the type in the column qualifier) and 
then insert OR filters covering both column qualifiers of the same metric.
 
I will elaborate with an example.
Let us say we have a metric called JOB_ELAPSED_TIME, and the client can report 
both integral and floating-point values for it. With approach 2, we will have 
two column qualifiers for this metric, i.e. “JOB_ELAPSED_TIME=L” (for longs) and 
“JOB_ELAPSED_TIME=D” (for doubles).
Now, when a query comes with a metric filter value in integer format, a filter 
like JOB_ELAPSED_TIME > 40 can be transformed to the corresponding HBase filter 
of the form (“JOB_ELAPSED_TIME=L” > 40 OR “JOB_ELAPSED_TIME=D” > 40.0).
Likewise, a filter list of the form (“m1” > 10 AND “m2” < 5 AND “m3” = 4) would 
be transformed to ((“m1=L” > 10 OR “m1=D” > 10.0) AND (“m2=L” < 5 OR “m2=D” < 
5.0) AND (“m3=L” = 4 OR “m3=D” = 4.0)).
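The integral-value rewrite above can be sketched as follows. This is a minimal illustration, not ATS code: the class names MetricCondition and IntegralFilterRewriter are hypothetical, and the sketch only produces a readable filter string rather than real HBase filter objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ATS code): one integral metric condition, e.g. m1 > 10.
final class MetricCondition {
    final String metric;
    final String op;
    final long value;

    MetricCondition(String metric, String op, long value) {
        this.metric = metric;
        this.op = op;
        this.value = value;
    }
}

// Rewrites each integral condition into an OR pair over the "=L" and "=D"
// qualifiers of approach 2, then ANDs the pairs. Output is a string only.
final class IntegralFilterRewriter {
    private IntegralFilterRewriter() {}

    static String rewrite(MetricCondition c) {
        // The long qualifier keeps the integer; the double qualifier gets e.g. 40.0.
        return "(\"" + c.metric + "=L\" " + c.op + " " + c.value
            + " OR \"" + c.metric + "=D\" " + c.op + " " + (double) c.value + ")";
    }

    static String rewriteAll(List<MetricCondition> conditions) {
        List<String> parts = new ArrayList<>();
        for (MetricCondition c : conditions) {
            parts.add(rewrite(c));
        }
        return "(" + String.join(" AND ", parts) + ")";
    }
}
```

For the m1/m2/m3 example this yields (("m1=L" > 10 OR "m1=D" > 10.0) AND ("m2=L" < 5 OR "m2=D" < 5.0) AND ("m3=L" = 4 OR "m3=D" = 4.0)). In HBase terms, each parenthesised pair would presumably become a FilterList with MUST_PASS_ONE nested inside an outer FilterList with MUST_PASS_ALL; that mapping is an assumption, not something settled in this discussion.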
 
If the filter value is in decimal format, we will have to make additional 
changes. A filter like JOB_ELAPSED_TIME > 40.75 would have to be converted to 
(“JOB_ELAPSED_TIME=L” >= 41 OR “JOB_ELAPSED_TIME=D” > 40.75).
That is, when matching a double value against the column qualifier storing 
longs, I would need to round the value up to the next integer and change the 
filter to >=. Similar changes will be required for less-than (<) and equality 
(=) comparisons as well.
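The rounding rules for a decimal filter value can be sketched like this. It is a hedged illustration under the assumption that filters use only >, >=, <, <= and =; the class name DecimalFilterRewriter is hypothetical. For an equality filter with a non-integral value, no long can match, so only the double qualifier is kept.

```java
// Hypothetical sketch (not ATS code): maps "<op> x" with a decimal x onto an
// equivalent condition for the long-typed ("=L") qualifier.
// NaN and values outside the long range are not handled.
final class DecimalFilterRewriter {
    private DecimalFilterRewriter() {}

    // Returns e.g. ">= 41" for (">", 40.75), or null when no long can match.
    static String longCondition(String op, double x) {
        switch (op) {
            case ">":  return ">= " + ((long) Math.floor(x) + 1); // v > x  <=> v >= floor(x) + 1
            case ">=": return ">= " + (long) Math.ceil(x);        // v >= x <=> v >= ceil(x)
            case "<":  return "<= " + ((long) Math.ceil(x) - 1);  // v < x  <=> v <= ceil(x) - 1
            case "<=": return "<= " + (long) Math.floor(x);       // v <= x <=> v <= floor(x)
            case "=":  return x == Math.rint(x) ? "= " + (long) x : null;
            default:   throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    // Combines the long-qualifier condition with the unchanged double one.
    static String rewrite(String metric, String op, double x) {
        String doublePart = "\"" + metric + "=D\" " + op + " " + x;
        String longPart = longCondition(op, x);
        return longPart == null
            ? "(" + doublePart + ")"
            : "(\"" + metric + "=L\" " + longPart + " OR " + doublePart + ")";
    }
}
```

With this, rewrite("JOB_ELAPSED_TIME", ">", 40.75) gives ("JOB_ELAPSED_TIME=L" >= 41 OR "JOB_ELAPSED_TIME=D" > 40.75). Using floor(x) + 1 for > (rather than plain rounding) keeps the rule correct when x is already integral, e.g. > 40.0 becomes >= 41.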
 
However, I am not sure whether adding this many filters will cause any 
performance issues for HBase, because with this solution we will, in essence, 
be doubling the number of metric filters.
 
One thing to note, though, is that if we adopt approach 2 (including the type 
in the column qualifier), regex comparison might become an issue. Regular 
expressions can theoretically become quite complex, so programmatically 
interpreting a regex and transforming it so that it covers both the long and the 
double column qualifiers might introduce bugs.
Maybe we can support just the wildcard match (\*), or make do with prefix and 
substring filters.
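Supporting only the * wildcard keeps the translation simple, because the type suffix can be stripped before matching. A minimal in-memory sketch, assuming qualifiers end in "=L" or "=D" as in approach 2; WildcardMetricMatcher is a hypothetical name and this is not how HBase itself would evaluate the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch (not ATS code): '*'-only wildcards matched against the
// metric name with the approach-2 type suffix ("=L" or "=D") removed.
final class WildcardMetricMatcher {
    private WildcardMetricMatcher() {}

    // Every literal part is quoted, so only '*' has special meaning.
    static Pattern toRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(".*");
            }
            if (!parts[i].isEmpty()) {
                sb.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(sb.toString());
    }

    static String metricName(String qualifier) {
        if (qualifier.endsWith("=L") || qualifier.endsWith("=D")) {
            return qualifier.substring(0, qualifier.length() - 2);
        }
        return qualifier;
    }

    // Returns the qualifiers (of either type) whose metric name matches.
    static List<String> matching(String wildcard, List<String> qualifiers) {
        Pattern p = toRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String q : qualifiers) {
            if (p.matcher(metricName(q)).matches()) {
                out.add(q);
            }
        }
        return out;
    }
}
```

Since the type suffix sits at the end of the qualifier, a pure prefix pattern such as JOB_* is unaffected by it and could presumably be pushed down to HBase's ColumnPrefixFilter as is.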
 
Thoughts?

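However, we may want to match against only the latest version of a metric’s 
value.
In that case, the solution suggested above won’t work: each OR branch checks the 
latest cell of its own column qualifier independently, so an older value stored 
under the other type’s qualifier could still match.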
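The mismatch with latest-version semantics can be shown with a small in-memory model. This sketch assumes each qualifier is a map from timestamp to value and that an OR branch sees only its own column’s newest cell; LatestVersionCheck is a hypothetical name, not ATS or HBase code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not ATS code): cells of one metric split across the
// "=L" and "=D" qualifiers, keyed by timestamp.
final class LatestVersionCheck {
    private LatestVersionCheck() {}

    // What the OR filter effectively evaluates: each column's own latest cell.
    static boolean orFilterGreaterThan(TreeMap<Long, Long> longCells,
                                       TreeMap<Long, Double> doubleCells, double x) {
        boolean longMatch = !longCells.isEmpty() && longCells.lastEntry().getValue() > x;
        boolean doubleMatch = !doubleCells.isEmpty() && doubleCells.lastEntry().getValue() > x;
        return longMatch || doubleMatch;
    }

    // Intended semantics: only the single newest value across both columns.
    static boolean latestGreaterThan(TreeMap<Long, Long> longCells,
                                     TreeMap<Long, Double> doubleCells, double x) {
        Map.Entry<Long, Long> l = longCells.lastEntry();     // null if empty
        Map.Entry<Long, Double> d = doubleCells.lastEntry(); // null if empty
        if (l == null && d == null) {
            return false;
        }
        double latest;
        if (d == null || (l != null && l.getKey() > d.getKey())) {
            latest = l.getValue();
        } else {
            latest = d.getValue(); // ties go to the double cell in this sketch
        }
        return latest > x;
    }
}
```

For example, if the metric was reported as 50 (long) at time 100 and as 30.5 (double) at time 200, a filter "> 40" passes under the OR rewrite because of the stale long cell, while the latest-value check correctly rejects it.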

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and 
> store values in the backend HBase storage. This converts everything into a 
> string representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics.
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  
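The issue description does not spell out why a string representation is a poor fit for metrics. One likely reason, given the metric-filter discussion in the comment, is ordering: byte arrays are compared lexicographically as unsigned bytes (as HBase’s BinaryComparator does, to our understanding), so "9" sorts after "10" as text. A minimal sketch; EncodingOrderDemo is a hypothetical name, and the 8-byte big-endian layout is assumed to match HBase’s Bytes.toBytes(long).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not ATS code): byte-order comparison of two encodings.
final class EncodingOrderDemo {
    private EncodingOrderDemo() {}

    // Unsigned lexicographic comparison of byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // String representation: decimal digits as UTF-8 bytes.
    static byte[] asString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed 8-byte big-endian representation.
    static byte[] asBinary(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
}
```

Here compareBytes(asString(9), asString(10)) is positive, while compareBytes(asBinary(9), asBinary(10)) is negative. Note the caveat: with this binary layout, negative longs sort after positive ones, since their sign bit makes the first byte larger.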



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)