[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548335#comment-14548335
 ] 

Junping Du commented on YARN-3411:
----------------------------------

bq. No, we will never drop the last value. MIN_VERSIONS and TTL are set such 
that last value is always retained. I am setting the MAX_VERSIONS for now to 
200, but we can revisit this when we determine how exactly the timeseries data 
is going to be handled. And of course it can be made configurable.
I meant the earliest (oldest) value, not the latest. I agree that we can revisit 
the value later for the other cases I mentioned above, but I just want to double 
check whether we have other options, e.g. storing time series data as different 
rows or columns rather than as different timestamps/versions here.

bq.  Wondering why we would aggregate data in one timeseries for one metric 
over time?
That's because the interval of interest (the one presented to the end user) is 
not always the same as the interval at which timeline metrics data is gathered. 
Let's say we receive container metrics data from the NodeManager every second, 
but the user is interested in aggregated data per minute; then we need to 
aggregate 60 seconds of data for a single metric. Does that make sense?
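
As a rough illustration of the per-second to per-minute case above, here is a 
minimal sketch in plain Java. It is not code from the patch: the class name 
MinuteAggregator, the method sumPerMinute, and the choice of summing (rather 
than averaging or taking a max) are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of the patch: rolls samples up per minute. */
final class MinuteAggregator {
  private static final long MINUTE_MS = 60_000L;

  /**
   * @param samples timestamp in milliseconds -> metric value (e.g. one per second)
   * @return start of each minute in milliseconds -> sum of the samples in that minute
   */
  static Map<Long, Long> sumPerMinute(Map<Long, Long> samples) {
    Map<Long, Long> buckets = new TreeMap<>();
    for (Map.Entry<Long, Long> sample : samples.entrySet()) {
      // Round the timestamp down to the start of its minute.
      long bucketStart = Math.floorDiv(sample.getKey(), MINUTE_MS) * MINUTE_MS;
      // Assumption: the aggregate is a sum; other functions would slot in here.
      buckets.merge(bucketStart, sample.getValue(), Long::sum);
    }
    return buckets;
  }
}
```

With one sample per second, each output bucket combines 60 input values, which 
is the aggregation of one time series for one metric that I was referring to.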
 
Thanks for updating the patch. I just took a quick look at the latest patch; a 
few comments so far:
1. It sounds like we don't leverage HBase's single-row transaction feature, as 
we are updating different column families (events, configurations, metrics, 
etc.) separately.  Do we need to make sure the data in each row gets updated 
consistently?

2. We shouldn't swallow exceptions when updating data in HBase; just calling 
log.error() may not be enough.

3. We need to check for null when writing a TimelineEntity to HBase, as a 
TimelineEntity could include null events/configurations/metrics, which could 
make a later foreach throw an NPE.
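
To make points 2 and 3 concrete, here is a minimal sketch in plain Java. It 
deliberately avoids the real classes: EntitySketch stands in for the entity, 
Sink stands in for the HBase write call, and all names are hypothetical. It 
only shows the shape I have in mind: treat null collections as empty before 
iterating, and let the IOException reach the caller instead of logging and 
continuing.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the entity; either field may be null. */
final class EntitySketch {
  List<String> events;
  Map<String, String> configs;
}

final class NullSafeWriter {
  /** Hypothetical stand-in for the storage call that would issue the puts. */
  interface Sink {
    void store(String family, String key, String value) throws IOException;
  }

  private static <T> List<T> listOrEmpty(List<T> list) {
    return list == null ? Collections.<T>emptyList() : list;
  }

  private static <K, V> Map<K, V> mapOrEmpty(Map<K, V> map) {
    return map == null ? Collections.<K, V>emptyMap() : map;
  }

  /** Writes every cell; failures propagate to the caller, nothing is swallowed. */
  static int write(EntitySketch entity, Sink sink) throws IOException {
    int cells = 0;
    // Null collections are treated as empty, so the loops cannot throw an NPE.
    for (String event : listOrEmpty(entity.events)) {
      sink.store("e", event, "");
      cells++;
    }
    for (Map.Entry<String, String> config : mapOrEmpty(entity.configs).entrySet()) {
      sink.store("c", config.getKey(), config.getValue());
      cells++;
    }
    return cells;
  }
}
```

Whether the caller then retries, fails the write, or reports back to the 
client is a separate decision, but at least it gets the chance to decide.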

More detailed comments may come later.

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
> YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
