[ 
https://issues.apache.org/jira/browse/YARN-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301193#comment-15301193
 ] 

Joep Rottinghuis commented on YARN-5105:
----------------------------------------

While I agree that we can postpone the decision on whether to add more 
complexity later (a time range and/or a count range), I feel we need to leave 
the door open to do so.
That leads me to a slightly different opinion on adding a single boolean 
isTimeSeries (yes/no) argument.

If we think forward, how would that work with isTimeSeries? Would we then have 
a time range and also mandate multiple values with "isTimeSeries"?
In addition, the boolean alone doesn't immediately convey that saying false 
gets you 1 value (the latest one) back, as opposed to skipping metrics 
altogether. I think we can already do the latter by specifying fields to 
retrieve.

Read for example the javadoc on TimelineDataToRetrieve:
{code}
 * <li><b>isTimeSeries</b> - If fieldsToRetrieve contains METRICS/ALL or
 * metricsToRetrieve is specified, this boolean flag indicates whether a time
 * series needs to be returned for these metrics. The flag is ignored if METRICS
 * are not to be fetched.</li>
{code}
It isn't quite clear that 1 row is returned if isTimeSeries is false.
Admittedly, TimelineReaderWebServices is a bit more explicit:
{code}
 * @param timeSeries If specified, defines whether a metric time series needs
 *     to be returned if fields contains METRICS/ALL or metricsToRetrieve is
 *     specified. Ignored otherwise. If value is true, means time series will
 *     be returned. All other values will be treated as false, including when
 *     this parameter is unspecified. In such cases, latest single value of
 *     metric(s) will be returned (Optional query param).
{code}
It is still a little confusing.

Given that we already have the concept of limit to limit the # entities we 
return, why don't we change the timeseries argument from a boolean to a 
timeserieslimit? We'd document that the default is 1 and that -1 means no limit 
(i.e. retrieve the entire time series). Furthermore, we can specify for now 
that the only two values allowed are -1 and 1. In other words, -1 is no limit, 
or else only one record is returned. The query limit maps relatively neatly to 
the HBase get.
ApplicationEntityReader.getResults in your latest patch was:
{code}
if (getDataToRetrieve().isTimeSeries()) {
  get.setMaxVersions(Integer.MAX_VALUE);
}
{code}
and would become:
{code}
if (getDataToRetrieve().getTimeSeriesLimit() >= 0) {
  get.setMaxVersions(getDataToRetrieve().getTimeSeriesLimit());
}
{code}
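One detail to watch with the snippet above: as written, a limit of -1 skips the 
setMaxVersions call entirely, and as far as I know an HBase Get returns only 
one version by default, so "no limit" would need an explicit mapping. A minimal 
sketch of that mapping follows. The class name TimeSeriesLimits, the constant 
NO_LIMIT and the method toMaxVersions are hypothetical and not part of any 
patch; the sketch also accepts any positive limit, which is more permissive 
than the "-1 or 1 only" rule proposed for now.

```java
/**
 * Hypothetical helper (not in the patch): translates a timeSeriesLimit
 * query value into a value suitable for Get.setMaxVersions(int).
 */
final class TimeSeriesLimits {
  /** Assumed sentinel meaning "no limit, return the entire time series". */
  static final int NO_LIMIT = -1;

  private TimeSeriesLimits() {
  }

  /**
   * @param timeSeriesLimit -1 for no limit, otherwise a positive count.
   * @return Integer.MAX_VALUE for -1, otherwise the limit unchanged.
   * @throws IllegalArgumentException for 0 or anything below -1.
   */
  static int toMaxVersions(int timeSeriesLimit) {
    if (timeSeriesLimit == NO_LIMIT) {
      return Integer.MAX_VALUE;
    }
    if (timeSeriesLimit < 1) {
      throw new IllegalArgumentException(
          "timeSeriesLimit must be -1 or positive: " + timeSeriesLimit);
    }
    return timeSeriesLimit;
  }
}
```

With something like this, the reader code could presumably become 
get.setMaxVersions(TimeSeriesLimits.toMaxVersions(getDataToRetrieve().getTimeSeriesLimit())) 
without the surrounding if.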

I agree that we shouldn't try to distinguish between separate limits for 
separate columns for now to keep things simple. 

Now if we were to add the time range to give further flexibility in limiting 
which records are retrieved, that would be relatively orthogonal to 
timeSeriesLimit. We'd simply return the last # metrics (per column) that fall 
within the specified range.
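
To illustrate how the two would compose, here is a minimal in-memory sketch, 
independent of HBase, that returns the last N values of one metric column 
within a time range. The class TimeSeriesWindow, the method latestInRange and 
the inclusive [begin, end] bounds are assumptions made for illustration only; 
nothing here is taken from the patch.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical illustration of "time range + timeSeriesLimit" semantics. */
final class TimeSeriesWindow {
  private TimeSeriesWindow() {
  }

  /**
   * @param values timestamp to value, naturally ordered (oldest first).
   * @param begin  inclusive lower bound of the time range (assumed).
   * @param end    inclusive upper bound of the time range (assumed).
   * @param limit  -1 for no limit, otherwise the max number of values.
   * @return the most recent values inside the range, at most limit of them.
   */
  static NavigableMap<Long, Long> latestInRange(
      NavigableMap<Long, Long> values, long begin, long end, int limit) {
    NavigableMap<Long, Long> result = new TreeMap<>();
    if (begin > end) {
      return result; // empty range; subMap would throw for begin > end
    }
    // Restrict to the range first, then walk from newest to oldest.
    NavigableMap<Long, Long> newestFirst =
        values.subMap(begin, true, end, true).descendingMap();
    for (Map.Entry<Long, Long> e : newestFirst.entrySet()) {
      if (limit >= 0 && result.size() >= limit) {
        break; // limit reached; -1 never stops early
      }
      result.put(e.getKey(), e.getValue());
    }
    return result;
  }
}
```

The range filter is applied first and the limit second, so the limit always 
counts the most recent values inside the range rather than the most recent 
values overall.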

> entire time series is returned for YARN container system metrics (CPU and 
> memory)
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-5105
>                 URL: https://issues.apache.org/jira/browse/YARN-5105
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5105-YARN-2928.01.patch, 
> YARN-5105-YARN-2928.02.patch, YARN-5105-YARN-2928.03.patch
>
>
> I see that the entire time series of the CPU and memory metrics are returned 
> for the YARN containers REST query. This has a potential of bloating the 
> output big time.
> {noformat}
> "metrics": [
> {
>     "type": "TIME_SERIES",
>     "id": "MEMORY",
>     "values": 
> {
>     "1463518173363": 407539712,
>     "1463518170347": 407539712,
> {noformat}


