[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512079#comment-15512079
 ] 

Vrushali C edited comment on YARN-5585 at 9/22/16 4:23 AM:
-----------------------------------------------------------

I have been thinking more about this. If there is a concern about having the
same entity data in two tables, we could set a TTL (time to live) on the cells
in the auxiliary table. That way, the data is stored in two places only for a
limited period of time, after which HBase cleans it up.
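A minimal sketch of the TTL idea. It assumes a simplified in-memory model in
which each cell records its write time and is treated as gone once it is older
than the TTL. In HBase itself the TTL is configured per column family, in
seconds, and expired cells are filtered out of reads and removed during
compaction. The class `TtlTable`, its methods and the 30-day TTL are
hypothetical illustrations, not an HBase or timeline service API.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DAY = 24 * 3600  # seconds


class TtlTable:
    """Hypothetical in-memory stand-in for an auxiliary table whose cells
    expire once they are older than ttl_seconds (loosely like an HBase
    column-family TTL; this is not an HBase API)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # row key -> (value, write timestamp in seconds)
        self._cells: Dict[str, Tuple[str, float]] = {}

    def put(self, row: str, value: str) -> None:
        self._cells[row] = (value, self.clock())

    def get(self, row: str) -> Optional[str]:
        cell = self._cells.get(row)
        if cell is None:
            return None
        value, written_at = cell
        if self.clock() - written_at > self.ttl_seconds:
            # Expired: hide it from reads and drop it, much as HBase filters
            # expired cells on read and removes them during compaction.
            del self._cells[row]
            return None
        return value


# Simulated clock so the example does not depend on wall-clock time.
now = [0.0]
aux = TtlTable(ttl_seconds=30 * DAY, clock=lambda: now[0])
aux.put("job_1", "tez dag data")
print(aux.get("job_1"))  # prints: tez dag data
now[0] += 365 * DAY      # a year later
print(aux.get("job_1"))  # prints: None
```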

For example, suppose the Tez UI queries the auxiliary table for a job that ran
a year ago, and the data no longer exists there because HBase has already
cleaned it up. The Tez UI can then query the regular entity table instead.
Alternatively, the auxiliary REST API call could take a parameter saying that,
if the data is not found in the auxiliary table, the regular entity table
should be queried; such a REST call would probably take a little longer to
return. Since we are querying for something that ran a year ago, I believe we
can wait an extra moment for the call to return.
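The fallback described above could look roughly like the sketch below. It
assumes both tables can be modelled as plain dictionaries keyed by entity id,
and `read_entity` with its `fallback_to_entity_table` flag is a hypothetical
stand-in for the proposed REST parameter, not an existing TimelineReader API.

```python
from typing import Dict, Optional, Tuple


def read_entity(
    entity_id: str,
    aux_table: Dict[str, str],
    entity_table: Dict[str, str],
    fallback_to_entity_table: bool = False,
) -> Tuple[Optional[str], str]:
    """Return (data, source), where source names the table that answered.

    Hypothetical sketch: the auxiliary table is always checked first; the
    regular entity table is consulted only when the caller opts in, since
    that path is expected to take longer.
    """
    data = aux_table.get(entity_id)
    if data is not None:
        return data, "auxiliary"
    if fallback_to_entity_table:
        data = entity_table.get(entity_id)
        if data is not None:
            return data, "entity"
    return None, "none"


# The auxiliary copy of the old job has already expired via TTL,
# but the regular entity table still holds it.
aux = {"job_recent": "recent dag"}
entities = {"job_recent": "recent dag", "job_old": "old dag"}

print(read_entity("job_recent", aux, entities))  # ('recent dag', 'auxiliary')
print(read_entity("job_old", aux, entities))     # (None, 'none')
print(read_entity("job_old", aux, entities, fallback_to_entity_table=True))
# ('old dag', 'entity')
```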

This way, we store data in two tables only for a brief period, rely on HBase
to clean up cells according to their TTL, and give frameworks a way to store
and query their data in harmony with the timeline service storage.


> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide many filters for retrieving
> applications. Along with those, it would be good to add a new filter, fromId,
> so that entities can be retrieved starting after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000
> entities, the REST call returns the first/last 100 entities. How can the next
> set of 100 entities be retrieved, i.e. 101 to 200 OR 900 to 801?
> Example: if applications app-1, app-2, ..., app-10 are stored in the
> database, *getApps?limit=5* gives app-1 to app-5, but there is no way to
> retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS targets storage of a large number of entities, getting the next
> set of entities using fromId, rather than querying all the entities, is a
> very common use case. This is very useful for pagination in the web UI.
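A minimal sketch of the proposed fromId semantics. It assumes the entities are
held in a list already in the store's row order and that fromId is exclusive,
matching the example where fromId=app-5 returns app-6 to app-10. Returning an
empty page for an unknown fromId is an arbitrary choice the issue does not
specify. The functions `get_apps` and `all_pages` are hypothetical, not the
TimelineReader implementation; an HBase-backed reader would more likely start
its scan at the row after fromId than search a list.

```python
from typing import List, Optional


def get_apps(apps: List[str], limit: int = 100,
             from_id: Optional[str] = None) -> List[str]:
    """Return at most `limit` app ids that come after `from_id` in store order."""
    start = 0
    if from_id is not None:
        try:
            start = apps.index(from_id) + 1  # fromId itself is excluded
        except ValueError:
            return []  # unknown fromId: empty page (assumed behaviour)
    return apps[start:start + limit]


def all_pages(apps: List[str], limit: int) -> List[List[str]]:
    """Walk every page by feeding the last id of each page back as fromId."""
    pages: List[List[str]] = []
    from_id: Optional[str] = None
    while True:
        page = get_apps(apps, limit, from_id)
        if not page:
            return pages
        pages.append(page)
        from_id = page[-1]


apps = [f"app-{i}" for i in range(1, 11)]
page1 = get_apps(apps, limit=5)
page2 = get_apps(apps, limit=5, from_id=page1[-1])
print(page1)  # ['app-1', 'app-2', 'app-3', 'app-4', 'app-5']
print(page2)  # ['app-6', 'app-7', 'app-8', 'app-9', 'app-10']
```

Passing the last id of the previous page as fromId is what lets a web UI page
forward without re-reading everything it has already shown.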



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
