[jira] [Comment Edited] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

Varun Saxena (JIRA) Thu, 15 Sep 2016 12:09:48 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494251#comment-15494251
 ]


Varun Saxena edited comment on YARN-5585 at 9/15/16 7:08 PM:
-------------------------------------------------------------

Just to summarise the suggestions given so far, for folks to refer to.

* Applications (like Tez) know best how to interpret their entity IDs and how 
those IDs can be sorted in descending order. Most entity IDs seem to contain 
some sort of monotonically increasing sequence, as the app ID does. We can 
hence open up a PUBLIC interface which ATSv2 users like Tez can implement to 
decide how a particular entity type is encoded and decoded, so that it is 
stored in descending order (based on creation time) in ATSv2. The encoding 
and decoding would be similar to the AppIDConverter in our code. If the row 
keys themselves are sorted, this is the best possible solution 
performance-wise. Refer to 
[comment | 
https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_ 
**# Lookup will be fast.
** _Cons of the approach:_ 
**# We depend on the application to provide code for this to work. The 
corresponding JAR will have to be placed on the classpath. Folks in other 
projects may not be pleased that ATS has no inbuilt support for this.
**# Entity IDs may not always contain a monotonically increasing sequence the 
way app IDs do.
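
A minimal sketch of what such a pluggable converter could look like. The interface name {{EntityIdConverter}}, the class {{SequenceEntityIdConverter}} and the assumed ID shape {{<prefix>_<sequence>}} are hypothetical and only for illustration; this is not the existing AppIDConverter nor any actual ATSv2 API. It relies on the fact that HBase orders row keys by unsigned lexicographic byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical plug-in point; not an actual ATSv2 interface. */
interface EntityIdConverter {
  byte[] encode(String entityId);

  String decode(byte[] rowKeyPart);
}

/** Assumes IDs look like "<prefix>_<non-negative sequence>", e.g. "task_42". */
final class SequenceEntityIdConverter implements EntityIdConverter {

  @Override
  public byte[] encode(String entityId) {
    int sep = entityId.lastIndexOf('_');
    byte[] prefix = entityId.substring(0, sep).getBytes(StandardCharsets.UTF_8);
    long seq = Long.parseLong(entityId.substring(sep + 1));
    // Inverting the sequence makes larger (newer) sequences sort first
    // when the bytes are compared as unsigned values.
    return ByteBuffer.allocate(prefix.length + Long.BYTES)
        .put(prefix)
        .putLong(Long.MAX_VALUE - seq)
        .array();
  }

  @Override
  public String decode(byte[] rowKeyPart) {
    int prefixLen = rowKeyPart.length - Long.BYTES;
    String prefix = new String(rowKeyPart, 0, prefixLen, StandardCharsets.UTF_8);
    long inverted = ByteBuffer.wrap(rowKeyPart, prefixLen, Long.BYTES).getLong();
    return prefix + "_" + (Long.MAX_VALUE - inverted);
  }

  /** Unsigned lexicographic comparison, the order assumed for row keys. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

With this encoding, the key for a higher sequence number compares lower than the key for a smaller one, so a plain forward scan would return the newest entities first. It only works for IDs that really carry such a sequence, which is the second con listed above.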

* We can keep another table, say EntityCreationTable or EntityIndexTable, with 
row key {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}}. We will make an entry in this table whenever a created time is 
reported for the entity. The real data would still reside in the main entity 
table. Entities in this index table will be sorted in descending order of 
creation time. On the read side, we can first peek into this table to get the 
relevant records in descending order (based on limit and/or fromId) and then 
use this info to query the entity table. We can do this in two ways: either 
take the created times returned by the index table and apply a created time 
range filter to the entity table, or try out MultiRowRangeFilter, which 
according to the HBase javadoc seems to be efficient. For the latter we will 
have to do some processing to determine the multiple row key ranges. Refer 
to [comment | 
https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_ The client should not send different created times for the same 
entity, as that would lead to an additional row. If a different created time 
is reported more than once, we will have to consider the latest one.
** _Pros of the approach:_ 
**# Solution provided within ATS.
**# Extra write only when created time is reported.
** _Cons of the approach:_ 
**# Extra peek into the index table on the read side. A single entity read 
can still be served directly from the entity table, though.
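
The index-table idea can be sketched with an in-memory sorted map standing in for the HBase table. Everything here is an assumption made for illustration: the class {{EntityIndexSketch}}, the zero-padded string form of the reverse creation time, and passing the last returned index row key as the paging cursor. A real implementation would use byte-encoded keys and HBase scans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class EntityIndexSketch {
  // Stand-in for the index table: sorted row key -> entity ID.
  private final NavigableMap<String, String> index = new TreeMap<>();

  /** appPrefix stands for cluster!user!flow!flowrun!app. */
  static String rowKey(String appPrefix, String entityType,
      long createdTime, String entityId) {
    // Reverse the time so newer entities sort first; zero-pad so that
    // string order matches numeric order.
    String reverseTime =
        String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - createdTime);
    return appPrefix + "!" + entityType + "!" + reverseTime + "!" + entityId;
  }

  /** Called whenever a created time is reported for an entity. */
  void put(String appPrefix, String entityType, long createdTime,
      String entityId) {
    index.put(rowKey(appPrefix, entityType, createdTime, entityId), entityId);
  }

  /**
   * Returns up to limit entity IDs, newest first. When fromRowKey is
   * non-null, the scan starts strictly after that index row.
   */
  List<String> scan(String appPrefix, String entityType, String fromRowKey,
      int limit) {
    String prefix = appPrefix + "!" + entityType + "!";
    NavigableMap<String, String> view = fromRowKey == null
        ? index.tailMap(prefix, true)
        : index.tailMap(fromRowKey, false);
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, String> e : view.entrySet()) {
      if (ids.size() >= limit || !e.getKey().startsWith(prefix)) {
        break;
      }
      ids.add(e.getValue());
    }
    return ids;
  }
}
```

The IDs returned by {{scan}} (or their created times) would then drive the query against the main entity table, either through a created time range filter or through the row key ranges handed to MultiRowRangeFilter.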

* Another option would be to change the row key of the entity table to 
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}} and have another table that maps 
{{cluster!user!flow!flowrun!app!entitytype!entityid}} to the entity created 
time. For a single entity call (HBase Get) we will then have to first peek 
into the new table and then get the records from the entity table.
** _Cons of the approach:_ 
**# On the write side, we will either have to first look up the entity created 
time in the index table, or the client will have to supply the entity created 
time on every write. The former would impact write performance and the latter 
may not be feasible for the client.
**# What should the row key be if the client does not supply the created time 
on the first write but supplies it on a subsequent write?
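
A rough sketch of this third layout, again with in-memory maps standing in for the two HBase tables. The class and method names are hypothetical. Throwing when no created time is known, and keeping the first reported created time, are placeholder choices only, since the comment leaves that behaviour as an open question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class ReverseTimeEntityTableSketch {
  // Lookup table: ...!entitytype!entityid -> created time.
  private final Map<String, Long> createdTimes = new HashMap<>();
  // Entity table keyed by ...!entitytype!reverse created time!entityid.
  private final TreeMap<String, String> entityTable = new TreeMap<>();

  private static String rowKey(String typePrefix, long createdTime,
      String entityId) {
    String reverseTime =
        String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - createdTime);
    return typePrefix + "!" + reverseTime + "!" + entityId;
  }

  /** createdTime may be null when the client does not resend it. */
  void write(String typePrefix, String entityId, Long createdTime,
      String data) {
    String lookupKey = typePrefix + "!" + entityId;
    // Extra read on the write path when the client omits the created time.
    Long known =
        createdTime != null ? createdTime : createdTimes.get(lookupKey);
    if (known == null) {
      // Placeholder: the row key cannot be built without a created time.
      throw new IllegalStateException("created time unknown for " + entityId);
    }
    // Assumption: the first reported created time wins.
    createdTimes.putIfAbsent(lookupKey, known);
    long effective = createdTimes.get(lookupKey);
    entityTable.put(rowKey(typePrefix, effective, entityId), data);
  }

  /** Single entity read: peek into the lookup table, then the entity table. */
  Optional<String> get(String typePrefix, String entityId) {
    Long createdTime = createdTimes.get(typePrefix + "!" + entityId);
    if (createdTime == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(
        entityTable.get(rowKey(typePrefix, createdTime, entityId)));
  }
}
```

The sketch makes both cons visible: {{write}} needs either the created time or a lookup before it can build the row key, and {{get}} always costs two reads.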

cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]


> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities 
> then the REST call gives the first/last 100 entities. How do we retrieve the 
> next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: say the applications stored in the database are app-1 app-2 ... 
> app-10. *getApps?limit=5* gives app-1 to app-5. But there is no way to 
> retrieve the next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS targets storage of a large number of entities, getting the next 
> set of entities using fromId rather than querying all the entities is a very 
> common use case. This is very useful for pagination in the web UI.
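
The paging semantics described in the issue can be sketched as follows. The class {{FromIdPagingSketch}} and its in-memory list are hypothetical stand-ins for the reader and its storage. fromId is treated as exclusive, matching the app-5 to app-6 example in the description.

```java
import java.util.ArrayList;
import java.util.List;

final class FromIdPagingSketch {

  /**
   * Returns up to limit IDs that follow fromId in the stored order.
   * A null fromId starts from the beginning.
   */
  static List<String> page(List<String> storedIds, String fromId, int limit) {
    int start = 0;
    if (fromId != null) {
      int idx = storedIds.indexOf(fromId);
      if (idx < 0) {
        throw new IllegalArgumentException("unknown fromId: " + fromId);
      }
      start = idx + 1; // exclusive: resume after the last entity seen
    }
    // Use long arithmetic so a very large limit cannot overflow.
    int end = (int) Math.min((long) storedIds.size(), (long) start + limit);
    return new ArrayList<>(storedIds.subList(start, end));
  }

  public static void main(String[] args) {
    List<String> apps = new ArrayList<>();
    for (int i = 1; i <= 10; i++) {
      apps.add("app-" + i);
    }
    System.out.println(page(apps, null, 5));
    System.out.println(page(apps, "app-5", 5));
  }
}
```

A web UI would keep the last ID of each page and pass it as fromId in the next request.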



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

