[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494251#comment-15494251 ]
Varun Saxena commented on YARN-5585:
------------------------------------

Just to summarise the suggestions given so far, for folks to refer to.

* Applications (like Tez) know best how to interpret their entity IDs and how those IDs can be sorted in descending order. Most entity IDs seem to contain some sort of monotonically increasing sequence, like the app ID. We can hence open up a PUBLIC interface which ATSv2 users like Tez can implement to decide how a particular entity type is encoded and decoded, so that it is stored in ATSv2 in descending order of creation time. The encoding and decoding would be similar to the AppIDConverter written in our code. If the row keys themselves can be sorted, this will be the best possible solution performance-wise. Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_
**# Lookup will be fast.
** _Cons of the approach:_
**# We depend on the application to provide code for this to work. The corresponding JAR will have to be placed on the classpath. Folks in other projects may not be pleased that there is no inbuilt support for this in ATS.
**# Entity IDs may not always have a monotonically increasing sequence like app IDs.
* We can keep another table, say EntityCreationTable or EntityIndexTable, with row key {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}}. We will make an entry in this table whenever a created time is reported for the entity. The real data would still reside in the main entity table. Entities in this index table will be sorted in descending order of creation time. On the read side, we can first peek into this table to get the relevant records in descending order (based on limit and/or fromId) and then use this info to query the entity table. We can do this in two ways: take the created times returned by the index table and apply a created time range filter on the entity table, or alternatively try out MultiRowRangeFilter, which seems to be efficient going by the HBase javadoc. For the latter we will have to do some processing to determine the multiple row key ranges. Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_ The client should not send different created times for the same entity, otherwise that will lead to an additional row. If a different created time is reported more than once, we will have to consider the latest one.
** _Pros of the approach:_
**# Solution is provided within ATS.
**# Extra write only when created time is reported.
** _Cons of the approach:_
**# Extra peek into the index table on the read side. A single entity read can still be served directly from the entity table though.
* Another option would be to change the row key of the entity table to {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}} and have another table that maps {{cluster!user!flow!flowrun!app!entitytype!entityid}} to the entity created time. So for a single entity call (HBase Get) we will have to first peek into the new table and then get records from the entity table.
** _Cons of the approach:_
**# On the write side, we will either have to first look up the index table which holds the entity created time, or the client will have to supply the entity created time on every write. The former would impact write performance and the latter may not be feasible for the client.
**# It is unclear what the row key should be if the client does not supply the created time on the first write but supplies it on a subsequent write.
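
To make the index-table option a bit more concrete, here is a minimal, self-contained sketch of why a "reverse entity creation time" component makes rows sort newest-first, and how a limit plus a fromId-style cursor could page through them. Everything in it is an assumption for illustration: the class name {{EntityIndexSketch}}, the string-based row key, the zero-padded decimal encoding, and the in-memory {{TreeMap}} standing in for an HBase table. A real implementation would use byte-array row keys and an HBase scan, and would need to handle entity IDs that contain the {{!}} separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; not the ATSv2 code.
final class EntityIndexSketch {

    // Inverting the timestamp makes newer entities sort before older ones.
    static long invert(long createdTime) {
        return Long.MAX_VALUE - createdTime;
    }

    // prefix stands for cluster!user!flow!flowrun!app!entitytype.
    // Zero-padding to 19 digits keeps lexicographic order equal to numeric order
    // (assumes createdTime >= 0).
    static String rowKey(String prefix, long createdTime, String entityId) {
        return String.format("%s!%019d!%s", prefix, invert(createdTime), entityId);
    }

    // Returns up to 'limit' entity IDs under 'prefix', newest first.
    // If fromRowKey is non-null, the page starts strictly after that row.
    static List<String> page(TreeMap<String, String> index, String prefix,
                             String fromRowKey, int limit) {
        Map<String, String> tail = (fromRowKey == null)
            ? index.tailMap(prefix + "!", true)
            : index.tailMap(fromRowKey, false);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : tail.entrySet()) {
            if (ids.size() >= limit || !e.getKey().startsWith(prefix + "!")) {
                break;
            }
            ids.add(e.getValue());
        }
        return ids;
    }

    public static void main(String[] args) {
        String prefix = "cluster!user!flow!1!app_1!TEZ_DAG_ID";
        TreeMap<String, String> index = new TreeMap<>(); // stand-in for the index table
        for (int i = 1; i <= 5; i++) {
            // entity e1 is the oldest, e5 the newest
            index.put(rowKey(prefix, i * 1000L, "e" + i), "e" + i);
        }
        System.out.println(page(index, prefix, null, 2)); // [e5, e4]
        String cursor = rowKey(prefix, 4000L, "e4");      // last row of the first page
        System.out.println(page(index, prefix, cursor, 2)); // [e3, e2]
    }
}
```

Note that the cursor here is a full row key, so resolving a fromId to a starting row needs the entity's created time as well as its ID. That is the same lookup problem raised in the cons of the third option.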

cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, i.e. fromId,
> so that entities can be retrieved after the fromId.
> Current Behavior : Default limit is set to 100. If there are 1000 entities
> then the REST call gives the first/last 100 entities. How to retrieve the next set of 100
> entities, i.e. 101 to 200 OR 900 to 801?
> Example : If applications are stored in the database as app-1 app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common
> use case to get the next set of entities using fromId rather than querying all
> the entities. This is very useful for pagination in the web UI.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)