[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514243#comment-15514243
 ] 

Varun Saxena edited comment on YARN-5585 at 9/22/16 7:25 PM:
-------------------------------------------------------------

Summarizing the solution we decided upon in the call.

* We will now return entities from the entity table in lexicographic order of 
entity IDs.
* To achieve a different sort order, we will provide a mechanism for 
applications to supply an entity ID prefix, which can be set in the 
TimelineEntity object when publishing the entity.
* This entity ID prefix will be part of the row key in the entity table. As the 
name suggests, it will be placed just before the entity ID. Applications can 
choose to provide no entity ID prefix if they are happy with the lexicographic 
sort order. So the row key will now be 
{{cluster!user!flow!flowrun!app!entitytype!\{entityidprefix\}!\{entityid\}}}
* The entity ID will also be stored under a column qualifier (this is already 
being done).
* The entity ID prefix can be a number (say a long), as numbers generally 
provide a natural sort ordering. However, this needs to be finalized. Should we 
keep it as a string instead?
* When querying multiple entities, we will return the top N entities, where N 
is set by the limit, in lexicographic order of entity ID prefix + entity ID 
(that is, if an entity ID prefix was supplied). The fromID filter can now become 
something like fromIDPrefix (say), or a similar filter that carries prefix + ID, 
to support pagination.
* When querying a single entity, the prefix can be supplied as a query param. 
If supplied, the read will be a Get; otherwise we need a Scan with a 
SingleColumnValueFilter on the entity ID (this will be comparatively slower). 
We can have a separate REST endpoint to distinguish between prefix-based and 
non-prefix-based queries. We need to distinguish between the case where no 
prefix was specified for an entity on the write path and the case where the 
prefix was simply not supplied on the read path (even though it was supplied on 
the write path). This needs to be finalized.
* The prefix will also be returned as part of the TimelineEntity object in the 
response.
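
The row key layout and the effect of the prefix on ordering can be sketched as 
follows. This is a minimal in-memory illustration in Python, not the HBase 
implementation: {{row_key}} is a hypothetical helper, the sample values are made 
up, and the zero-padded decimal encoding of a numeric prefix is an assumption 
(the prefix type is still open, as noted in the list).

```python
SEP = "!"

def row_key(cluster, user, flow, flowrun, app, entity_type, entity_id, id_prefix=None):
    """Build a row key string; the prefix slot stays empty when no prefix is given."""
    # Assumption: a numeric prefix is zero-padded to a fixed width so that
    # lexicographic order of the key matches numeric order of the prefix.
    prefix = "" if id_prefix is None else f"{id_prefix:019d}"
    return SEP.join([cluster, user, flow, flowrun, app, entity_type, prefix, entity_id])

# Hypothetical leading key components shared by all rows in this example.
common = ("c1", "alice", "flow1", "1", "app_1", "CONTAINER")

# Without a prefix, rows sort purely lexicographically by entity ID.
no_prefix = sorted(row_key(*common, eid) for eid in ["e-2", "e-10", "e-1"])
ids_no_prefix = [key.split(SEP)[-1] for key in no_prefix]

# With a numeric prefix, the application controls the order.
with_prefix = sorted(
    row_key(*common, eid, id_prefix=n)
    for eid, n in [("e-2", 2), ("e-10", 10), ("e-1", 1)]
)
ids_with_prefix = [key.split(SEP)[-1] for key in with_prefix]

print(ids_no_prefix)    # ['e-1', 'e-10', 'e-2']
print(ids_with_prefix)  # ['e-1', 'e-2', 'e-10']
```

Without a prefix, e-10 sorts before e-2; with the prefix, the order follows the 
number the application chose.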

cc [~jrottinghuis], [~sjlee0], [~vrushalic], [~gtCarrera9]. Hope this covers 
everything.

This solution was chosen because we thought that, in UI use cases, a single 
entity read would typically follow a listing of multiple entities, and hence 
the prefix would be known. This does not mean, however, that we will not 
provide a mechanism to fetch an entity if the prefix wasn't given. We can use a 
single column value filter in that case.
Moreover, this solution overall had a lower write and read penalty than the 
solutions listed above.
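
The difference between the two single-entity read paths can be sketched with a 
toy in-memory table. This is only an analogy under stated assumptions: the dict 
stands in for the entity table, the function names are hypothetical, and the 
leading cluster/user/flow/flowrun/app key components are omitted for brevity.

```python
# Toy entity table: (entity_type, id_prefix, entity_id) -> columns.
# The entity ID is also stored as a column, as described in the comment.
table = {
    ("CONTAINER", 1, "e-1"): {"id": "e-1", "info": "first"},
    ("CONTAINER", 2, "e-2"): {"id": "e-2", "info": "second"},
    ("CONTAINER", 3, "e-3"): {"id": "e-3", "info": "third"},
}

def get_with_prefix(entity_type, id_prefix, entity_id):
    """Prefix known: the full row key can be built, analogous to a Get."""
    return table.get((entity_type, id_prefix, entity_id))

def scan_without_prefix(entity_type, entity_id):
    """Prefix unknown: walk the rows of this type and filter on the stored
    entity ID column, analogous to a Scan with a SingleColumnValueFilter."""
    for (etype, _prefix, _eid), columns in sorted(table.items()):
        if etype == entity_type and columns["id"] == entity_id:
            return columns
    return None

by_get = get_with_prefix("CONTAINER", 2, "e-2")
by_scan = scan_without_prefix("CONTAINER", "e-2")
```

Both calls return the same row, but the second one may have to visit every row 
of the entity type before it finds a match.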



> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities, 
> the REST call gives the first/last 100 entities. How do we retrieve the next 
> set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: suppose the applications stored in the database are app-1 app-2 ... 
> app-10. *getApps?limit=5* gives app-1 to app-5. But there is no way to 
> retrieve the next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, fetching the 
> next set of entities using fromId, rather than querying all the entities, is 
> a very common use case. This is very useful for pagination in the web UI.
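
The fromId behavior proposed in the description can be sketched as below. This 
is a minimal illustration, not the TimelineReader code: {{get_apps}} is a 
hypothetical function, the exclusive "strictly after fromId" semantics is taken 
from the app-5 example, and the IDs are zero-padded here (an assumption) so that 
lexicographic order matches numeric order.

```python
from bisect import bisect_right

def get_apps(sorted_ids, limit=100, from_id=None):
    """Return up to `limit` IDs that come strictly after `from_id` in storage order."""
    # bisect_right finds the first position after from_id in the sorted list.
    start = 0 if from_id is None else bisect_right(sorted_ids, from_id)
    return sorted_ids[start:start + limit]

# Ten application IDs, app-01 .. app-10, in lexicographic (storage) order.
apps = sorted(f"app-{i:02d}" for i in range(1, 11))

page1 = get_apps(apps, limit=5)
# The last ID of one page becomes the fromId of the next request.
page2 = get_apps(apps, limit=5, from_id=page1[-1])

print(page1)  # ['app-01', 'app-02', 'app-03', 'app-04', 'app-05']
print(page2)  # ['app-06', 'app-07', 'app-08', 'app-09', 'app-10']
```

A web UI would pass the last ID of the current page as fromId to fetch the next 
page, so no request has to read more than limit rows past the starting point.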


