[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455257#comment-15455257 ]

Varun Saxena edited comment on YARN-5585 at 9/1/16 12:40 PM:
-------------------------------------------------------------

So I thought about this a bit, and I think there is a solution for fetching apps 
within a cluster without much performance impact, since this seems to be your use 
case.

What we can do is first get the required app IDs from the app-to-flow table, as 
app IDs in that table are sorted, and extract the applicable flows from there. 
We can then fetch data from the application table using these unique flows to 
get more specific information about the apps. HBase has a MultiRowRangeFilter 
which lets us specify multiple row key ranges.
We would return only those apps that were found in the app-to-flow table.
And from a performance viewpoint, we can assume a reasonable limit will always 
be specified.
 
_Example:_
Assume a cluster has applications application_1111111_0001 to 
application_1111111_0034 (running or completed).
These apps are stored in descending order in the app-to-flow table.
Say you want the latest 10 apps (i.e. the limit in your query is 10).
We can get the first 10 apps from the app-to-flow table, i.e. 
application_1111111_0034 down to application_1111111_0025, using a PageFilter 
to return only the first 10 records. This is the result set we return to the 
caller.
Assume the application IDs ending with _0034, _0031 and _0027 belong to flow1 
and the rest to flow2. We can then use this info to query the application table.
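This first step (apply the limit, then group the returned app IDs by flow) can be sketched in plain Python. This is an illustrative model only, not HBase client code: the flat list standing in for the app-to-flow table and the flow assignments are assumptions taken from the example above, and real ATSv2 row keys use inverted numeric encodings so that newer apps sort first.

```python
# Model the app-to-flow lookup: rows are (app_id, flow) pairs already
# sorted newest-first, as the app-to-flow table stores them.

def latest_apps_by_flow(app_to_flow_rows, limit):
    """Return the first `limit` app IDs (what a PageFilter would pass
    through) and a flow -> [app_ids] grouping for the next step."""
    page = app_to_flow_rows[:limit]
    flows = {}
    for app_id, flow in page:
        flows.setdefault(flow, []).append(app_id)
    return [app_id for app_id, _ in page], flows

# Hypothetical data mirroring the example: apps _0025.._0034, newest
# first, with _0034, _0031 and _0027 in flow1 and the rest in flow2.
rows = [("application_1111111_%04d" % n,
         "flow1" if n in (34, 31, 27) else "flow2")
        for n in range(34, 24, -1)]

apps, flows = latest_apps_by_flow(rows, 10)
```

The flow -> app IDs grouping is exactly what we need to build the row key ranges for the application-table query.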

So to get detailed info for these 10 apps in a single shot from the application 
table, we can do the following:
* Create a MultiRowRangeFilter.
* For flow1, add start row {{cluster!user!flow1!application_1111111_0034}} and 
stop row {{cluster!user!flow1!application_1111111_0027}}, making the stop row 
inclusive. Add this start/stop row pair to the multi-row-range filter.
* For flow2, the start row can be 
{{cluster!user!flow2!application_1111111_0033}} and the stop row 
{{cluster!user!flow2!application_1111111_0024}}. Add this start/stop row pair 
to the multi-row-range filter as well.
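The effect of these ranges can be sketched in plain Python. Again this is only an illustrative model of what a MultiRowRangeFilter does (keep rows whose key falls inside any configured range), not the HBase API: real row keys are byte arrays with inverted numeric fields, whereas here plain string comparison works because keys within a flow share a prefix and differ only in the zero-padded app number, so start >= key >= stop.

```python
# Model MultiRowRangeFilter membership over string row keys of the form
# cluster!user!<flow>!<app_id>, with an inclusive stop row.

def row_key(flow, app_id):
    return "cluster!user!%s!%s" % (flow, app_id)

# One (start, stop) pair per flow, as in the steps above.
ranges = [
    (row_key("flow1", "application_1111111_0034"),
     row_key("flow1", "application_1111111_0027")),
    (row_key("flow2", "application_1111111_0033"),
     row_key("flow2", "application_1111111_0024")),
]

def matches(key):
    """True when the key falls inside any [stop, start] range."""
    return any(stop <= key <= start for start, stop in ranges)

in_range = row_key("flow2", "application_1111111_0029")  # one of the 10 apps
too_old  = row_key("flow2", "application_1111111_0010")  # outside both ranges
```

Note that the real filter lets us control inclusivity per range boundary, which matters at the edges of each flow's range.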

This would be slower than fetching apps when a flow or flow run is specified, 
but faster than a full table scan of the application table, especially as the 
table grows large.

Maybe I can raise a separate JIRA for this and handle it there if this is a 
real use case.



> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>
> TimelineReader REST APIs provide a lot of filters to retrieve applications. 
> Along with those, it would be good to add a new filter, i.e. fromId, so that 
> entities can be retrieved after the fromId. 
> Example: If applications app-1, app-2 ... app-10 are stored in the database, 
> *getApps?limit=5* gives app-1 to app-5, but retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> This is very useful for pagination in the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
