[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550930#comment-15550930
 ] 

Rohith Sharma K S commented on YARN-5585:
-----------------------------------------

bq. We also need to be *crystal clear* that timeline clients *must* provide the 
same prefix for all subsequent updates of the same entity. I cannot stress that 
point enough. Rohith, could you confirm that it is not an issue with Tez to 
provide the created time for any subsequent updates for Tez entities?
This is a very important point for TimelineClient users who want to use 
prefixId. Even though I am in the minority on introducing an *optional* 
prefixId, I convinced myself to go ahead with it because optionality 
(flexibility) is at least better than a predefined, storage-specific sort 
order. That said, the underlying issue is in the storage layer, and we are 
solving it by pushing it up to the API as an optional prefix. This exposes a 
flaw in the API: a user can mess up the storage, which results in inconsistent 
data on retrieval. 
I had an offline talk with one of the Tez developers, and he is fine with 
providing prefixId. He raised two concerns. First, with multiple JVMs the 
application programmer has to define a new protocol for transferring the 
prefixId. Second, what if a user misses providing the prefixId in subsequent 
updates? That would mess up the storage, with the entity's data stored in two 
different entries, or possibly more.
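
To illustrate the second concern, here is a minimal sketch in Python. It is 
not the ATSv2 storage code: the dict-backed store, the `put_entity` helper, 
the `TEZ_DAG` type name, the row key layout (type, prefix, id), and the 
default prefix of 0 are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for the entity table; not the ATSv2 schema.
store = {}


def put_entity(entity_type, entity_id, info, prefix=0):
    """Merge info into the row addressed by (type, prefix, id)."""
    row_key = (entity_type, prefix, entity_id)
    store.setdefault(row_key, {}).update(info)
    return row_key


# The first write supplies a prefix (for example, derived from the created time).
put_entity("TEZ_DAG", "dag_1", {"status": "RUNNING"}, prefix=1475000000)
# A later update omits the prefix, so it is written under a different row key.
put_entity("TEZ_DAG", "dag_1", {"status": "SUCCEEDED"})

# Collect every row key that belongs to the same logical entity.
rows = [key for key in store if key[2] == "dag_1"]
print(len(rows))  # 2: the same entity is now split across two rows
```

Under these assumptions, a reader that fetches either row sees only part of 
the entity, which is the inconsistency described above.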

bq. I'm also realizing that we might have a bug in how we deal with entity 
id's. I would have thought that we store the entities in the reverse entity id 
order, but it appears that the entity id is encoded into the row key as is 
(EntityRowKey). Am I reading that right? If so, this is a bug to fix.
Sorry, I could not quite follow this. Could you elaborate a bit? Do you mean 
reversing only the entityId, i.e. if the entityId is "12345" then storing 
"54321", OR reversing the row key itself?

bq. One other thing to deal with is the query by id. There, we need to be able 
to distinguish the case where the data do not have the prefix to begin with and 
that where data do. Ideally we would simply use the row key explicitly in the 
case of data that don't have the prefix to begin with. For those that do have 
the prefix, we cannot use the row key to fetch the row so we need to do 
something different. I don't think this was done in the current patch, but this 
is TBD.
I was thinking of using the same REST API for both cases by using 
SingleColumnFilter. One drawback I see is a table scan across all rows of the 
entityType, which would hurt read performance.
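
The trade-off can be sketched as follows, in Python rather than against the 
real storage API. The row layout, the `id` column, and both helper functions 
are hypothetical; the point is only that a filter-based lookup has to examine 
every row of the entity type, while a direct lookup needs the full row key.

```python
# Hypothetical rows keyed by (entity_type, prefix, entity_id); not the ATSv2 schema.
rows = {
    ("DAG", 0, "dag_1"): {"id": "dag_1"},
    ("DAG", 1475000000, "dag_2"): {"id": "dag_2"},
    ("TASK", 0, "task_1"): {"id": "task_1"},
}


def get_by_row_key(entity_type, entity_id, prefix=0):
    """Direct lookup: works only when the full row key, prefix included, is known."""
    return rows.get((entity_type, prefix, entity_id))


def get_by_column_filter(entity_type, entity_id):
    """Scan the rows of one entity type and filter on the stored id column.

    Returns the matching columns (or None) and the number of rows examined.
    """
    examined = 0
    for (row_type, _prefix, _entity_id), columns in rows.items():
        if row_type != entity_type:
            continue
        examined += 1
        if columns["id"] == entity_id:
            return columns, examined
    return None, examined


# Without the prefix, the direct lookup misses the row written with one.
print(get_by_row_key("DAG", "dag_2"))
# The filtered scan finds it, at the cost of examining rows of that type.
print(get_by_column_filter("DAG", "dag_2"))
```

In this sketch the scan cost grows with the number of entities of the type, 
which is the read-performance concern mentioned above.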

I will handle the other comments. I will also create the patch on the 
YARN-5355 branch.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch, 
> YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 
> entities, the REST call gives the first/last 100 entities. How do we retrieve 
> the next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: if applications are stored in the database as app-1 app-2 ... 
> app-10, *getApps?limit=5* gives app-1 to app-5. But there is no way to 
> retrieve the next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS targets storage of a large number of entities, getting the next 
> set of entities using fromId, rather than querying all the entities, is a 
> very common use case. This is very useful for pagination in the web UI.
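
The proposed behavior can be sketched in Python as follows. This is only an 
illustration of the paging semantics in the description, not the 
TimelineReader implementation: the `get_apps` function is hypothetical, the 
ids are zero-padded so that string order matches numeric order, and treating 
fromId as exclusive follows the app-6 to app-10 example above.

```python
# Illustrative ids, zero-padded so that string order matches numeric order.
apps = [f"app-{i:02d}" for i in range(1, 11)]


def get_apps(limit=100, from_id=None):
    """Return up to `limit` ids that sort strictly after `from_id`."""
    ordered = sorted(apps)
    if from_id is not None:
        ordered = [app for app in ordered if app > from_id]
    return ordered[:limit]


# First page: no fromId, so paging starts at the beginning.
first_page = get_apps(limit=5)
# Next page: pass the last id of the previous page as fromId.
next_page = get_apps(limit=5, from_id=first_page[-1])

print(first_page)
print(next_page)
```

A client pages through all entities by repeating the second call with the 
last id it received until an empty list comes back.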


