[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550930#comment-15550930 ]
Rohith Sharma K S commented on YARN-5585:
-----------------------------------------

bq. We also need to be *crystal clear* that timeline clients *must* provide the same prefix for all subsequent updates of the same entity. I cannot stress that point enough. Rohith, could you confirm that it is not an issue with Tez to provide the created time for any subsequent updates for Tez entities?

This is a very important point for TimelineClient users who want to use prefixId. Even though I am in the minority on introducing an *optional* prefixId, I convinced myself to go ahead with it because optionality (flexibility) is at least better than a predefined, storage-specific sort order. The underlying issue is in the storage layer, and we are trying to solve it by pushing it up to the API as an optional prefix. That exposes a flaw in the API: a user can mess up the storage, which results in inconsistent data on retrieval.

I had an offline talk with one of the Tez developers, and he is fine with providing prefixId. He expressed two concerns. First, with multiple JVMs, the application programmer has to define a new protocol for transferring the prefixId. Second, what if a user misses providing the prefixId in a subsequent update? The storage would then be messed up, with the data for one entity stored in two, or possibly more, different entries.

bq. I'm also realizing that we might have a bug in how we deal with entity id's. I would have thought that we store the entities in the reverse entity id order, but it appears that the entity id is encoded into the row key as is (EntityRowKey). Am I reading that right? If so, this is a bug to fix.

Sorry, I could not follow this. Could you explain in a bit more detail? Do you mean reversing only the entityId, i.e. if the entityId is "12345" then "54321", or reversing the row key itself?

bq. One other thing to deal with is the query by id. There, we need to be able to distinguish the case where the data do not have the prefix to begin with and that where data do.
Ideally we would simply use the row key explicitly in the case of data that don't have the prefix to begin with. For those that do have the prefix, we cannot use the row key to fetch the row so we need to do something different. I don't think this was done in the current patch, but this is TBD.

I was thinking of using the same REST API for both cases by using SingleColumnFilter. One con I see is a table scan over the whole entityType, which will show up in read performance.

I will handle the other comments. Also, I will create the patch on the YARN-5355 branch.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch,
> YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide many filters to retrieve
> applications. Along with those, it would be good to add a new filter,
> fromId, so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000
> entities, the REST call gives the first/last 100 entities. How do we
> retrieve the next set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications are stored in the database as app-1, app-2, ...
> app-10, *getApps?limit=5* gives app-1 to app-5. But there is no way to
> retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6
> to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very
> common use case to get the next set of entities using fromId rather than
> querying all the entities. This is very useful for pagination in the web UI.
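
The fromId paging described in the issue above can be sketched as follows. This is only a minimal in-memory illustration of the proposed semantics (return up to `limit` entries that come after `fromId`), not the timeline reader implementation or any patch attached here. The class `FromIdPagination` and the method `getApps` are hypothetical names, and the stored order is modelled as a plain list.

```java
import java.util.ArrayList;
import java.util.List;

public class FromIdPagination {

    // Hypothetical helper: returns up to `limit` ids that come strictly after
    // `fromId` in the stored order. A null fromId means "start from the beginning".
    static List<String> getApps(List<String> stored, int limit, String fromId) {
        int start = 0;
        if (fromId != null) {
            int idx = stored.indexOf(fromId);
            if (idx < 0) {
                throw new IllegalArgumentException("unknown fromId: " + fromId);
            }
            start = idx + 1; // exclusive, matching the example in the issue text
        }
        int end = Math.min(stored.size(), start + limit);
        return new ArrayList<>(stored.subList(start, end));
    }

    public static void main(String[] args) {
        // Assumed stored order: app-1 ... app-10, as in the issue description.
        List<String> stored = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            stored.add("app-" + i);
        }

        // First page: no fromId.
        List<String> page1 = getApps(stored, 5, null);
        // Next page: pass the last id of the previous page as fromId.
        String last = page1.get(page1.size() - 1);
        List<String> page2 = getApps(stored, 5, last);

        System.out.println(page1);
        System.out.println(page2);
    }
}
```

A client pages forward by feeding the last id of each page back in as the next fromId. How the reader locates fromId in the backing store (row key versus filter) is the open question discussed in the comment and is not modelled here.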