[ https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341427#comment-16341427 ]
Vrushali C commented on YARN-3895:
----------------------------------

Hi [~jlowe] [~jeagles]

I discussed this with [~lohit] once again this morning. Based on the scale of domain ids, I want to revise the storage design.

We now propose a domain table whose row key is the domain id, with two columns, one for users and one for groups, plus columns for the created time and the other fields that exist in the TimelineDomain object.

At read time, just like ATSv1 does, we first get all the entities satisfying the query criteria and then collect their domain ids. For each domain id in the response, we check the domain table to see whether the querying user/group has permission. For a wildcard of '*', no check should be necessary, since it means all users and groups have permission? Similarly, if the querying user is an admin, no check is done. None of this is executed in non-secure mode.

This is functionally correct, but it will be somewhat slow, depending on the number of distinct domain ids found in the entity response set. If there is only one domain id, it costs only one additional get request to hbase. With each additional domain id, the query response time increases slightly. We can batch the gets to the domain table, but even so it will take a few seconds, tending to minutes, depending on the number of calls needed, since multiple calls to hbase translate to multiple hdfs calls.

I have been scratching my head over this read performance. The only other option I see is for the collector to keep the domain id and user/group info in memory and write it out with each entity. That way we end up with a denormalized dataset, and read queries will be as fast as they can get with hbase. The domain table will still exist, so a collector that goes down can read the info back from it when it comes back up.

Which way do you think would end up working better for applications like Tez?

Storage scalability wise, I think either of the two options would be fine with hbase. The expiration / TTL can be set in either case as well.
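
To make the read path of the first option concrete, here is a minimal sketch in Java. It is not taken from the ATSv2 code base: `DomainAclFilter`, `DomainAcl`, `Entity` and the `batchGet` function are hypothetical names, and `batchGet` merely stands in for a batched get against the proposed domain table. The sketch also assumes that a '*' stored in either column grants access to everyone and that an entity whose domain id has no row is hidden.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of the read-time ACL check; names are illustrative only. */
class DomainAclFilter {

  /** One row of the proposed domain table: users column and groups column. */
  static final class DomainAcl {
    final Set<String> users;
    final Set<String> groups;

    DomainAcl(Set<String> users, Set<String> groups) {
      this.users = users;
      this.groups = groups;
    }

    boolean allows(String user, Set<String> userGroups) {
      // Assumption: a wildcard in either column means everyone may read.
      if (users.contains("*") || groups.contains("*")) {
        return true;
      }
      if (users.contains(user)) {
        return true;
      }
      for (String group : userGroups) {
        if (groups.contains(group)) {
          return true;
        }
      }
      return false;
    }
  }

  /** Minimal stand-in for a timeline entity: only the ids matter here. */
  static final class Entity {
    final String id;
    final String domainId;

    Entity(String id, String domainId) {
      this.id = id;
      this.domainId = domainId;
    }
  }

  /**
   * Filters an entity response set down to what the caller may see.
   * batchGet is a placeholder for one batched lookup of domain rows.
   */
  static List<Entity> filter(
      List<Entity> entities,
      String user,
      Set<String> userGroups,
      boolean securityEnabled,
      boolean isAdmin,
      Function<Set<String>, Map<String, DomainAcl>> batchGet) {
    // Non-secure mode and admin callers skip the check entirely.
    if (!securityEnabled || isAdmin) {
      return new ArrayList<>(entities);
    }
    // Collect distinct domain ids so each domain row is fetched once.
    Set<String> domainIds = new LinkedHashSet<>();
    for (Entity e : entities) {
      domainIds.add(e.domainId);
    }
    Map<String, DomainAcl> acls = batchGet.apply(domainIds);
    List<Entity> visible = new ArrayList<>();
    for (Entity e : entities) {
      DomainAcl acl = acls.get(e.domainId);
      // Assumption: an unknown domain id denies access.
      if (acl != null && acl.allows(user, userGroups)) {
        visible.add(e);
      }
    }
    return visible;
  }
}
```

The extra cost of this option is the single `batchGet` call, whose size grows with the number of distinct domain ids in the response. In the denormalized option the users/groups would already be on each entity, so that lookup would disappear from the read path.
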

To optimize read / write performance, we can pre-split the domain table and balance the row keys so that they land on different Region Servers. That way we don't end up hot-spotting a single RS with the reads and writes of currently running applications.

thanks
Vrushali

> Support ACLs in ATSv2
> ---------------------
>
> Key: YARN-3895
> URL: https://issues.apache.org/jira/browse/YARN-3895
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Priority: Major
> Labels: YARN-5355
>
> This JIRA is to keep track of authorization support design discussions for
> both readers and collectors.