[ https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336260#comment-16336260 ]
Vrushali C commented on YARN-3895:
----------------------------------

Thanks Jason Lowe for the discussion. Let me summarize some points from our discussions so far.

* Goal of jira: Design a way to authorize reads of timeline entities.
* Design objectives:
** Store data in a denormalized fashion, since HBase reads work well with that. Avoid joins across tables.
** Write out ACLs as few times as possible, ideally once per DAG (once per application).

Background:
* ATSv1/1.5 does read authorization via domain ids. A domain id is published once per DAG or once per application, and all entities written with that domain id are authorized accordingly at read time.

Current design proposal summary:
* ATSv2 uses HBase. If we were to follow a design similar to ATSv1/1.5, reads would need a join across two tables (the domain/ACLs table and the entity table). This is not ideal for read performance: correctness would not be an issue, but response latencies would be.
* To counteract the read latencies, one idea is to have the collector read the ACLs at write time. There are a few concerns here. The collector would now open connections to more region servers to read from other tables. When running at scale, we would like the write path to be "fire-and-forget". Reads from the collector would likely cause high write latencies and more network connections, for both the YARN cluster and the HBase cluster. Also, doing a read and then a write does not reduce the size of the data sent from the collector to the region server.
* Another idea is to cache the ACLs in the collector and attach them to each entity as it is written out. The ACLs would also be stored in an ACLs table.
If the collector goes down and comes back up, it can read the ACLs table for the applications it is collecting data for. This read is a one-off that happens only when the collector restarts. The ACLs are still stored in a denormalized way with each entity, and reads never query the ACLs table.
* This approach still does not reduce the size of the data sent with each entity.
* To update the ACLs of existing entities, we plan to provide an API or admin call that would go over the tables and write out the ACLs again.

I will think this over a bit more, discuss with others, and get back soon.

> Support ACLs in ATSv2
> ---------------------
>
>                 Key: YARN-3895
>                 URL: https://issues.apache.org/jira/browse/YARN-3895
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Major
>              Labels: YARN-5355
>
> This JIRA is to keep track of authorization support design discussions for
> both readers and collectors.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
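To make the read-path trade-off in the comment concrete, here is a minimal Java sketch (Java 16+, for records) contrasting an ATSv1/1.5-style check, where each entity row carries only a domain id and its ACL must be fetched from a second table, with the denormalized layout, where the reader ACL is stored in the entity row itself. Everything here is hypothetical: `AclReadCheckSketch`, its record types and the in-memory map standing in for the domain table are illustrative, not ATSv2 or HBase APIs, and ACLs are simplified to sets of user names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these types are Hadoop/ATSv2 classes.
// Compares a read check that needs a second-table lookup (domain id -> ACL)
// with one that uses the ACL already stored in the entity row.
public class AclReadCheckSketch {

  /** ATSv1/1.5 style: the entity row carries only a domain id. */
  record NormalizedEntity(String id, String domainId) {}

  /** Denormalized style: the entity row carries its own reader ACL. */
  record DenormalizedEntity(String id, Set<String> readers) {}

  /** Stand-in for a separate domain/ACLs table keyed by domain id. */
  private final Map<String, Set<String>> domainTable = new HashMap<>();
  /** Number of lookups into the second table, i.e. the cost of the "join". */
  private int domainTableLookups = 0;

  void putDomain(String domainId, Set<String> readers) {
    domainTable.put(domainId, Set.copyOf(readers));
  }

  /** Needs one extra domain-table lookup for every entity checked. */
  boolean canReadNormalized(String user, NormalizedEntity entity) {
    domainTableLookups++;
    return domainTable.getOrDefault(entity.domainId(), Set.of()).contains(user);
  }

  /** Uses only data in the entity row; the domain table is never touched. */
  static boolean canReadDenormalized(String user, DenormalizedEntity entity) {
    return entity.readers().contains(user);
  }

  int domainTableLookups() {
    return domainTableLookups;
  }

  public static void main(String[] args) {
    AclReadCheckSketch store = new AclReadCheckSketch();
    store.putDomain("domain_app_1", Set.of("alice"));

    // The same two entities of one application, stored both ways.
    List<DenormalizedEntity> denormalized = List.of(
        new DenormalizedEntity("container_1", Set.of("alice")),
        new DenormalizedEntity("container_2", Set.of("alice")));
    List<NormalizedEntity> normalized = List.of(
        new NormalizedEntity("container_1", "domain_app_1"),
        new NormalizedEntity("container_2", "domain_app_1"));

    int visible = 0;
    for (DenormalizedEntity e : denormalized) {
      if (canReadDenormalized("alice", e)) visible++;
    }
    System.out.println("denormalized: " + visible + " visible, "
        + store.domainTableLookups() + " domain-table lookups");

    visible = 0;
    for (NormalizedEntity e : normalized) {
      if (store.canReadNormalized("alice", e)) visible++;
    }
    System.out.println("normalized: " + visible + " visible, "
        + store.domainTableLookups() + " domain-table lookups");
  }
}
```

Running `main` prints `denormalized: 2 visible, 0 domain-table lookups` followed by `normalized: 2 visible, 2 domain-table lookups`. Both layouts give the same answer; in HBase each extra lookup would be a read against another table, possibly on another region server, which is the latency concern raised above.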
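The collector-side caching idea from the comment can be sketched the same way. Assumptions, all hypothetical: an ACL is a set of reader user names per application id; two in-memory maps stand in for the HBase ACLs table and entity table; `publishAcl`, `recover`, `writeEntity` and `updateAcl` are illustrative names, not proposed ATSv2 APIs. The sketch shows three properties described in the comment: the write path attaches the cached ACL without reading the ACLs table, a restarted collector repopulates its cache with one read per application, and an admin-style update rewrites the ACL on every stored entity of the application.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: in-memory maps stand in for the HBase ACLs
// table and entity table; no names here are real ATSv2 APIs.
public class CollectorAclCacheSketch {

  /** Stand-in for the ACLs table: application id -> reader user names. */
  static final class AclTable {
    private final Map<String, Set<String>> rows = new HashMap<>();
    int reads = 0; // how many times this table has been read

    void write(String appId, Set<String> readers) {
      rows.put(appId, readers);
    }

    Set<String> read(String appId) {
      reads++;
      return rows.getOrDefault(appId, Set.of());
    }
  }

  /** Denormalized entity row: the reader ACL is stored alongside the entity. */
  record EntityRow(String entityId, String appId, Set<String> readers) {}

  private final AclTable aclTable;
  private final Map<String, EntityRow> entityTable;
  private final Map<String, Set<String>> aclCache = new HashMap<>();

  CollectorAclCacheSketch(AclTable aclTable, Map<String, EntityRow> entityTable) {
    this.aclTable = aclTable;
    this.entityTable = entityTable;
  }

  /** Publish an application's ACL once: persist it and cache it locally. */
  void publishAcl(String appId, Set<String> readers) {
    Set<String> copy = Set.copyOf(readers);
    aclTable.write(appId, copy);
    aclCache.put(appId, copy);
  }

  /** After a restart: one-off read of the ACLs table per application served. */
  void recover(List<String> appIds) {
    for (String appId : appIds) {
      aclCache.put(appId, aclTable.read(appId));
    }
  }

  /** Write path: attach the cached ACL; never reads the ACLs table. */
  void writeEntity(String appId, String entityId) {
    // Uncached application -> empty ACL, i.e. fail closed (a sketch choice).
    Set<String> readers = aclCache.getOrDefault(appId, Set.of());
    entityTable.put(entityId, new EntityRow(entityId, appId, readers));
  }

  /** Admin-style update: store the new ACL, then rewrite it on every row of the app. */
  void updateAcl(String appId, Set<String> readers) {
    publishAcl(appId, readers);
    Set<String> copy = aclCache.get(appId);
    entityTable.replaceAll((id, row) ->
        row.appId().equals(appId) ? new EntityRow(id, appId, copy) : row);
  }

  /** Read path: only the ACL stored in the entity row is consulted. */
  static boolean canRead(String user, EntityRow row) {
    return row != null && row.readers().contains(user);
  }

  public static void main(String[] args) {
    AclTable acls = new AclTable();
    Map<String, EntityRow> entities = new HashMap<>();

    CollectorAclCacheSketch collector = new CollectorAclCacheSketch(acls, entities);
    collector.publishAcl("app_1", Set.of("alice"));
    collector.writeEntity("app_1", "container_1");

    // Simulated restart: a fresh collector with an empty cache, same tables.
    CollectorAclCacheSketch restarted = new CollectorAclCacheSketch(acls, entities);
    restarted.recover(List.of("app_1"));
    restarted.writeEntity("app_1", "container_2");

    System.out.println("alice can read container_2: "
        + canRead("alice", entities.get("container_2")));
    System.out.println("ACLs table reads: " + acls.reads);

    restarted.updateAcl("app_1", Set.of("alice", "bob"));
    System.out.println("bob can read container_1 after update: "
        + canRead("bob", entities.get("container_1")));
  }
}
```

Here the ACLs table is read exactly once, during recovery, so `main` reports one read. Entities written for an application whose ACL is not cached get an empty ACL and are unreadable; failing closed is a choice made for this sketch, since the comment does not say how that case should be handled. `updateAcl` scans the whole entity table, mirroring the "go over the tables and write out the ACLs again" idea, which suggests such updates fit an occasional admin operation rather than the normal write path.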