[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341427#comment-16341427
 ] 

Vrushali C commented on YARN-3895:
----------------------------------

Hi [~jlowe], [~jeagles],

I discussed this with [~lohit] once again this morning.  Given the scale of 
domain ids, I want to revise the storage design. We now propose a domain 
table whose row key is the domain id, with one column for users and another 
for groups, plus columns for the created time and the other fields that exist 
in the TimelineDomain object.

So at read time, just like ATSv1 does, we first get all the entities 
satisfying the query criteria, then collect their domain ids. For each domain 
id in the response, we check the domain table to see whether the user/group 
has permissions.

For the wildcard ‘*’, no check should be necessary, since it means all users 
and groups have permissions?

Similarly, if the querying user is an admin, no check is done.  Also, none of 
this is executed in non-secure mode.
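
The read-time check described above could be sketched roughly as follows. 
This is only an illustration, not ATSv2 code: a Python dict stands in for the 
domain table, and the names DomainAcl, is_allowed and filter_readable, the 
entity layout, and the choice to deny access when a domain row is missing are 
all assumptions made for the example. Treating ‘*’ as "everyone" is likewise 
the assumption raised in the question above.

```python
from dataclasses import dataclass, field

WILDCARD = "*"


@dataclass
class DomainAcl:
    """Hypothetical stand-in for one row of the proposed domain table."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


def is_allowed(acl, user, user_groups):
    """Return True if the user, or one of the user's groups, is in the ACL."""
    if WILDCARD in acl.users or WILDCARD in acl.groups:
        return True  # assumption: '*' grants access to everyone
    return user in acl.users or bool(acl.groups & set(user_groups))


def filter_readable(entities, domain_table, user, user_groups,
                    is_admin=False, secure=True):
    """Keep only the entities whose domain permits the querying user."""
    if not secure or is_admin:
        return list(entities)  # no checks in non-secure mode or for admins
    decisions = {}  # one domain-table lookup per distinct domain id
    for entity in entities:
        domain_id = entity["domain_id"]
        if domain_id not in decisions:
            acl = domain_table.get(domain_id)
            # assumption: a missing domain row means access is denied
            decisions[domain_id] = (
                acl is not None and is_allowed(acl, user, user_groups))
    return [e for e in entities if decisions[e["domain_id"]]]
```

The decisions dict is what keeps the cost proportional to the number of 
distinct domain ids rather than to the number of entities returned.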

This is functionally correct, but it will be somewhat slow, depending on the 
number of domain ids found in the entity response set. If there is only one 
domain id, only one additional get request to hbase is needed. With each 
additional domain id, the query response time increases slightly. We can 
batch the gets to the domain table, but even so, a query could take a few 
seconds, tending toward minutes, depending on the number of calls needed, 
since multiple calls to hbase translate to multiple hdfs calls. 
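
To make the batching idea concrete, here is a rough sketch of how the 
lookups might be planned: deduplicate the domain ids from the entity response 
set, then group them into batches so that each batch maps to one multi-get. 
The function name plan_domain_gets and the batch_size default are 
hypothetical, chosen only for illustration.

```python
def plan_domain_gets(entities, batch_size=100):
    """Group the distinct domain ids of a response set into get batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys removes duplicates while keeping first-seen order
    domain_ids = list(dict.fromkeys(e["domain_id"] for e in entities))
    return [domain_ids[i:i + batch_size]
            for i in range(0, len(domain_ids), batch_size)]
```

The number of batches, and not the number of entities, is what drives the 
number of round trips in this sketch.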

I have been scratching my head over this read performance. The only other 
option I see is for the collector to keep the domain id and user/groups info 
in memory and write it out with each entity. That way we end up with a 
denormalized dataset, and read queries will be as fast as they can get with 
hbase. The domain table will still exist, and the collector can reload the 
info from it if the collector goes down and comes back up.
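
A minimal sketch of this denormalized alternative, assuming plain dicts as 
stand-ins for the domain and entity tables. The Collector class and its 
method names are hypothetical and do not correspond to the actual ATSv2 
collector API.

```python
class Collector:
    """Hypothetical collector that copies ACL info into every entity row."""

    def __init__(self, domain_table):
        self._domain_table = domain_table  # persistent table (dict stand-in)
        self._cache = {}                   # in-memory domain id -> ACL info

    def put_domain(self, domain_id, users, groups):
        acl = {"users": sorted(users), "groups": sorted(groups)}
        self._domain_table[domain_id] = acl  # the domain table still exists
        self._cache[domain_id] = acl

    def _lookup(self, domain_id):
        # After a restart the cache is empty, so fall back to the table.
        if domain_id not in self._cache:
            self._cache[domain_id] = self._domain_table[domain_id]
        return self._cache[domain_id]

    def write_entity(self, entity_table, entity_id, domain_id, info):
        acl = self._lookup(domain_id)
        entity_table[entity_id] = {
            "domain_id": domain_id,
            "users": list(acl["users"]),    # denormalized copy
            "groups": list(acl["groups"]),  # denormalized copy
            "info": info,
        }
```

With this layout a reader can decide access from the entity row alone, at the 
cost of repeating the user/group lists in every entity written.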

Which way do you think might end up working better for applications like Tez?

In terms of storage scalability, I think either of the two options would be 
fine with hbase, and the expiration / TTL can be set in either case as well. 
To optimize read / write performance, we can pre-split the domain table and 
try to balance the row keys so that they go to different Region Servers and 
we don’t end up hot-spotting a single RS for reads and writes of currently 
running applications.
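
One common way to balance row keys, which is not something settled in this 
comment, is to prefix each key with a bucket derived from a hash of the 
domain id and to pre-split the table on the bucket boundaries. The sketch 
below assumes that approach; salted_row_key, split_keys, the two-digit prefix 
format and the default of 16 buckets are all hypothetical.

```python
import hashlib


def salted_row_key(domain_id, num_buckets=16):
    """Prefix the domain id with a stable, hash-derived bucket number."""
    if not 1 <= num_buckets <= 100:
        raise ValueError("num_buckets must fit in a two-digit prefix")
    digest = hashlib.sha256(domain_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}_{domain_id}"


def split_keys(num_buckets=16):
    """Split points that give one region per bucket when pre-splitting."""
    return [f"{bucket:02d}" for bucket in range(1, num_buckets)]
```

Because the bucket is computed from the domain id itself, a reader can 
rebuild the full row key and still issue a direct get.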

thanks

Vrushali

> Support ACLs in ATSv2
> ---------------------
>
>                 Key: YARN-3895
>                 URL: https://issues.apache.org/jira/browse/YARN-3895
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Major
>              Labels: YARN-5355
>
> This JIRA is to keep track of authorization support design discussions for 
> both readers and collectors. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
