[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-05-17 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479454#comment-16479454
 ] 

Vrushali C commented on YARN-3895:
--

To recap some of the discussion in the weekly call today between 
[~rohithsharma] [~haibochen] and me:

- For application-level data, application ACLs are to be used for read 
authorization.
- For system entities such as container events, application ACLs are to be 
used for read authorization.
- For user entities, timeline domain information is to be used for read 
authorization.
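
A minimal sketch of the recap above, in Python. The category labels, the `Acl` and `Entity` types, and the helper names are hypothetical and exist only to show which ACL source would be consulted for each kind of data; they are not ATSv2 APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Acl:
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)

    def allows(self, user: str, user_groups: Set[str]) -> bool:
        # A reader is allowed by name or through any of its groups.
        return user in self.users or bool(self.groups & user_groups)


@dataclass
class Entity:
    category: str                     # "application", "system" or "user" (hypothetical labels)
    app_acl: Acl                      # ACLs of the owning application
    domain_acl: Optional[Acl] = None  # readers taken from the timeline domain


def acl_for_read(entity: Entity) -> Acl:
    # Application-level data and system entities use the application ACLs.
    if entity.category in ("application", "system"):
        return entity.app_acl
    # User entities use the timeline domain information.
    if entity.category == "user":
        if entity.domain_acl is None:
            raise ValueError("user entity has no timeline domain")
        return entity.domain_acl
    raise ValueError("unknown category: " + entity.category)


def can_read(entity: Entity, user: str, user_groups: Set[str]) -> bool:
    return acl_for_read(entity).allows(user, user_groups)
```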





> Support ACLs in ATSv2
> -
>
> Key: YARN-3895
> URL: https://issues.apache.org/jira/browse/YARN-3895
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Vrushali C
>Priority: Major
>  Labels: YARN-5355
>
> This JIRA is to keep track of authorization support design discussions for 
> both readers and collectors. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-02-07 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355850#comment-16355850
 ] 

Vrushali C commented on YARN-3895:
--

Thanks [~jlowe] !

 
{quote}I am a bit confused about the application_id column for a domain table 
entry. What if the domain doesn't apply to the entire application (i.e.: just 
to one DAG within a multi-DAG app) 
{quote}
Ah yes, good catch. The app id should be excluded from the domain table.

I had added it thinking it would make it easier to know which app ids / 
entities would need to be updated when a particular domain is updated. But I 
think we do not *absolutely need* the app ids there; we could do a table scan 
to update the info in entities. I will think a bit more about the case where 
we have to update the domain info, but I think we can presently move ahead on 
the assumption that updates to a domain are an infrequent occurrence that does 
not require a super fast response time. 
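
As a rough illustration of the table-scan update mentioned above, here is a sketch in Python. A plain list stands in for a full scan of the entity table, and the row fields (`domain_id`, `readers`) are assumptions made for the example, not the actual schema.

```python
from typing import Dict, Iterable, List


def rewrite_domain_acls(entity_rows: List[Dict], domain_id: str,
                        readers: Iterable[str]) -> int:
    """Scan every entity row and rewrite the readers of one domain.

    Returns the number of rows rewritten. The cost is proportional to the
    table size, which is acceptable only if domain updates are infrequent.
    """
    new_readers = sorted(set(readers))
    updated = 0
    for row in entity_rows:
        if row["domain_id"] == domain_id:
            row["readers"] = list(new_readers)
            updated += 1
    return updated
```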

 

 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-02-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353974#comment-16353974
 ] 

Jason Lowe commented on YARN-3895:
--

Thanks for the detailed writeup, Vrushali!

I am a bit confused about the application_id column for a domain table entry.  
What if the domain doesn't apply to the entire application (i.e.: just to one 
DAG within a multi-DAG app) or what if a domain applies to more than one 
application?  Tez does not use a domain across applications, but I'm curious if 
this design will preclude a domain crossing applications (e.g.: a domain is 
setup for an entire Oozie flow, and all applications in that flow use that one 
domain for ACLs).





[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-02-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353115#comment-16353115
 ] 

Vrushali C commented on YARN-3895:
--

 

Here is the design after several rounds of discussions in the community. Thanks 
[~jlowe] , [~jrottinghuis] [~lohit] for discussing with us (me, [~rohithsharma] 
and [~varun_saxena]). 

- We will go with the domain concept as in ATSv1. Entities will be written with 
a TimelineDomain (like in ATSv1) and there will be putDomain calls just like 
ATSv1. 

- The domain information will be persisted to the backend in a  domain table. 

- The domain information will also be retained in the TimelineCollector. This 
now makes the Timeline Collector stateful.

- If a timeline collector goes down (for whatever reason) and comes back up, it 
knows which app ids it had in memory. In this specific case the collector will 
“refresh” its ACL state by reading back from HBase the domain ids for those 
app ids. 

- Each time an entity is received by the collector, it looks up the app id + 
domain id in its memory and appends the TimelineDomain to the entity. 

- The entity when written to HBase has not only the domain id but also the 
Timeline Domain information.

- Thus, each row in HBase will have the ACLs info which can be used for 
filtering at read time.

- When a read request comes in, the user and user’s group will be sent to the 
HBase cluster in the scan/get request and a check will be performed on the 
region server to determine if this user is allowed to read that entity or not 
based on the user & group membership. 

- Since we want to evaluate group of group memberships, this check will be a 
UserGroupInformation check just like it’s done in any other yarn ACL 
evaluation. This implies, the yarn cluster AND the HBase cluster have to have 
the same username & group ldap mappings so that evaluation checks will work as 
expected.

- I believe this would be done within a coprocessor but I will check if there 
is any other way to run java code as part of scan column value filter 
operation. 

- If the querying user is a YARN admin, then no checks are necessary. 

- In case the ACLs for a domain id need to be updated, that will mean scanning 
through the set of entities for that application id and updating the domain 
information for those. 

- The domain table will have domain id as row key and other fields in the 
TimelineDomain object as columns. Perhaps only one column family is fine.
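
The write and read paths described in the list above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: dicts and lists stand in for the HBase tables, the filter runs in the client rather than on a region server, and the class, method, and field names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


class Collector:
    """Stand-in for a stateful TimelineCollector."""

    def __init__(self, domain_table: Dict[str, Dict], entity_rows: List[Dict]):
        self.domain_table = domain_table   # domain id -> domain fields (persisted)
        self.entity_rows = entity_rows     # rows written to the entity table
        self.domains: Dict[Tuple[str, str], Dict] = {}  # (app id, domain id) -> fields

    def put_domain(self, app_id: str, domain_id: str,
                   users: Iterable[str], groups: Iterable[str]) -> None:
        domain = {"users": set(users), "groups": set(groups)}
        self.domain_table[domain_id] = domain        # persisted copy
        self.domains[(app_id, domain_id)] = domain   # in-memory copy

    def put_entity(self, app_id: str, domain_id: str, entity_id: str) -> None:
        # Look up app id + domain id in memory and denormalize onto the row.
        domain = self.domains[(app_id, domain_id)]
        self.entity_rows.append({
            "app_id": app_id,
            "entity_id": entity_id,
            "domain_id": domain_id,
            "reader_users": set(domain["users"]),
            "reader_groups": set(domain["groups"]),
        })


def read_entities(rows: List[Dict], user: str, groups: Set[str],
                  is_admin: bool = False) -> List[Dict]:
    # Admins skip the check; everyone else is filtered per row.
    if is_admin:
        return list(rows)
    return [r for r in rows
            if user in r["reader_users"] or groups & r["reader_groups"]]
```

Because every row carries its own reader information, the read path needs no second lookup; the price is that each entity write repeats the ACL data.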

Details per table in HBase:

- Domain table schema

Rowkey : domain id

ColumnFamily: i (stands for info)

Columns: (listing a few here, there can be others)

- application_id

- created time

- description

- modified time

- owner

- readers

- writers (not used but can be stored for completeness) 

 

We can consider setting compression for this table at a high level, since we do 
not anticipate reading frequently from this table. 
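
For illustration, the domain table layout above could map to cells as in the following Python sketch. The column qualifier spellings and the `TimelineDomainRow` type are assumptions, and `application_id` is left out, following the follow-up in this thread that it should be excluded from the domain table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TimelineDomainRow:
    domain_id: str
    owner: str
    readers: str
    writers: str
    description: str
    created_time: int
    modified_time: int


def to_cells(d: TimelineDomainRow) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Return (rowkey, {family:qualifier -> value}) for one domain."""
    family = b"i"  # single column family, "i" for info
    columns = {
        "owner": d.owner,
        "readers": d.readers,
        "writers": d.writers,          # not used, stored for completeness
        "description": d.description,
        "created_time": str(d.created_time),
        "modified_time": str(d.modified_time),
    }
    cells = {family + b":" + name.encode(): value.encode()
             for name, value in columns.items()}
    return d.domain_id.encode(), cells
```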

- Entity table, SubApplication table, Application table: these can store the 
domain id as a column and the fields in the domain object as separate columns. 

- FlowRun table: we can start with doing a union of ACLs for all applications 
within a flow run. 

- FlowActivity table: we can start by doing a union of ACLs for all runs in a 
flow in that time frame. This may turn out to be a bit more involved. Let’s 
discuss on the jira we file for this. 
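
One possible reading of "union of ACLs" for a flow run, sketched in Python. The dict shape used for an ACL is an assumption for the example.

```python
from typing import Dict, Iterable, Set


def union_acls(app_acls: Iterable[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Combine the reader users and groups of every application in a flow run."""
    users: Set[str] = set()
    groups: Set[str] = set()
    for acl in app_acls:
        users |= acl["users"]
        groups |= acl["groups"]
    return {"users": users, "groups": groups}
```

Under this reading, anyone who may read at least one application in the run may read the aggregated flow run row, which is worth keeping in mind when the individual applications have different readers.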

thanks

Vrushali




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-26 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341427#comment-16341427
 ] 

Vrushali C commented on YARN-3895:
--

 Hi [~jlowe]  [~jeagles]

I discussed with [~lohit] once again this morning.  Based on the scale of 
domain ids, I wanted to revise the storage design. We now propose to have a 
domain table with the domain id as the row key and two columns, one for users 
and another for groups, plus columns for created time and the other fields 
that exist in the TimelineDomain object.

So at read time, just like ATSv1 does, first get all the entities satisfying 
the query criteria, then look for domain ids. And for each domain id in the 
response, check the domain table if the user/group has permissions.

For wildcard of ‘*’, no check is necessary, since it means all users and groups 
have permissions?

Similarly if the querying user is an admin, no check is done.  Also, all this 
is not executed in non-secure mode.

This will work functionally correctly but this is going to be a bit slow 
depending on the number of domain ids found in the entity response set. If 
there is only one domain id, then only one more get request to hbase. With each 
additional domain id, the query response time will increase slightly. We can 
batch the gets to domain table but even so, it will be a few seconds tending to 
minutes depending on number of calls needed, since multiple calls to hbase 
translate to multiple hdfs calls. 
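
A sketch of this read path in Python, assuming a dict in place of the domain table and a single batched lookup over the distinct domain ids in the response. Function and field names are hypothetical, and the wildcard handling follows the "no check is necessary" reading above.

```python
from typing import Dict, List, Set


def authorized_entities(entities: List[Dict], domain_table: Dict[str, Dict],
                        user: str, groups: Set[str],
                        is_admin: bool = False, secure: bool = True) -> List[Dict]:
    # Admins and non-secure mode skip all checks.
    if is_admin or not secure:
        return list(entities)

    # One batched lookup for the distinct domain ids (stand-in for a multi-get).
    domain_ids = {e["domain_id"] for e in entities}
    domains = {d: domain_table[d] for d in domain_ids if d in domain_table}

    def allowed(domain: Dict) -> bool:
        if "*" in domain["users"] or "*" in domain["groups"]:
            return True
        return user in domain["users"] or bool(groups & domain["groups"])

    verdict = {d: allowed(row) for d, row in domains.items()}
    # Entities whose domain is unknown are treated as not readable.
    return [e for e in entities if verdict.get(e["domain_id"], False)]
```

The number of backend lookups grows with the number of distinct domain ids in the result, not with the number of entities, which is the cost discussed above.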

I have been scratching my head on this read performance. The only other option 
I see is, that the collector keeps the domain id  & user/groups info in memory 
and writes it out with each entity. That way we end up with a denormalized 
dataset and read queries will be as fast as they can get with hbase. The domain 
table will still exist and the collector can read from it if it happens to go 
down and comes back up.

Which way do you think might end up working better for applications like Tez?

Storage scalability wise, I think either of the two options would be fine with 
hbase.  And the expiration / TTL can be set in either case as well. And as 
such, for optimizing read / write performance, we can pre-split the domain 
table and try to balance the row keys to ensure that they go to different 
Region Servers so we don’t end up hot-spotting one single RS for reads and 
writes of currently running applications.
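
One common way to balance row keys across pre-split regions is to prefix each key with a hash bucket. The sketch below, in Python, shows that technique; it is an assumption about how the balancing could be done, not something decided in this thread, and the bucket count and `!` separator are hypothetical.

```python
import hashlib
from typing import List

NUM_BUCKETS = 16  # hypothetical number of pre-split regions


def salted_row_key(domain_id: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the domain id with a stable two-hex-digit bucket."""
    digest = hashlib.sha256(domain_id.encode()).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02x}".encode() + b"!" + domain_id.encode()


def split_points(buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Region boundaries that put each bucket prefix in its own region."""
    return [f"{b:02x}".encode() for b in range(1, buckets)]
```

A reader that knows the domain id can recompute the prefix, so point gets stay cheap; only range scans across all domains would need to fan out over every bucket.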

thanks

Vrushali

> Support ACLs in ATSv2
> -
>
> Key: YARN-3895
> URL: https://issues.apache.org/jira/browse/YARN-3895
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Major
>  Labels: YARN-5355
>
> This JIRA is to keep track of authorization support design discussions for 
> both readers and collectors. 






[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-26 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341310#comment-16341310
 ] 

Vrushali C commented on YARN-3895:
--

I see, thanks [~jlowe] and [~jeagles]. I was earlier under the belief that 
the list of domain ids may not be that big. But if it's approximately one per 
DAG or one per app, then that is a lot of domain ids and putting so many in 
one hbase cell value is not going to work well. 

Let me rethink the backend storage layout for domain ids in this case. 

One question: in a common scenario, is the user who wrote the entity (the doAs 
DAG user) the one who queries it? Or is it more common for another user in 
that group to query the entity? Put another way, is the writer user frequently 
the same as the reader user?

If so, perhaps querying the groups_domain table for the domain id can be 
deferred to after we get the entities.

It looks like getting the domain id from the entity and checking that domain 
id for the user / group is perhaps a better way to query data.




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341258#comment-16341258
 ] 

Jason Lowe commented on YARN-3895:
--

After chatting about this with [~jeagles] offline, we think the proposal could 
work well but only if the number of domains remains small.  The only way we see 
that happening is if domains are de-duplicated if they reference an equivalent 
set of ACLs so the total number of domains remains small.  It's not clear yet 
how this de-duplication would occur, especially if the write path can never do 
reads and if domains are allowed to be updated asynchronously (e.g.: admin 
wants to add another user to an existing domain).

Wildcard ACLs could be solved by treating every user as being in the '*' group 
as well and always adding every de-duplicated domain ID to the '*' group when 
created.
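
One way de-duplication could work without any read on the write path is to derive the domain ID from the ACL contents, so that equivalent ACL sets always map to the same ID. The Python sketch below illustrates that idea together with the '*' group treatment; it is an assumption rather than anything agreed in this thread, it reads the wildcard suggestion as "a domain whose readers include '*' is indexed under the '*' group", and it does not address asynchronous domain updates.

```python
import hashlib
import json
from typing import Dict, Iterable, Set


def canonical_domain_id(users: Iterable[str], groups: Iterable[str]) -> str:
    """Same ACL set (in any order, with duplicates) -> same domain id."""
    canon = json.dumps({"groups": sorted(set(groups)), "users": sorted(set(users))},
                       sort_keys=True)
    return hashlib.sha256(canon.encode()).hexdigest()[:16]


def register_domain(groups_domain: Dict[str, Set[str]],
                    users: Iterable[str], groups: Iterable[str]) -> str:
    """Index the de-duplicated domain id under each reader group, '*' included."""
    domain_id = canonical_domain_id(users, groups)
    for group in set(groups):
        groups_domain.setdefault(group, set()).add(domain_id)
    return domain_id


def readable_domains(groups_domain: Dict[str, Set[str]],
                     user_groups: Set[str]) -> Set[str]:
    # Every user is treated as a member of the '*' group as well.
    result: Set[str] = set()
    for group in set(user_groups) | {"*"}:
        result |= groups_domain.get(group, set())
    return result
```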




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341217#comment-16341217
 ] 

Jason Lowe commented on YARN-3895:
--

bq. How many domains would there be?

I would expect application frameworks to use domains just as they use ATS v1 
domains today.  That means one domain per application (or sub-DAG if they are 
switching ACLs per DAG like the server-user-on-behalf-of-multiple-users case).  
So there are going to be a lot of them.  I suspect frameworks are just going to 
create a new domain for their specific ACLs rather than searching for an 
existing domain that matches their ACL needs.  That also avoids the problem of 
someone later updating the reused domain thinking they were just updating the 
original app ACLs and inadvertently changing the ACLs of newer apps that 
reused it.  That may or may not be desired.  A 1-to-1 mapping of domain per 
app (or sub-DAG) is a natural fit to the granularity of ACL control on the 
YARN side.

bq. Gets back a list of domain ids this group has permissions for. This may be 
pretty big?

Yeah, this result is going to be huge in practice.  Also how would wildcard 
ACLs in a domain be supported, or are they not allowed?




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-25 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340325#comment-16340325
 ] 

Vrushali C commented on YARN-3895:
--

Hi [~jlowe]

We discussed this between [~rohithsharma], [~lohit], [~varun_saxena] and me. 
It basically comes down to whether we want to take a performance hit at read 
time or write time. Given that writing out extra details at write time seems 
like the worse option when running at scale, we thought of taking the approach 
which may be a slight hit on the read path but has some optimizations.

Here is our proposal. 

Extremely short summary:

We will go with the domain concept that comes with ATSv1. So each entity is 
written with a domain id. At read time, the check is made to ensure the 
querying user has permissions to read the data based on domain id.

  

Design Details:

Domain ID storage:

- domains are published by the AM, just as they are done in ATSv1.

- subsequent entity writes include the domain id per write, same as ATSv1.

- domain ids are written to two tables in hbase.

- one table is user_domain table and the other is groups_domain table.

- the user_domain table has the rowkey as cluster id + username and a column 
whose value is the list of domain ids for that user. 

- Similarly, the groups_domain table has a rowkey of cluster id + group name 
and a column whose value stores the list of domain ids for that group. 

So, for each user or group in the timeline domain object who is a reader or the 
owner, the domain id is added to that user's row in the user_domain or 
groups_domain table. The domain id is first written to the cell with tags. Now, 
there will be a coprocessor which checks if the domain id already exists in the 
value in the domain column. If yes, no-op, nothing to do. If the domain id does 
not already exist, meaning it is a new one, it will be appended to the value 
list.
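
The append-if-absent behaviour described above might look like the following Python sketch. Dicts stand in for the user_domain and groups_domain tables, the check that the text assigns to a coprocessor is done inline, and the `!` separator in the row key is an assumption.

```python
from typing import Dict, Iterable, List


def row_key(cluster_id: str, name: str) -> str:
    # cluster id + user name (or group name)
    return f"{cluster_id}!{name}"


def append_domain_id(table: Dict[str, List[str]], cluster_id: str,
                     name: str, domain_id: str) -> bool:
    """Append domain_id to the row's list unless it is already there."""
    ids = table.setdefault(row_key(cluster_id, name), [])
    if domain_id in ids:
        return False   # no-op, nothing to do
    ids.append(domain_id)
    return True


def index_domain(user_domain: Dict[str, List[str]],
                 groups_domain: Dict[str, List[str]],
                 cluster_id: str, domain_id: str, owner: str,
                 reader_users: Iterable[str], reader_groups: Iterable[str]) -> None:
    # The owner and every reader user get the domain id in their row.
    for user in {owner, *reader_users}:
        append_domain_id(user_domain, cluster_id, user, domain_id)
    for group in set(reader_groups):
        append_domain_id(groups_domain, cluster_id, group, domain_id)
```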

- Expiration/ removal of domain ids.

If this list of domain ids has the potential to grow very big, we can consider 
storing a TTL for each domain id. We can store the TTLs per domain id in these 
user_domain and groups_domain tables and have the coprocessor handle cleanup 
at the time of major compaction.

If the list of domain ids is small enough, expiration / TTL is not required to 
be implemented.  What do you think? How many domains would there be?

 

Read Query time:

We propose to have the reader API authorization work in the following 
fashion.

- A read query for an entity comes in from a user.

- The timeline reader will create 3 threads and issue three parallel requests 
to hbase.

- One request is a Get from the user_domain table for this querying user. Gets 
back a list of domain ids this user has permissions for.

- Another request is a Get from the  groups_domain table for the group that 
this querying user belongs to. Gets back a list of domain ids this group has 
permissions for. This may be pretty big?

- Third request is to get the entities that are being asked for . 

Now, given the domain ids in the entity response, a check is made if the domain 
id exists in the user_domain response or the groups_domain response.

The filtered dataset is returned as the query response. I believe ATSv1 gets 
all entities and then queries the domain table to see if each domain id 
relates to the querying user. This model may not work efficiently in hbase in 
the case of multiple domain ids: doing too many gets will make the timeline 
reader response slow.

But, as an additional api option, if the domain id is passed into the query, we 
can check for existence of that domain id directly in the user_domain or 
groups_domain table and proceed accordingly.

Also, if the user who is querying is an admin user, we can skip all the checks 
and just get the entities. And of course, if security is not enabled, no 
additional gets from user_domain and groups_domain table are required. 
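
The three parallel requests and the final check could be sketched as follows in Python. Dicts stand in for the user_domain and groups_domain tables, `fetch_entities` is a hypothetical callable that runs the entity query, and the sketch allows several groups per user although the text mentions a single group.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set


def read_with_acl_check(user: str, groups: Set[str], query: str,
                        user_domain: Dict[str, Iterable[str]],
                        groups_domain: Dict[str, Iterable[str]],
                        fetch_entities: Callable[[str], List[Dict]],
                        is_admin: bool = False, secure: bool = True) -> List[Dict]:
    # Admin users and non-secure clusters need no extra gets.
    if is_admin or not secure:
        return fetch_entities(query)

    def user_ids() -> Set[str]:
        return set(user_domain.get(user, ()))

    def group_ids() -> Set[str]:
        return set().union(*(groups_domain.get(g, ()) for g in groups))

    # Issue the three requests in parallel and wait for all of them.
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_user = pool.submit(user_ids)
        f_group = pool.submit(group_ids)
        f_entities = pool.submit(fetch_entities, query)
        allowed = f_user.result() | f_group.result()
        entities = f_entities.result()

    # Keep only entities whose domain id appears in either response.
    return [e for e in entities if e["domain_id"] in allowed]
```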

What do you think of this approach? 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-23 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336260#comment-16336260
 ] 

Vrushali C commented on YARN-3895:
--

Thanks [~jlowe] for the discussion. Let me summarize some points from our 
discussions so far.
 * Goal of jira: Design a way for authorization during reads of timeline 
entities 
 * Design objectives:
 ** Store data in denormalized fashion since hbase reads would work well with 
that. Avoid joins across tables.
 ** Write out ACLs as few times as possible. Ideally once per DAG (once per 
application).

Background:
 * ATSv1 / 1.5 does read authorization via domain ids. A domain id is published 
once per  DAG or once per application and all entities written with that domain 
id are authorized at read time accordingly. 

Current design proposal summary:
 * ATSv2 uses HBase and if we were to follow a design similar to ATSv1/1.5, 
then that would mean doing a join across two tables (domain/ACLs table and the 
entity table). This will not be ideal in terms of read performance. Correctness 
will not be an issue here, response latencies would be a concern.
 * To counteract the read latencies, one idea is to do reads from the 
collector at write time. There are a few things that might be a concern here. 
The collector would now open connections to more region servers to read from 
other tables. When running at scale, we would like the write path to be along 
the lines of “fire-and-forget”. Doing reads from the collector would likely 
cause high latencies during writes as well as increased network connections, 
for the yarn cluster as well as the HBase cluster, when running at scale. 
Also, doing a read then write does not lower the size of data being sent from 
collector to region server.
 * There is another thought along the lines of caching the ACLs in the 
collector and attaching them to each entity while writing it out.  The ACLs 
would also be stored in an ACLs table. Now, in the case of collector going down 
and coming back up, it can do a read from the ACLs table for the applications 
it is collecting data from. This read is a one-off case when the collector goes 
down and comes back up. The ACLs are still stored in a  denormalized way with 
the entity and reads do not query this ACLs table.
 * This case still does not reduce the size of data being sent with each entity.
 * Also, for updating ACLs for entities, we plan to provide an API or an admin 
call which would go over the tables and write out the ACLs again.
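
The caching idea in the list above, including the one-off read after a collector restart, could be sketched like this in Python. A dict stands in for the ACLs table, and the class and method names are hypothetical.

```python
from typing import Dict, Iterable


class CachingCollector:
    def __init__(self, acls_table: Dict[str, str]):
        self.acls_table = acls_table     # app id -> ACL string (stand-in for the ACLs table)
        self.cache: Dict[str, str] = {}  # in-memory copy used on every write
        self.backend_reads = 0           # counts reads from the ACLs table

    def register_app(self, app_id: str, acl: str) -> None:
        # ACLs are stored once in the ACLs table and kept in memory.
        self.acls_table[app_id] = acl
        self.cache[app_id] = acl

    def recover(self, app_ids: Iterable[str]) -> None:
        # One-off read after the collector comes back up.
        for app_id in app_ids:
            self.cache[app_id] = self.acls_table[app_id]
            self.backend_reads += 1

    def write_entity(self, app_id: str, entity_id: str) -> Dict[str, str]:
        # Normal writes attach the cached ACLs and never read the backend.
        return {"app_id": app_id, "entity_id": entity_id, "acl": self.cache[app_id]}
```

In this sketch the only backend reads happen in `recover`, so the steady-state write path stays fire-and-forget, while each entity row still carries a full copy of the ACLs.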

I will think over this a bit more and discuss with others and get back soon.




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336059#comment-16336059
 ] 

Rohith Sharma K S commented on YARN-3895:
-

bq. If desired this could be changed from a read-time lookup to a write-time 
lookup
I vaguely remember that the decision made for improving write performance was 
not to do additional lookups to the backend from collectors. 

bq. The collector could then cache these ACL IDs so very few writes would 
require a lookup.
This is one of the options we were discussing, but currently there is no fault 
tolerance for collectors. IAC, an NM restart will lose cached ACLs. To recover 
them, collectors would need to read from the back end, which increases the 
complexity of collectors. Currently collectors are a write-only module. Maybe 
only the ACL details could be stored in the local FS and recovered. However, 
if the NM node is lost, a new AM will be launched and a new set of ACLs will 
be written from the AM. 

bq. what's the plan to update ACLs after the application completed?
We discussed this a bit and thought of introducing a new REST endpoint in 
TimelineReader for updating ACLs for completed applications. This could be 
performed only by a TimelineReader admin. As far as the ACLs story is 
concerned, we kept this as low priority.

bq. Isn't this essentially sending the ACLs on most posts? If we need to avoid 
HBase double lookups on reads then the ACL has to be in the entity row data, 
correct?
It's true that most of the time new entities like vertex, vertex-attempts are 
published. Keeping ACLs in the row key in existing hbase tables such as 
entity_table or sub__application_table increases the complexity of building 
the row key at write and read time. Currently these tables have a combination 
of 5-7 keys. I would rather accept a double lookup at read time than keep ACL 
details in the row key. 








[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335849#comment-16335849
 ] 

Jason Lowe commented on YARN-3895:
--

bq. Yes, doing a lookup in two tables at read time (regular entity table and 
'domain' or 'ACLs'  table)  would be very slow in HBase.

If desired this could be changed from a read-time lookup to a write-time 
lookup.  In other words, the collector could be responsible for 
translating/expanding the ACL identifier into the actual ACLs when writing the 
row.  The collector could then cache these ACL IDs so very few writes would 
require a lookup.  It is _very_ likely that the ACL ID isn't changing between 
entity posts.  This would mean that ACLs could not be easily updated once 
specified, as all existing rows would need to be updated, but that's going to 
be true even if we don't have a domain/ACL ID for indirection on writes given 
the proposal to replicate it on each entity row.

bq. How much big would be ACL's size?

ACLs aren't going to be hundreds of kilobytes, but it could get larger than 
what is typical if it is an explicit list of many users and/or groups.  That's 
one of the reasons ATS v1 made this indirect via domains, so ACLs are only sent 
once per DAG and a very small bit of info for each post ties the entity to its 
corresponding ACL.

Also, as alluded to above, what's the plan to update ACLs after the application 
completed?  I assume this would have to be a full rewrite of every ACL column 
on every entity posted by the application.  I don't expect that to be a common 
occurrence, but will it be supported or only via HBase admin intervention to 
doctor the database?

bq. The ACLs details need to sent one time per entity-id. ACLs object will 
contains only reader details which is similar to TimelineDomain#reader field. 
Any update for entity-id need not to send acls details again.

Isn't this essentially sending the ACLs on most posts?  If we need to avoid 
HBase double lookups on reads then the ACL has to be in the entity row data, 
correct?  For Tez I believe a large chunk of the posting is going to be new 
entities and not updates to existing ones.  An application like Tez will end up 
sending full ACLs on about 50% of its posts.  (I think most entities have just 
a start event and a stop event.)
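
A back-of-the-envelope comparison of the two approaches, in Python. All sizes and counts in the usage note are made-up numbers chosen only to show the arithmetic; the model assumes each entity has one creating post that carries the ACLs (inline case) or that every post carries only a domain id (domain case).

```python
def fraction_of_posts_with_acls(posts_per_entity: int) -> float:
    # Only the first post of each entity carries the full ACLs.
    return 1 / posts_per_entity


def acl_bytes_inline(num_entities: int, acl_size: int) -> int:
    # One full ACL copy per new entity.
    return num_entities * acl_size


def acl_bytes_domain(num_entities: int, posts_per_entity: int,
                     acl_size: int, domain_id_size: int) -> int:
    # ACLs sent once, then a small domain id on every post.
    return acl_size + num_entities * posts_per_entity * domain_id_size
```

With two posts per entity (a start and a stop event), half of all posts carry ACLs. For a hypothetical DAG of 10,000 entities, a 2,048-byte ACL and a 32-byte domain id, the inline model sends 10,000 x 2,048 = 20,480,000 bytes of ACL data, while the domain model sends 2,048 + 10,000 x 2 x 32 = 642,048 bytes.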





[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335409#comment-16335409
 ] 

Rohith Sharma K S commented on YARN-3895:
-

{quote}I could see cases where the ACLs are not that small, potentially larger 
than an average entity.
{quote}
I am wondering when this could happen. How big would the ACLs be? This can be 
limited to some number of bytes per entity. In a similar situation, when the 
MR/Native service AM publishes AM configurations, we restrict each entity 
object to a configurable number of bytes. I think clients should use this 
method while publishing entities.
{quote}Just as that would be cumbersome to store per cell, it would be 
cumbersome to build and parse per entity.
{quote}
The ACL details need to be sent only once per entity-id. The ACLs object will 
contain only reader details, similar to the TimelineDomain#reader field. Later 
updates for the same entity-id need not send the ACL details again. An update 
to the ACLs can also be sent later; whether it overwrites the older value or is 
appended to it is something we can decide. Does sending the ACL info once per 
entity-id really have a big impact? On the HBase side, storing it in a column 
shouldn't be much of an issue. The benefit is that we can apply a substring 
filter in HBase when ACLs are enabled.
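
To make the "send once per entity-id, capped in size" idea concrete, here is a 
minimal client-side sketch. The class name AclPublishPolicy, the method 
aclsToAttach and the byte limit are all hypothetical; nothing like this exists 
in the timeline client today, and the ACLs are modelled as a plain string.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical client-side helper; not part of any existing ATSv2 API. */
final class AclPublishPolicy {
    private final int maxAclBytes;                           // assumed configurable cap
    private final Set<String> alreadySent = new HashSet<>(); // entity-ids whose ACLs went out

    AclPublishPolicy(int maxAclBytes) {
        this.maxAclBytes = maxAclBytes;
    }

    /**
     * Returns the reader ACLs to attach to this post, or null when the ACLs
     * were already sent for this entity-id and need not be repeated.
     */
    String aclsToAttach(String entityId, String readerAcls) {
        int size = readerAcls.getBytes(StandardCharsets.UTF_8).length;
        if (size > maxAclBytes) {
            throw new IllegalArgumentException(
                "reader ACLs are " + size + " bytes, limit is " + maxAclBytes);
        }
        // Set.add returns false when the entity-id was seen before.
        return alreadySent.add(entityId) ? readerAcls : null;
    }
}
```

With this policy only the first post for an entity-id carries the ACLs; a later 
ACL update would need an explicit path that this sketch leaves out.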




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-22 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335102#comment-16335102
 ] 

Vrushali C commented on YARN-3895:
--

Thanks for your response [~jlowe].
{quote}Having a tidy ID to reference a set of ACLs would eliminate this 
concern, but it would add some necessary indirection lookups on the reader 
side. 
{quote}
Yes, doing a lookup in two tables at read time (the regular entity table and a 
'domain' or 'ACLs' table) would be very slow in HBase. Hence we wanted to 
denormalize it and store the ACLs per entity. 

But I understand the point about it being too cumbersome while creating each 
entity. I will think this over a bit more, discuss it with [~rohithsharma] and 
[~varun_saxena], and get back on this. 

 

 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334907#comment-16334907
 ] 

Jason Lowe commented on YARN-3895:
--

{quote}
So, we store these kinds of doAs-query-related entities in a table called the 
subApp table. The rowkey in this table contains both the subAppUserId and the 
AM user ID. Although we do not check if the AM is allowed to write as some 
user, the entity for this pair of {AM user, subAppUser ID} will be in its own 
row. The row key also has the cluster id, entity type, entity id and entity id 
prefix.
{quote}
I think that's reasonable.  One user shouldn't be able to doctor the data of 
another, and it sounds like this will prevent that.

{quote}With every timeline entity, we propose to have a TimelineEntityACLs 
object inside it.
{quote}
My concern here would be the size of the TimelineEntityACLs object.  If that 
object is itself fully definitive of the ACLs without referencing some other 
authoritative object (i.e.: like the domain IDs did in ATS 1), then I could see 
cases where the ACLs are not that small, potentially larger than an average 
entity.  Just as that would be cumbersome to store per cell, it would be 
cumbersome to build and parse per entity.  Having a tidy ID to reference a set 
of ACLs would eliminate this concern, but it would add some necessary 
indirection lookups on the reader side.





[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334386#comment-16334386
 ] 

Rohith Sharma K S commented on YARN-3895:
-

Hi [~jlowe], does this approach look reasonable? 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-18 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331223#comment-16331223
 ] 

Vrushali C commented on YARN-3895:
--

Hi [~jlowe]

 We ([~varun_saxena] [~rohithsharma] and I) had a discussion around these 
points and wanted to share our thoughts. 
{quote}How does the collector authenticate that the AM is allowed to proxy as 
that user, or can any AM forge data as other users simply by stating the data 
is from so-and-so?
{quote}
So, we store these kinds of doAs-query-related entities in a table called the 
subApp table. The rowkey in this table contains both the subAppUserId and the 
AM user ID. Although we do not check if the AM is allowed to write as some 
user, the entity for this pair of \{AM user, subAppUser ID} will be in its own 
row. The row key also has the cluster id, entity type, entity id and entity id 
prefix.

Sub App Row key format:
{code:java}
{subAppUserId!clusterId!entityType!entityPrefix!entityId!userId}{code}
Therefore, although a rogue AM could write a lot of data as other doAs users, 
it would still go to its own rows.
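
As a rough illustration of why the data stays separated, the sketch below 
builds a string version of the row key above. It is a simplification: the class 
SubAppRowKey is hypothetical, all components are treated as plain strings, and 
the byte-level encoding and separator escaping a real key would need are 
omitted.

```java
/** Hypothetical, string-only model of the sub app row key shown above. */
final class SubAppRowKey {
    private static final String SEPARATOR = "!";

    /** Joins the components in the order given in the row key format. */
    static String of(String subAppUserId, String clusterId, String entityType,
                     String entityPrefix, String entityId, String amUserId) {
        return String.join(SEPARATOR, subAppUserId, clusterId, entityType,
                entityPrefix, entityId, amUserId);
    }

    public static void main(String[] args) {
        // Same doAs user and entity, written by two different AM users.
        String legit = of("alice", "cluster1", "TEZ_DAG_ID", "0", "dag_1", "hive");
        String rogue = of("alice", "cluster1", "TEZ_DAG_ID", "0", "dag_1", "mallory");
        System.out.println(legit);
        System.out.println(rogue);
        // The keys differ in the trailing AM user, so the rows never collide.
        System.out.println(legit.equals(rogue));
    }
}
```

Because the AM user is part of the key, a write by a different AM user cannot 
overwrite the row written by the legitimate AM, even when every other component 
matches.
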
{quote}It's less clear to me how this is going to work for the case of an AM 
running as one user but working on behalf of multiple other users across 
multiple sub-apps. The YARN application only has one set of ACLs, set when it 
is submitted by the service user.
{quote}
So in the case of an AM running as one user and executing doAs queries, we are 
now thinking of the following enhancement to the earlier proposal:

- With every timeline entity, we propose to have a TimelineEntityACLs object 
inside it. (This TimelineEntityACLs does not exist yet in the current code.) 

- The AM can populate this TimelineEntityACLs object with the ApplicationACLs 
and it will be part of the TimelineEntity it is writing. When there exist 
additional DAG ACLs as in the case of doAs queries, they can also be added to 
TimelineEntityACLs of that entity.

- In this way, each timeline entity can have its own allowed users/groups.

At the backend, we think we can store these allowed users and allowed groups as 
column values in the tables, per entity. At query time, we can confirm that the 
user making the query is in the allowed users list or is in a group that is in 
the allowed groups list.
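
A minimal sketch of that check, assuming a TimelineEntityACLs shaped as two 
sets. The class does not exist in the code yet, so the field and method names 
here (addReaders, canRead) are purely illustrative.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-entity ACL holder; the names are illustrative only. */
final class TimelineEntityACLs {
    private final Set<String> allowedUsers = new HashSet<>();
    private final Set<String> allowedGroups = new HashSet<>();

    /** Adds readers, e.g. first from the application ACLs, then from DAG ACLs. */
    TimelineEntityACLs addReaders(Collection<String> users, Collection<String> groups) {
        allowedUsers.addAll(users);
        allowedGroups.addAll(groups);
        return this;
    }

    /** True if the user is listed, or belongs to at least one listed group. */
    boolean canRead(String user, Collection<String> userGroups) {
        if (allowedUsers.contains(user)) {
            return true;
        }
        // disjoint() is true when the two collections share no element.
        return !Collections.disjoint(allowedGroups, userGroups);
    }
}
```

The AM would call addReaders once with the application ACLs and again with any 
DAG-level ACLs; the reader would call canRead with the querying user and that 
user's groups.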

We started thinking of storing it in columns rather than cell tags since the 
ACLs would be too much info to store for each cell. Since each timeline entity 
is in its own row, giving each row columns for allowed users and allowed groups 
should work per entity.

Would appreciate your feedback on this updated approach.

 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-12 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324450#comment-16324450
 ] 

Vrushali C commented on YARN-3895:
--

Thanks [~jlowe], I will think this over and get back.




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324140#comment-16324140
 ] 

Jason Lowe commented on YARN-3895:
--

I think Application ACLs could work fine for the straightforward case of a user 
running their own app.  As you mentioned, it already reflects how YARN handles 
the ACLs for the AHS and log server today.

It's less clear to me how this is going to work for the case of an AM running 
as one user but working on behalf of multiple other users across multiple 
sub-apps.  The YARN application only has one set of ACLs, set when it is 
submitted by the service user.  Those permissions are going to be restricted to 
just the service user, most likely.  Then the service user runs a sub-app 
(e.g.: a DAG) on behalf of another user.  In that case the ACLs may need to 
change (e.g.: be permissive to more groups, etc.).  The YARN app ACL isn't 
changing at this point, it was set at time of submit, so how does the AM inform 
the collector of the ACL change?  Similarly, even if the AM wrapped some of its 
execution in a doAs for the other user, how does the collector know the user 
has changed?  Did the AM somehow disconnect and reconnect to the collector?  
How does the collector authenticate that the AM is allowed to proxy as that 
user, or can any AM forge data as other users simply by stating the data is 
from so-and-so?

I'm not that familiar with HBase, but it looks like the ACLs are per cell and 
then it seems pretty straightforward how ACLs could change across sub-apps and 
implement the proper restrictions on the read path.  It's the write path in the 
multiple-sub-apps-for-multiple-users-by-one-service-user case that I'm not 
seeing how the security works.  If we're basing it on the YARN app ACL, that 
isn't changing across sub-apps but in many cases will need to do so.






[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-11 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323324#comment-16323324
 ] 

Vrushali C commented on YARN-3895:
--

We had a discussion today and wanted to summarize some points (most might be 
repeated from conversations above):

- We will use Application ACLs for getting the user & group information while 
writing the entities.
- This will be stored in HBase within each cell as part of its cell tags.
- Each time a query for reading this data comes in, a coprocessor at the HBase 
region server will use the user ACLs to determine whether the user is allowed 
to read this data. 
- Admin users are always allowed to read all data.
- This would imply coprocessors on each table.

[~jlowe] what do you think about this approach for read side authorization? 

This does not make use of any domain concept (as in v1.5). This is along the 
lines of security in yarn via ACLs. 

This should also work in the case of an AM running as one user but executing 
DAGs as other users. The callerUGI during the entity write in such situations 
will have both users (AM user and doAs user) and we will store both. So, at 
read time, queries by the AM user as well as the doAs user will be allowed for 
this data. Also, any other user who is part of that group should be able to 
read it. 

On the backend side, there is the matter of storing this info per cell in 
HBase. It is a lot of repeated information. IIUC, HBase security and 
visibility labels work with the same logic, but in that case HBase admin 
commands are used to grant permissions to specific HBase users/labels. I will 
think about whether we can reduce how many times this is stored per column 
family. 





[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2018-01-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321429#comment-16321429
 ] 

Vrushali C commented on YARN-3895:
--

Hello [~rohithsharma] [~varun_saxena] [~haibo.chen]

I was thinking a little bit about ACLs and read-side authorization. I have some 
thoughts and wanted to share them. Not everything is fully hashed out, but I 
think this might work. 

When the data is written, we can use HBase cell tags to store the allowed users 
as well as groups. Just as we store things right now for flow run, we will do 
the same for entities, applications and subapps. 

While querying, we can pass in the querying user/group info via “Attributes” in 
the Get/Scan. This can be accessed in the coprocessor via “getAttributes” of 
the Get/Scan. The coprocessor then checks whether the querying user is one of 
the allowed users, or whether the querying user's group is in the allowed 
groups list in the cell tags.

We can default to allowing reads for everyone if no tags are present. Also, we 
could indicate that the querying user is a yarn_admin user, in which case all 
reads are allowed.  
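
The decision logic described so far could look roughly like the plain-Java 
sketch below. It does not use the HBase API: a Map<String, byte[]> stands in 
for the Get/Scan attributes, a string of the form "users groups" (for example 
"alice,hive etl") stands in for the cell tag, and the attribute names and the 
class ReadAuthSketch are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the coprocessor-side read check; not HBase code. */
final class ReadAuthSketch {
    // Assumed attribute names; the thread does not define them.
    static final String ATTR_USER = "query.user";
    static final String ATTR_GROUPS = "query.groups";
    static final String ATTR_ADMIN = "query.isAdmin";

    /** Reader side: pack the caller's identity into byte[] attributes. */
    static Map<String, byte[]> attributesFor(String user, String groupsCsv, boolean admin) {
        Map<String, byte[]> attrs = new HashMap<>();
        attrs.put(ATTR_USER, bytes(user));
        attrs.put(ATTR_GROUPS, bytes(groupsCsv));
        attrs.put(ATTR_ADMIN, bytes(Boolean.toString(admin)));
        return attrs;
    }

    /** Server side: cellTag is "users groups", or null when the cell has no ACL tag. */
    static boolean isReadAllowed(Map<String, byte[]> attrs, String cellTag) {
        if (cellTag == null || cellTag.isEmpty()) {
            return true; // no tags present: default to read allowed for all
        }
        if (Boolean.parseBoolean(str(attrs.get(ATTR_ADMIN)))) {
            return true; // caller was flagged as an admin user
        }
        String[] parts = cellTag.split(" ", 2);
        Set<String> allowedUsers = split(parts[0]);
        Set<String> allowedGroups = parts.length > 1 ? split(parts[1]) : new HashSet<String>();
        if (allowedUsers.contains(str(attrs.get(ATTR_USER)))) {
            return true;
        }
        for (String group : split(str(attrs.get(ATTR_GROUPS)))) {
            if (allowedGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    /** Splits a comma-separated list, dropping empty items. */
    private static Set<String> split(String csv) {
        Set<String> out = new HashSet<>();
        for (String item : csv.split(",")) {
            if (!item.isEmpty()) {
                out.add(item);
            }
        }
        return out;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static String str(byte[] b) {
        return b == null ? "" : new String(b, StandardCharsets.UTF_8);
    }
}
```

Note that the admin flag is trusted as passed in, so in this model it must be 
set by the timeline reader itself and never taken from the end user's request.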

This should work for all our regular tables like entity, application as well as 
sub-application. 

For the sub app table, we store the AM user as well as the doAs user (and their 
groups) in the cell tags. So at query time, we can see if the querying user is 
either the AM user or the doAs user. That way we protect the data from other 
users even if they run with the same AM user. 

For the flow run table, we can perhaps do a union or something across all 
entries. I am still thinking over it. 
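
If we do go with a union, it could be as simple as the sketch below. This is 
only an illustration under the assumption that each app's readers are available 
as a collection of names; FlowRunAcls and unionReaders are hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that derives flow run readers from per-app readers. */
final class FlowRunAcls {
    /** Returns every reader that appears in at least one app's reader list. */
    static Set<String> unionReaders(Collection<? extends Collection<String>> perAppReaders) {
        Set<String> union = new TreeSet<>(); // sorted, duplicates collapse
        for (Collection<String> readers : perAppReaders) {
            union.addAll(readers);
        }
        return union;
    }
}
```

A union is the permissive choice: anyone who can read one app in the flow run 
can read the aggregated row. An intersection would be the stricter alternative.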

Here is an old thread in the hbase-users mailing list in which James Taylor 
from Phoenix has also mentioned that Phoenix is (or at least was) doing the 
same thing: 
https://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/browser

We can later check with the HBase folks if this much extra data in the cell 
tags could be a concern but my gut feeling is that it’s not. Cell tags are used 
by hbase security as well as Phoenix for passing around information and making 
decisions at server side.






[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-12-20 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299641#comment-16299641
 ] 

Vrushali C commented on YARN-3895:
--

We can consider application ACLs in the submission context. These ACLs will be 
at application level (not applicable for offline collectors).

We can allow all writes, since only authorized users can write, but only 
allowed readers will be able to read. 

Let us try to target 3.1 for this. 




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167358#comment-16167358
 ] 

Varun Saxena commented on YARN-3895:


Yeah, we can think about doing this by 3.1.
Let's finalize the design in a week or two.




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167342#comment-16167342
 ] 

Rohith Sharma K S commented on YARN-3895:
-

I just got an update from Vinod that the 3.1 release is planned for the end of 
December or mid-January. The discussion thread is 
[3.1 ReleaseDiscussion|https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27705.html] 
and the wiki is 
[3.1 ReleaseWiki|https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates].
 
If we target the 3.1 release, we have 2.5 months until code freeze. It's a good 
time to start on it!




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167326#comment-16167326
 ] 

Rohith Sharma K S commented on YARN-3895:
-

Thanks [~varun_saxena] for summarizing the discussions. I was not able to 
attend the call :-(
{quote}Most importantly, considering GA would be released on Nov 1, we would 
need to get this in by 15th October. Do we have enough time? This is more like 
an additional feature. Or delay it till 3.1?
{quote}
This detailed discussion helped us estimate the minimum effort required to 
complete this feature. We might not be able to make it for GA, since it is very 
near and we would have hardly 15 days. But as you all know, the 3.1 release is 
planned for around mid-March/April 2018. We should at least target that. If we 
start brainstorming now and put up documentation, we will have sufficient time 
to arrive at a feasible solution. I request that we keep up this momentum and 
discuss it on the weekly calls. [~varun_saxena], shall we create a shared 
document to capture the brainstormed details? 





[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166816#comment-16166816
 ] 

Varun Saxena commented on YARN-3895:


Points to consider including what we discussed on the call.

# Other than read ACLs, I think we need ACLs that restrict modifying an entity 
as well, i.e. on the write side. Otherwise we may allow some other client to 
modify an entity it does not own and change its read ACLs.
# We can use the application ACLs passed in the AM launch context (available in 
the NM) and store them in the app collector. We will have to pass this info 
when the collector runs outside the NM. If ACLs are not provided during entity 
publish, we can automatically use these app ACLs as the ACLs for entities. So 
for MR kind of use cases, application ACLs might suffice, while for Tez DAGs in 
the Hive LLAP use case, entities within an application may have different ACLs, 
which can be specified during entity publish. 
# If we store ACLs with each entity, storage size would increase because of the 
repetition of ACLs. Should we store some, possibly short, ID instead? Who will 
generate a unique ID in the cluster?
# The suggestion above is along the lines of TimelineDomain in ATSv1. Refer to 
[link|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/yarn/api/records/timeline/TimelineDomain.html].
 A domain encapsulates a set of reader and writer ACLs and follows the same 
format as other YARN ACLs, i.e. if a user belongs to a group and that group is 
within the domain to which the entity belongs, the user will have access to the 
entity. A domain is like a group of user groups and users.
# We can have a domain or ACL table in HBase with the ID as row key. Domains 
should be created beforehand, i.e. before publishing entities.
# The domain ID can be used as the ACL ID but, as I said above, it will be the 
responsibility of the client to generate a unique ID and then use it 
consistently while publishing entities.
# We should consider caching these ACLs, otherwise querying the domain table 
every time might be suboptimal.
# How to decide flow run/flow level ACLs? A union of app ACLs, maybe. This 
needs some thought. Typically all apps within a flow should have the same ACL 
though.
# A point brought up by Joep: for the federation use case, is some special 
handling required when containers run across clusters? Not sure. 
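
As a rough idea of what the caching point could look like on the reader side, 
here is a small LRU cache keyed by domain ID. Everything in it is an 
assumption: the class DomainAclCache is hypothetical, the loader function 
stands in for a lookup against the proposed domain table, and the ACLs are 
modelled as a plain string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical reader-side LRU cache of domain ID to reader ACLs. */
final class DomainAclCache {
    private final int capacity;
    private final Function<String, String> loader; // stands in for a domain table lookup
    private final Map<String, String> cache;
    private int loads = 0;

    DomainAclCache(int capacity, Function<String, String> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > DomainAclCache.this.capacity;
            }
        };
    }

    /** Returns the cached ACLs, loading them on a miss. */
    synchronized String readersFor(String domainId) {
        String acls = cache.get(domainId);
        if (acls == null) {
            acls = loader.apply(domainId);
            loads++;
            cache.put(domainId, acls);
        }
        return acls;
    }

    /** Number of lookups that had to go to the loader. */
    synchronized int loadCount() {
        return loads;
    }
}
```

The sketch has no invalidation, so an update to a domain's ACLs would stay 
invisible until the entry is evicted; a real cache would need an expiry or 
refresh policy.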

Most importantly, considering GA would be released on Nov 1, we would need to 
get this in by 15th October. Do we have enough time? This is more like an 
additional feature. Or delay it till 3.1?




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166151#comment-16166151
 ] 

Varun Saxena commented on YARN-3895:


Actually, we were thinking of including this in 3.1.0. If we do need it in 3.1, 
let's brainstorm about the approach ASAP.
We would need something like ACL groups for sure. That would make it easy to 
specify ACLs. Entities of a certain kind would typically have the same kind of 
user access, in addition to the default access given to the application owner, 
query executor, etc.




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-09-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166135#comment-16166135
 ] 

Rohith Sharma K S commented on YARN-3895:
-

I think we should target this for GA! Though YARN-6820 provides basic 
whitelisting of users for read access, it is not a full solution. 
I request folks to put up their approaches for discussion!

Primarily, I can think of a couple of approaches whose complexities need to be 
discussed in detail:
#  The user can submit ACLs only during application submission, which is 
currently supported for applications. The same ACLs can be stored in the 
application table and referred to while reading entities. These ACLs apply to 
all entities under the application. This approach works well for the flow model 
but not for a Tez kind of model. 
#  How about accepting ACLs via TimelineEntity itself? Each entity carries ACLs 
saying who may read it. Note that the ACLs are for reading data only. 
#  Lastly, ATSv2 can also have a group concept, wherein each group of entities 
has its own ACLs. To do it this way, we would probably introduce a new API that 
accepts ACLs per group and stores them at the back end. The concern is how we 
are going to store them at the back end. What should the row key for the new 
table be?
cc: [~jlowe] [~vrushalic] [~varun_saxena] [~jianhe] [~vinodkv] 
[~jrottinghuis] [~haibo.chen]




[jira] [Commented] (YARN-3895) Support ACLs in ATSv2

2017-07-13 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086424#comment-16086424
 ] 

Vrushali C commented on YARN-3895:
--

Filed YARN-6820 for adding a basic read-side restriction. 
