[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642387#comment-13642387
 ] 

Bikas Saha commented on YARN-613:
-

Looks related to YARN-617 but not a duplicate. [~vinodkv] Can you please check?

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642435#comment-13642435
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

bq. I wanted to do it all together at YARN-571, but in retrospect, I think we 
should keep it separate.
Apologies, I meant YARN-617.

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Vinod Kumar Vavilapalli
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-29 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644809#comment-13644809
 ] 

Daryn Sharp commented on YARN-613:
--

Question: How do you plan for NMs to authenticate the AM tokens?



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644816#comment-13644816
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

bq. Question: How do you plan for NMs to authenticate the AM tokens?
I thought I covered it but missed stating that - RM will share the underlying 
secret key corresponding to AM tokens as part of node-registration just like 
the one corresponding to ContainerTokens.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645852#comment-13645852
 ] 

Daryn Sharp commented on YARN-613:
--

I assumed that was the implementation.  Does a global AM secret degrade the 
security of yarn by allowing one rogue node to begin fabricating tokens?



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645859#comment-13645859
 ] 

Daryn Sharp commented on YARN-613:
--

How about startContainer passes the app token that will be used to later 
authorize stopContainer/getContainerStatus?



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645890#comment-13645890
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

bq. I assumed that was the implementation. Does a global AM secret degrade the 
security of yarn by allowing one rogue node to begin fabricating tokens?
NMs are trusted. They are Kerberos authenticated, and we also have service-level 
authorization to allow only certain principals. Is that not enough?

The better argument perhaps is that someone could crunch through a lot of 
AMTokens to figure out the key, but we roll over keys every so often to avoid 
that case.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646558#comment-13646558
 ] 

Daryn Sharp commented on YARN-613:
--

I just have general concerns with assuming the entire Hadoop environment is 
trusted and thus introducing weaknesses at a global level.  For example, a 
weakness is introduced every time one entity shares a secret to validate a token 
created by another entity.  Compromising one of hundreds or thousands of nodes 
shouldn't put the entire cluster at risk.  If I can gain access to one NM host 
and its keytab, I believe I can secretly launch a malicious NM?  NMs currently 
share a global key for container token secrets, but there is a jira to move to 
per-NM secrets, so sharing a global AM secret would be another step backwards.

Exploring alternate avenues to avoid global trust: is it not feasible to pass, 
with the launch request, the AM token that is allowed to get status and stop 
the container?



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647263#comment-13647263
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

bq. I just have general concerns with assuming the entire hadoop environment is 
trusted and thus introducing weaknesses at a global level. Ex. A weakness is 
introduced every time one entity shares a secret to validate a token created by 
another entity. Compromising one of hundreds or thousands of nodes shouldn't put 
the entire cluster at risk.
Agree with you in general. Read on.

bq. If I can gain access to one NM host and its keytab, I believe I can 
secretly launch a malicious NM?
That is true in general.  And I am not sure how we can even contain such a 
break-in. I suppose going the way of the DataNode and starting the server on 
privileged ports will contain it [1]. If one can get hold of the keytab (owned 
by the YARN user), I suppose at that point he can launch the container-executor 
binary too, which will give him root access. So it's all predicated on a secure 
setup that doesn't do stupid things.

bq. NMs currently share a global key for container token secrets, but there is a 
jira to move to per-NM secrets so sharing a global AM secret would be another 
step backwards.
Agreed.

bq. Exploring alternate avenues to avoid global trust, is passing the allowed 
am token allowed to get status and stop the container with the launch request 
not feasible?
Maybe it isn't clear in my proposal, but let me state it again anyway, mostly 
repeating what I just commented on YARN-617.
 - Having the authentication via container-token is forcing us to create a 
connection per container.
 - MR's ContainerLauncher, for example, resorts to tricks like creating lots of 
threads, and opening and closing connections immediately to avoid hitting 
ulimits etc.
 - Most of that ugliness will go away if we perform all authentication using 
AMTokens for *all* AM-NM APIs and use ContainerTokens for authorization of 
startContainer() requests.

Maybe we should just do [1] above (privileged ports).

To sum it up, I am open to suggestions. My fundamental requirements are:
 - If possible, AMs should open only one connection (a secure one) to each NM, 
not one per container.
 - All connections (all APIs) between AM and NM should be authenticated (DIGEST 
based at best here) and, if possible, without AMs having to latch on to 
things like ContainerTokens for potentially long periods.
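
As a rough illustration of the first requirement (one connection per NM rather than one per container), here is a minimal Java sketch of a proxy cache keyed by NM address. The names `NmProxy` and `NmProxyCache` are hypothetical and are not YARN classes; real proxy creation, authentication and connection cleanup are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for an AM->NM RPC proxy; not a real YARN class. */
final class NmProxy {
    final String nodeAddress;

    NmProxy(String nodeAddress) {
        this.nodeAddress = nodeAddress;
    }
}

/** Keeps one proxy per NM address instead of creating one per container. */
final class NmProxyCache {
    private final Map<String, NmProxy> proxies = new ConcurrentHashMap<>();
    private final Function<String, NmProxy> factory;

    NmProxyCache(Function<String, NmProxy> factory) {
        this.factory = factory;
    }

    /** Returns the cached proxy for this NM, creating it on first use. */
    NmProxy proxyFor(String nodeAddress) {
        return proxies.computeIfAbsent(nodeAddress, factory);
    }

    /** Number of NMs for which a proxy currently exists. */
    int size() {
        return proxies.size();
    }
}
```

With such a cache, every startContainer/stopContainer call for containers on the same node would go through `proxyFor(nodeAddress)` and reuse the same connection.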



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651427#comment-13651427
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

Side note: When we do this, to use AMTokens as single sign on, 
NMContainerTokenSecretManager.containerIdToKeysMapForThisApp needs to be 
removed.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-10 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655014#comment-13655014
 ] 

Daryn Sharp commented on YARN-613:
--

bq. That is true in general. And I am not sure how we can even contain such a 
break-in. I suppose going the way of DataNode to start the server on privileged 
ports will contain it [1]. If one can get hold of the keytab(owned by YARN 
user) I suppose at that point he can launch the container-executor binary too, 
which will give him root access. So it's all predicated on secure setup to not 
do stupid things

Secure ports would help a bit, but it's another pain point to compensate for 
weakened security.

A keytab might be leaked due to weak permissions, or maybe it's not even the 
keytab in the "official" path, but a copy an SE left sitting in their home dir.  
So I might or might not be the yarn user with my stolen keytab.  Assuming you 
are the yarn user, I'm almost positive you can't get root with the container 
executor; last I looked, it had a hardcoded check to reject root.

The main concern I have is any NM will have the power to forge AM tokens for 
all other NMs.  As the number of nodes scales in a cluster, its vulnerability 
increases.  All I have to do is compromise 1 node out of thousands.  I can then 
forge AM and container tokens, and launch jobs on the thousands of other nodes 
as any arbitrary user so I can compromise those hosts too.  I steal those 
users' appTokens from running jobs and now I have access to hdfs and other 
services.  Game over.

So here's how I think we can achieve both our goals:

A node token.  When the RM returns container tokens, it also provides node 
tokens.  The node token is for authentication, the container token authorizes 
the launch request.  Now you can have one AM->NM connection.  You can then 
decide if you want status and stop operations to authenticate and/or authorize 
via other tokens like AM tokens.  If so, pass those tokens in the launch 
request.  Now you've explicitly informed the NM of permitted (AM) tokens, 
instead of giving the NM the power to fabricate other (AM) tokens.

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655072#comment-13655072
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

bq. A keytab might be leaked due to weak permissions, or maybe it's not even 
the keytab in the "official" path, but a copy a SE left sitting in their home 
dir. So I might or might not be the yarn user with my stolen keytab. Assuming 
you are the yarn user, I'm almost positive you can't get root with the 
container executor - I last looked it had a hardcoded check to reject root.
My example of container-executor is wrong, I agree. But my intention was along 
the lines of: if someone steals an NM keytab, the game is over anyway. Other 
examples (at least one correct one this time, I hope): someone who steals an NM 
keytab can do so many things:
 - Start a custom NM which advertises infinite resources to the RM, keeps 
heartbeating often, and gobbles up all containers
 - Act sane, gobble up some containers, get app-ids, guess and construct new 
container-ids, and send false reports about other containers of the app which 
are running on other nodes
 - Just keep heartbeating in a loop and bring down the RM
You get the idea. Arguably there are minor checks we can do for the last two, 
but not the first one. It is unsolvable in general.

Now coming to your specific solution. It looks like a good idea but needs minor 
clarifications/extensions. Let's see if I got what you are proposing:
 - You have 1 NMToken per NM for the whole cluster and all AMs get the same 
NMToken for a given node.
 - Authorization for startContainer is via ContainerTokens
 - Authorization for stopContainer is via AMToken.

Right? That works for startContainer() but won't for stopContainer().

 - startContainer() is good: You use NMToken to authenticate to a node but can 
only start-containers if you have a valid ContainerToken

 - stopContainer() needs a little more help: Again, authentication with NMToken 
is good. But we can't just rely on NMToken to allow an AM to stop a 
container. Let's start with a simple authz with no ACLs: AMs can only kill the 
containers that they own. To do this, the NM needs to check what the AppId is 
for this app and then allow access to the corresponding containers. Now, in 
order to prevent AMs from faking AppIds, the NMTokenIdentifier should have 
NodeId, AppId, and maybe also the user-name for doing more complex app-acls. If 
we do that, when an NM gets a stopContainer, it gets the user-name and AppId 
and can perform the necessary authorization.

So in sum, yes, NMToken sounds like a good idea, but we need it to carry per-AM 
information.
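
The stopContainer() check described above could look roughly like the following Java sketch. It assumes the token identifier has already been authenticated; `NmTokenIdentifier` and `StopContainerAuthorizer` are hypothetical names chosen for illustration, not the real YARN classes, and ids are plain strings to keep the example small.

```java
import java.util.Objects;

/** Hypothetical identifier carrying the fields proposed above. */
final class NmTokenIdentifier {
    final String nodeId;
    final String appId;
    final String user;

    NmTokenIdentifier(String nodeId, String appId, String user) {
        this.nodeId = Objects.requireNonNull(nodeId);
        this.appId = Objects.requireNonNull(appId);
        this.user = Objects.requireNonNull(user);
    }
}

/** Simple authz with no ACLs: an AM may only stop containers of its own app. */
final class StopContainerAuthorizer {
    private final String localNodeId;

    StopContainerAuthorizer(String localNodeId) {
        this.localNodeId = Objects.requireNonNull(localNodeId);
    }

    /**
     * @param caller         identifier taken from the authenticated token
     * @param containerAppId app that owns the container, as recorded by the NM at launch
     */
    boolean mayStop(NmTokenIdentifier caller, String containerAppId) {
        // The token must have been issued for this node and for the owning app.
        return localNodeId.equals(caller.nodeId) && caller.appId.equals(containerAppId);
    }
}
```

Because the AppId comes from the signed identifier rather than from the request body, an AM cannot claim another application's id. The `user` field is carried along only as a hook for richer app-acls.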

Given above, we should perhaps call this AMNMToken and rename the current 
AMToken to be AMRMToken.

Does that sound good?



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656029#comment-13656029
 ] 

Daryn Sharp commented on YARN-613:
--

Agreed that a rogue NM is impossible to prevent entirely, but we can contain 
the damage a NM can do other than DOS attacks.  On that tangent, it would 
probably be a good idea to automatically blacklist nodes that are misbehaving 
by spamming heartbeats.

bq. Starts a custom NM which advertises infinite resources [...]
Shouldn't there be a configurable upper bound on advertised resources, if 
nothing else to prevent a misconfigured NM from harming the cluster?

bq. Act sane, gobble up some containers, get some containers, get app-ids, 
guess and construct new container-ids and send false reports about other 
containers of the app which are running on other nodes

Are there really no checks to prevent this sort of malicious/buggy behavior 
today??

bq. You have 1 NMToken per NM for the whole cluster and all AMs get the same 
NMToken for a given node.
No, I meant per-NM tokens.  I left out the implicit assumption the NM token 
contains the NM's hostname to ensure the AM isn't using the token for the wrong 
host.

bq. the NMTokenIdentifier should have NodeId, AppId, and may be also user-name 
for doing more complex app-acls
This would work fine.  The contrast is NM tokens become per-node per-AM tokens, 
whereas my suggestion is per-node NM tokens used for start container.  While 
the NM token could be used for start/stop, the launch context could contain the 
AM and/or container token (which is already there) as later auth for 
status/stop calls.  The AM token may be preferable to maintain a single AM->NM 
connection for status/stop.  The benefit of passing the AM token in the launch 
request is the NM won't later need the AM's secret since it knows exactly which 
token to allow.

Again though, your approach will work fine as well.

bq.  Given above, we should perhaps call this AMNMToken and rename the current 
AMToken to be AMRMToken.

Agreed.  In any case, renaming AMToken to AMRMToken makes it easier to understand 
the purpose of the token.  It would be nice if the kind field is also changed.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656390#comment-13656390
 ] 

Vinod Kumar Vavilapalli commented on YARN-613:
--

Till now, NMs have always been trusted. That's the reason why we don't have 
those checks. We'll need to add those extra checks if we feel that this 
assumption isn't true.

I'd just do the auth based on AMNMToken and do the authz based on the supplied 
appid and user information. Passing in AMTokens is an unnecessary step once we 
do that.

Definitely going to rename all these tokens to reflect what they are doing.

I talked to [~jnp] and [~sseth] offline even before this discussion. We reached 
the AMToken solution fundamentally because we were all trying to add in a new 
token. But as we are now saying that NMs cannot necessarily be trusted, it 
makes sense to add in the new token like you proposed. I discussed with them 
again and they both agreed.

Let's go ahead with AMNMTokens.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-14 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657744#comment-13657744
 ] 

Omkar Vinit Joshi commented on YARN-613:


I am just summarizing the changes we need to make for an AMNMToken per AM 
per NM.

The AMNMToken will remain valid as long as the application is alive. So ideally 
the AM will be able to communicate with the NM as long as
* It has received the AMNMToken and has started at least one container on the 
underlying node (NodeManager).
* The application has not yet finished. (After that, the NM will no longer 
remember this AMNMToken's master key.)

List of changes:
* RM side
** The RM will now have an RMAMNMTokenSecretManager, which will generate a token 
for every application per NM. This token creation will happen only once per NM 
per AM. If the AM requests and gets a new container on the same NM, the token 
will not be regenerated. So the RM maintains a map of AMNMTokens sent per AM per 
NM.
** The RM will share the master key with the NM in its heartbeat if updated.

* AM side
** The AM will now have to remember AMNMTokens per NM, which it will get only 
once per NM during the allocate call.
** The AM will use this token for authentication by updating the UGI while 
communicating with the NM.

* NM side
** NMAMNMTokenSecretManager will remember the current and previous master keys 
received as part of the heartbeat.
** It will also remember the MasterKeyId per AM (appId). (This is to make sure 
we can support long-running jobs.)
** It will authenticate startContainer, getContainerStatus and stopContainer 
calls using the AMNMToken via the already saved master key. For the very first 
startContainer request for the application, it will use the current/previous 
master key.
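
The NM-side behaviour summarized above could be sketched roughly as follows in Java. This is an illustration under stated assumptions, not the actual secret manager: `MasterKey` and `NmTokenValidator` are hypothetical names, HMAC-SHA256 is assumed as the signing function, and the lookup order (current key, previous key, then the key saved for the application) is one possible reading of the description.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical master key: an id plus secret bytes. */
final class MasterKey {
    final int id;
    final byte[] secret;

    MasterKey(int id, String secret) {
        this.id = id;
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical NM-side validator; not the real NM secret manager. */
final class NmTokenValidator {
    private MasterKey current;
    private MasterKey previous;
    // Key last used successfully by each application, kept until the app finishes.
    private final Map<String, MasterKey> keyByApp = new HashMap<>();

    /** Called when the RM hands over a new master key. */
    synchronized void rollMasterKey(MasterKey next) {
        previous = current;
        current = next;
    }

    /** Signs a token identifier with the given key (HMAC-SHA256 is an assumption). */
    static byte[] sign(MasterKey key, String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key.secret, "HmacSHA256"));
            return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Accepts the token if it verifies under the current, previous, or saved key. */
    synchronized boolean authenticate(String appId, String identifier, byte[] password) {
        for (MasterKey key : new MasterKey[] {current, previous}) {
            if (key != null && MessageDigest.isEqual(sign(key, identifier), password)) {
                keyByApp.put(appId, key); // remember the key for long-running apps
                return true;
            }
        }
        MasterKey saved = keyByApp.get(appId);
        return saved != null && MessageDigest.isEqual(sign(saved, identifier), password);
    }

    /** Forget the application's key once it has finished. */
    synchronized void applicationFinished(String appId) {
        keyByApp.remove(appId);
    }
}
```

Saving the key per application is what lets a long-running AM keep using its original token after several key rollovers, while an application the NM has never seen cannot authenticate with a key that has already been rolled out.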




[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658604#comment-13658604
 ] 

Omkar Vinit Joshi commented on YARN-613:


This was discussed offline with [~vinodkv], [~bikassaha] and [~sseth].

There were 2 viable solutions to the problem of sending the AMNMToken to the AM 
for authenticating with the NM.

The problems below need to be addressed:
* The token will be generated by the RM, but how long should the AMNMToken be 
kept alive? How long should the AM be able to talk to an NM on which it has ever 
launched a container during the application life cycle?
* If the token doesn't have an expiry time, then who will renew the token: the 
NM or the RM?
* If the NM reboots, can the old AMNMToken be reused? (Right now, when an NM 
goes down its containers are also lost, so there is nothing specific to that 
application in the NM after reboot.)
* The AM might hand over the AMNMToken to some other external service (other 
than the AM, maybe another container) which should be able to communicate with 
the NM. (Problem: if renewal is implemented, how will it take place?)
* We need to support long-running services.
* When the key rolls over, there should be no spike in communicating renewed 
tokens, if renewal is implemented.

Proposed solutions:
* No AMNMToken renewal
** Here the RM will generate the token and hand it over to the AM only if the 
AM is getting a container on the underlying NM for the first time; otherwise it 
will not send it. The AM can use this token to talk to the NM as long as the 
application is alive. So this is upper limited by number of applications in the 
cluster <= number of nodes * number of containers per node.
*** RM will have to remember tokens given to AM per NM
*** NM will have to remember tokens per AM
*** AM will anyway have to remember the token per NM
** Problems:
*** If the NM reboots, the token is no longer valid, in which case the RM should 
reissue the AM a new token for the restarted NM.
** Advantages:
*** The RM doesn't have to generate and send a token for every container.
*** No need to renew the token. No added overhead. No need to remember past 
keys (other than the current and previous master key).
*** Even if the AM hands over the token to some other service, that service can 
keep using the same token.
* AMNMToken renewal
** Here the RM will generate and issue the token to the AM during start 
container. The RM also remembers which AM has which tokens. So when the key 
rolls over, the RM will redistribute renewed tokens to the AM for all NMs on 
which it ever started a container. If the AM receives an updated token, it will 
have to replace the older token with the new one.
*** RM will have to remember all the NMs for which it handed a token over to 
the AM
*** NM doesn't have to remember tokens per application. It only has to remember 
the current and previous key.
*** AM will receive an AMNMToken per container request, or all tokens during key 
renewal. It will have to refresh its internal tokens with them.
** Advantages:
*** The NM doesn't need to remember the token, so there will be no problem 
across NM reboot. (Even though the token will be valid across NM reboot, there 
will still be nothing on the NM for the AM before a new container starts.)
** Problems:
*** The RM has to either remember or regenerate and send tokens to the AM for 
the container start call. This can be avoided by just sending them when the key 
rolls over.
*** The AM has to refresh the tokens it may have given to some other service 
for monitoring container progress.
*** There will be a spike at key rollover.
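
For the first option (no renewal), the RM-side bookkeeping of "hand over a token only the first time an AM gets a container on an NM" could be sketched in Java as below. `AmNmTokenIssuer` is a hypothetical name, and the returned string is only a placeholder; a real token would be an identifier signed with the RM's current master key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical RM-side helper that issues one token per application per node. */
final class AmNmTokenIssuer {
    // Nodes for which each application has already been given a token.
    private final Map<String, Set<String>> nodesByApp = new HashMap<>();

    /** Returns a token only the first time appId is allocated a container on nodeId. */
    synchronized Optional<String> tokenForAllocation(String appId, String nodeId) {
        Set<String> nodes = nodesByApp.computeIfAbsent(appId, k -> new HashSet<>());
        if (!nodes.add(nodeId)) {
            return Optional.empty(); // already issued for this app on this node
        }
        // Placeholder value; signing with the master key is omitted here.
        return Optional.of("amnm-token:" + appId + "@" + nodeId);
    }

    /** Called when an NM restarts, so the next allocation reissues a token. */
    synchronized void nodeRestarted(String nodeId) {
        for (Set<String> nodes : nodesByApp.values()) {
            nodes.remove(nodeId);
        }
    }

    /** Drops all bookkeeping for a finished application. */
    synchronized void applicationFinished(String appId) {
        nodesByApp.remove(appId);
    }
}
```

The `nodeRestarted` hook reflects the problem listed for this option: once an NM reboots, the RM has to forget that it issued tokens for that node so the AM gets a fresh one.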




[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658763#comment-13658763
 ] 

Omkar Vinit Joshi commented on YARN-613:


One addition, a good suggestion from [~bikassaha]:
If the RM restarts, we have two scenarios:
* If we need to preserve the work (AM and containers will continue to run), the 
AM should be able to communicate with the NM using the older AMNMToken after the 
RM starts. If the AM gets a new container on the NM after the RM reboot, the RM 
will send a new AMNMToken to the AM, since it has no knowledge of the previous 
AMNMToken (that information is not persisted), and the AM should replace the 
existing token with the new one. Now if the NM gets a token different from the 
older/stored one, it should validate the new token's master key against its 
current/previous master key. If it is valid, the NM replaces the older token 
(thereby we can even renew the token).
* If we don't need to preserve the work (AM and containers will be killed after 
the RM restarts), there will be no problem at all even with the above 
implementation; as the applications are already killed, we can just clear the 
cache on the NM.



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658816#comment-13658816
 ] 

Bikas Saha commented on YARN-613:
-

To be clear, on the AM the behavior is always to take the tokens coming in the 
allocate response and set them in the UGI (overriding old values). They 
will be picked up from the UGI by NMClient during communication.
The behavior on the NM will be to always authenticate based on the current 
master key first. This is always the latest correct value, and in the majority 
of use cases this master key will be identical to the cached 
appId-master-key. If the current master key validates the incoming token, it 
becomes the new value of the cached appId-master-key. If the current master 
key fails to validate the token (long-running apps), then the cached 
appId-master-key is used to validate the token.
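
The NM-side rule above could be sketched roughly as follows. This is only an illustration under assumptions: `MasterKey`, `AmNmTokenValidator`, and the HMAC-SHA1 password check are hypothetical stand-ins, not the actual YARN classes or token format.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical master key: an id plus a secret used to sign token identifiers.
final class MasterKey {
  final int keyId;
  private final byte[] secret;

  MasterKey(int keyId, byte[] secret) {
    this.keyId = keyId;
    this.secret = secret.clone();
  }

  // Password this key would produce for a given token identifier (assumed HMAC-SHA1).
  byte[] sign(byte[] identifier) {
    try {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(secret, "HmacSHA1"));
      return mac.doFinal(identifier);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}

// Hypothetical NM-side validator: current master key first, cached per-app key second.
final class AmNmTokenValidator {
  private MasterKey currentKey;
  private final Map<String, MasterKey> cachedKeyByAppId = new HashMap<>();

  AmNmTokenValidator(MasterKey currentKey) {
    this.currentKey = currentKey;
  }

  void rollMasterKey(MasterKey newKey) {
    this.currentKey = newKey;
  }

  boolean validate(String appId, byte[] identifier, byte[] password) {
    // Step 1: try the current master key; on success it becomes the cached key.
    if (MessageDigest.isEqual(currentKey.sign(identifier), password)) {
      cachedKeyByAppId.put(appId, currentKey);
      return true;
    }
    // Step 2: fall back to the key cached for this application (long-running apps).
    MasterKey cached = cachedKeyByAppId.get(appId);
    return cached != null && MessageDigest.isEqual(cached.sign(identifier), password);
  }

  // Key id currently cached for the application, or null if none.
  Integer cachedKeyId(String appId) {
    MasterKey cached = cachedKeyByAppId.get(appId);
    return cached == null ? null : cached.keyId;
  }
}
```

With this shape, a token signed before a rollover keeps working through the cached entry, and the first token signed with the new key silently refreshes that entry.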

It would be great to take the solution and break the work into separate jiras, 
e.g. AMRMProtocol addition, NMRM protocol changes, RM server changes, NM server 
changes, AMRMClient changes, NMClient changes.

bq. If we don't need to preserve the work, (AM and container will be killed 
after RM restarts) then there will be no problem at all even with above 
implementation in which case as applications are already killed so we can just 
clear the cache on NM.
If this cache is per appId then it cannot be removed when the appAttempt 
completes. It will be removed when the application completes. During NM resync 
we should not invalidate the cache. The cache is required for work-preserving 
restart and will automatically be refreshed by the above logic for 
non-work-preserving restart.
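
A minimal sketch of that cache lifecycle, assuming a per-appId map on the NM. The class and callback names here are hypothetical and only illustrate which events do and do not evict an entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-application key cache on the NM; not an actual YARN class.
final class AppKeyCache {
  private final Map<String, Integer> keyIdByAppId = new HashMap<>();

  void remember(String appId, int keyId) {
    keyIdByAppId.put(appId, keyId);
  }

  // An attempt finishing leaves the entry alone: a later attempt of the
  // same application may still need it.
  void onAppAttemptCompleted(String appId) {
    // intentionally empty
  }

  // NM resync with the RM does not invalidate the cache either; entries are
  // refreshed by normal token validation when newer tokens arrive.
  void onResync() {
    // intentionally empty
  }

  // Only application completion evicts the entry.
  void onApplicationCompleted(String appId) {
    keyIdByAppId.remove(appId);
  }

  boolean contains(String appId) {
    return keyIdByAppId.containsKey(appId);
  }
}
```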



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-16 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660144#comment-13660144
 ] 

Omkar Vinit Joshi commented on YARN-613:


I am splitting this task into smaller subtasks.
* Creating the AMNMToken master key on the RM and sharing it with the NM on its 
heartbeat. YARN-692
* Sending an AMNMToken to the AM on the allocate call if a container is 
allocated for the first time on the underlying NM for the given AM. YARN-693 
* AM uses the AMNMToken to authenticate all communication with the NM. NM 
remembers and updates the token across RM restart. YARN-694
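
On the AM side, the third subtask amounts to keeping one token per NM and reusing it for every container on that NM. A rough sketch, assuming tokens are keyed by NM address and represented as opaque strings (`AmTokenStore` and its methods are hypothetical, not the NMClient or UGI API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AM-side store: one AMNMToken per NM address.
final class AmTokenStore {
  private final Map<String, String> tokenByNodeAddress = new HashMap<>();

  // Tokens arriving in an allocate response always override older values.
  void onAllocateResponse(Map<String, String> newTokens) {
    tokenByNodeAddress.putAll(newTokens);
  }

  // Every container on the same NM uses the same token, so a single proxy
  // per NM is enough. Returns null if no token was received for that NM.
  String tokenFor(String nodeAddress) {
    return tokenByNodeAddress.get(nodeAddress);
  }
}
```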




[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-16 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660278#comment-13660278
 ] 

Omkar Vinit Joshi commented on YARN-613:


Small update to how the RM issues the AMNMToken to the AM:
* The RM will clear its cache after the master key is rolled over, so if the AM 
is allocated a new container after key rollover it will be issued a new 
AMNMToken.
* Similarly, the NM will remember the previous key only for twice (to be safe) 
the key activation period of the RM.
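
Both points can be sketched roughly as below. The names (`AmNmTokenIssuer`, `PreviousKeyRetention`) and the use of millisecond timestamps are assumptions made for illustration, not the RM or NM implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RM-side tracker: which NMs each app attempt already holds a token for.
final class AmNmTokenIssuer {
  private int currentKeyId;
  private final Map<String, Set<String>> nodesIssuedByAttempt = new HashMap<>();

  AmNmTokenIssuer(int initialKeyId) {
    this.currentKeyId = initialKeyId;
  }

  // True when the allocate response should carry a token for this NM, i.e. the
  // first allocation on that NM for this attempt since the last key rollover.
  boolean shouldIssueToken(String appAttemptId, String nodeId) {
    return nodesIssuedByAttempt
        .computeIfAbsent(appAttemptId, k -> new HashSet<>())
        .add(nodeId);
  }

  // Rolling the master key clears the cache, so later allocations get new tokens.
  void rollMasterKey(int newKeyId) {
    currentKeyId = newKeyId;
    nodesIssuedByAttempt.clear();
  }

  int currentKeyId() {
    return currentKeyId;
  }
}

// Hypothetical NM-side rule: keep the previous key for twice the activation period.
final class PreviousKeyRetention {
  static long retentionMillis(long activationPeriodMillis) {
    return 2 * activationPeriodMillis;
  }

  static boolean previousKeyStillValid(
      long rolledAtMillis, long nowMillis, long activationPeriodMillis) {
    return nowMillis - rolledAtMillis <= retentionMillis(activationPeriodMillis);
  }
}
```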



[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-06-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687411#comment-13687411
 ] 

Omkar Vinit Joshi commented on YARN-613:


As part of YARN-694, ContainerManagementProtocolProxy was added to support a 
per-NodeManager proxy. As everything here is fixed, closing this issue.

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
> Attachments: AMNMToken.docx
>
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.
