[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847769#comment-13847769 ]

Vinod Kumar Vavilapalli commented on YARN-1495:
-----------------------------------------------

Thanks for laying out the use case.

bq. I don't see a reason that we shouldn't be able to move an app that has been 
submitted, but not accepted, or that is very close to completion.
There is a race condition when the scheduler is in the process of accepting an 
app into a queue and a corresponding queue-move request comes in. Like you said, 
we just need to be careful.
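A minimal sketch of one way to close that race, assuming a per-app lock that both the acceptance path and the move path must take. `App`, `AppState`, and `ToyScheduler` are hypothetical names used for illustration. They are not the RM's actual classes. A move that arrives before acceptance only updates the app record, and a move that arrives after acceptance relocates the app inside the scheduler.

```python
import threading
from enum import Enum, auto


class AppState(Enum):
    SUBMITTED = auto()  # known to the RM, not yet accepted into a scheduler queue
    ACCEPTED = auto()   # placed in a scheduler queue
    FINISHED = auto()   # completed; nothing left to move


class App:
    """Hypothetical app record (not the real RMApp)."""

    def __init__(self, app_id, queue):
        self.app_id = app_id
        self.queue = queue
        self.state = AppState.SUBMITTED
        # One lock serializes acceptance and moves for this app.
        self.lock = threading.Lock()


class ToyScheduler:
    """Hypothetical scheduler that tracks which apps sit in which queue."""

    def __init__(self):
        self.queues = {}  # queue name -> set of app ids

    def accept(self, app):
        with app.lock:
            if app.state is not AppState.SUBMITTED:
                return
            # app.queue is read under the lock, so a move that ran first is honored.
            self.queues.setdefault(app.queue, set()).add(app.app_id)
            app.state = AppState.ACCEPTED

    def move(self, app, target):
        with app.lock:
            if app.state is AppState.FINISHED:
                raise ValueError(f"{app.app_id} has already finished")
            if app.state is AppState.ACCEPTED:
                # Already inside the scheduler: relocate it between queues.
                self.queues[app.queue].discard(app.app_id)
                self.queues.setdefault(target, set()).add(app.app_id)
            # In SUBMITTED state, updating the record is enough; accept()
            # will place the app in the new queue.
            app.queue = target
```

Because both paths hold the same lock, the two interleavings the comment worries about (move-then-accept and accept-then-move) both end with the app in the target queue.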

Ha, your first question is related to the above.

We have to touch RMApp etc. before hitting the scheduler, because state in the RM 
is partitioned between the scheduler and the rest of the RM. So we may not be 
able to go directly to the scheduler. Typically, we don't block RPCs either; 
that is bad in multi-tenant clusters. The paradigm we follow is a multi-phase 
request: submitApp and poll for its status, kill-app and poll for its status 
(YARN-1446). You could do something like that here too.
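A minimal sketch of that submit-then-poll pattern applied to moves. `MoveService`, `submit_move`, `get_status`, `MoveStatus`, and `move_and_wait` are hypothetical. They are not part of the YARN client-RM protocol. The RPC handler only records the request and returns a handle. The work runs asynchronously, and the client polls instead of holding a blocked RPC open.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class MoveStatus(Enum):
    PENDING = "PENDING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class MoveService:
    """Hypothetical RM-side handler: requests return at once, work runs later."""

    def __init__(self, do_move):
        self._do_move = do_move  # callable(app_id, queue); raises on failure
        self._status = {}
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit_move(self, app_id, queue):
        """Phase 1: record the request and hand back an id without blocking."""
        with self._lock:
            request_id = next(self._ids)
            self._status[request_id] = MoveStatus.PENDING
        self._pool.submit(self._run, request_id, app_id, queue)
        return request_id

    def get_status(self, request_id):
        """Phase 2: clients poll this until the request leaves PENDING."""
        with self._lock:
            return self._status[request_id]

    def _run(self, request_id, app_id, queue):
        try:
            self._do_move(app_id, queue)
            result = MoveStatus.SUCCEEDED
        except Exception:
            result = MoveStatus.FAILED
        with self._lock:
            self._status[request_id] = result


def move_and_wait(service, app_id, queue, poll_interval=0.01, timeout=5.0):
    """Client side: issue the request, then poll until it completes or times out."""
    request_id = service.submit_move(app_id, queue)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = service.get_status(request_id)
        if status is not MoveStatus.PENDING:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"move of {app_id} still pending after {timeout}s")
```

The status is set to PENDING before the work is queued, so a poll can never observe a request id that has no status yet.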

The other race condition that just occurred to me involves apps and 
app-attempts. The RM may be in the process of creating a new app-attempt when 
the move request comes in. The scheduler only knows about app-attempts today, 
which could make this a hard issue. Jian is trying to fix that via YARN-1493. 
You may need to wait for that to avoid those race conditions.
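To illustrate why an app-level view matters, here is a minimal sketch that assumes the queue is stored once on an app-level scheduler record, and that attempts read it from there instead of copying it. `SchedulerApp`, `Attempt`, `add_attempt`, and `move` are hypothetical names. This is not a description of what YARN-1493 does. If each attempt copied the queue when it was created, an attempt created concurrently with a move could be left in the old queue.

```python
import threading


class SchedulerApp:
    """Hypothetical app-level scheduler record that owns the queue assignment."""

    def __init__(self, app_id, queue):
        self.app_id = app_id
        self.queue = queue
        self.attempts = []
        self.lock = threading.Lock()


class Attempt:
    """Hypothetical attempt record: it refers to its app rather than copying the queue."""

    def __init__(self, app, attempt_number):
        self.app = app
        self.attempt_number = attempt_number

    @property
    def queue(self):
        # Always derived from the app, so a move cannot miss an attempt.
        return self.app.queue


def add_attempt(app):
    # Serialized with move() on the app lock, so the two never interleave.
    with app.lock:
        attempt = Attempt(app, len(app.attempts) + 1)
        app.attempts.append(attempt)
        return attempt


def move(app, target_queue):
    # A fuller sketch would also transfer each attempt's containers while
    # holding the same lock.
    with app.lock:
        app.queue = target_queue
```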

> Allow moving apps between queues
> --------------------------------
>
>                 Key: YARN-1495
>                 URL: https://issues.apache.org/jira/browse/YARN-1495
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> This is an umbrella JIRA for work needed to allow moving YARN applications 
> from one queue to another.  The work will consist of additions in the command 
> line options, additions in the client RM protocol, and changes in the 
> schedulers to support this.
> I have a picture of how this should function in the Fair Scheduler, but I'm 
> not familiar enough with the Capacity Scheduler to say the same for it.  
> Ultimately, the decision as to whether an application can be moved should go 
> down to the scheduler; some schedulers may not wish to support this at all.  
> However, schedulers that do support it should share some common semantics 
> around ACLs and what happens to running containers.
> Here is how I see the general semantics working out:
> * A move request is issued by the client.  After it gets past ACLs, the 
> scheduler checks whether executing the move will violate any constraints. For 
> the Fair Scheduler, these would be the queue maxRunningApps and queue 
> maxResources constraints.
> * All running containers are transferred from the old queue to the new queue
> * All outstanding requests are transferred from the old queue to the new queue
> Here is how I see the ACLs working out:
> * To move an app from a queue a user must have modify access on the app or 
> administer access on the queue
> * To move an app to a queue a user must have submit access on the queue or 
> administer access on the queue 
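The general move semantics and ACL rules in the description can be sketched as a single validation-then-transfer function. This is a minimal, hypothetical model. `Queue`, `App`, `Resources`, `queue_usage`, and `move_app` are illustrative names and are not Fair Scheduler classes. "Modify access on the app" is modeled as a `modifiers` set. Every app in a queue counts toward maxRunningApps. Running containers and outstanding requests are stored on the app, so moving the app transfers both of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resources:
    memory_mb: int
    vcores: int

    def __add__(self, other):
        return Resources(self.memory_mb + other.memory_mb, self.vcores + other.vcores)

    def fits_in(self, limit):
        return self.memory_mb <= limit.memory_mb and self.vcores <= limit.vcores


@dataclass(eq=False)
class Queue:
    name: str
    max_running_apps: int
    max_resources: Resources
    submitters: set = field(default_factory=set)  # users with submit access
    admins: set = field(default_factory=set)      # users with administer access
    apps: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class App:
    app_id: str
    modifiers: set  # users with modify access on the app
    queue: Queue
    running: list   # Resources held by each running container
    pending: list   # outstanding resource requests

    def usage(self):
        total = Resources(0, 0)
        for container in self.running:
            total = total + container
        return total


def queue_usage(queue):
    total = Resources(0, 0)
    for app in queue.apps:
        total = total + app.usage()
    return total


def move_app(user, app, target):
    source = app.queue
    # ACL out of the source: modify on the app or administer on the queue.
    if user not in app.modifiers and user not in source.admins:
        raise PermissionError(f"{user} may not move {app.app_id} out of {source.name}")
    # ACL into the target: submit or administer on the queue.
    if user not in target.submitters and user not in target.admins:
        raise PermissionError(f"{user} may not move {app.app_id} into {target.name}")
    # Constraint checks run only after the ACLs pass.
    if len(target.apps) + 1 > target.max_running_apps:
        raise ValueError(f"{target.name} would exceed maxRunningApps")
    if not (queue_usage(target) + app.usage()).fits_in(target.max_resources):
        raise ValueError(f"{target.name} would exceed maxResources")
    # Transfer: containers and requests live on the app, so re-parenting the
    # app moves both, and each queue's usage is recomputed from its apps.
    source.apps.remove(app)
    target.apps.append(app)
    app.queue = target
```

Deriving queue usage from the apps it holds, rather than keeping a separate counter, keeps the transfer step to a single re-parenting operation in this sketch.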



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
