[jira] [Commented] (YARN-1495) Allow moving apps between queues

2015-06-03 Thread Ben Podgursky (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571415#comment-14571415 ]

Ben Podgursky commented on YARN-1495:
-------------------------------------

Hi,

Are there any plans to let users move jobs between queues via the web UI, like 
with the MR1 fair scheduler?  We found this feature very useful.

 Allow moving apps between queues
 --------------------------------

 Key: YARN-1495
 URL: https://issues.apache.org/jira/browse/YARN-1495
 Project: Hadoop YARN
 Issue Type: New Feature
 Components: scheduler
 Affects Versions: 2.2.0
 Reporter: Sandy Ryza
 Assignee: Sandy Ryza

 This is an umbrella JIRA for work needed to allow moving YARN applications 
 from one queue to another.  The work will consist of additions in the command 
 line options, additions in the client RM protocol, and changes in the 
 schedulers to support this.
 I have a picture of how this should function in the Fair Scheduler, but I'm 
 not familiar enough with the Capacity Scheduler to say the same for it.  
 Ultimately, the decision as to whether an application can be moved should go 
 down to the scheduler - some schedulers may wish not to support this at all.  
 However, schedulers that do support it should share some common semantics 
 around ACLs and what happens to running containers.
 Here is how I see the general semantics working out:
 * A move request is issued by the client.  After it gets past ACLs, the 
 scheduler checks whether executing the move will violate any constraints. For 
 the Fair Scheduler, these would be queue maxRunningApps and queue 
 maxResources constraints
 * All running containers are transferred from the old queue to the new queue
 * All outstanding requests are transferred from the old queue to the new queue
 Here is how I see the ACLs for this working out:
 * To move an app from a queue a user must have modify access on the app or 
 administer access on the queue
 * To move an app to a queue a user must have submit access on the queue or 
 administer access on the queue 
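
 The semantics and ACL rules above can be sketched as a toy model. This is 
 only an illustration in Python, not YARN code: the Queue, App, and move_app 
 names, the single integer resource unit, and the MoveRejected exception are 
 all assumptions made for the example.

```python
from dataclasses import dataclass, field


class MoveRejected(Exception):
    """Hypothetical error raised when an ACL or constraint check fails."""


@dataclass
class App:
    app_id: str
    modifiers: set      # users with modify access on the app
    containers: int     # resources held by running containers (abstract units)
    pending: int        # outstanding resource requests (abstract units)


@dataclass
class Queue:
    name: str
    max_running_apps: int
    max_resources: int
    submitters: set = field(default_factory=set)
    admins: set = field(default_factory=set)
    apps: list = field(default_factory=list)

    def used(self):
        # Resources currently held by running containers in this queue.
        return sum(app.containers for app in self.apps)


def move_app(user, app, source, target):
    # ACLs: modify access on the app or administer access on the source queue.
    if user not in app.modifiers and user not in source.admins:
        raise MoveRejected(f"{user} may not move {app.app_id} out of {source.name}")
    # ACLs: submit or administer access on the target queue.
    if user not in target.submitters and user not in target.admins:
        raise MoveRejected(f"{user} may not move apps into {target.name}")
    # Scheduler constraints, modelled on maxRunningApps and maxResources.
    if len(target.apps) + 1 > target.max_running_apps:
        raise MoveRejected(f"{target.name} would exceed maxRunningApps")
    if target.used() + app.containers > target.max_resources:
        raise MoveRejected(f"{target.name} would exceed maxResources")
    # Running containers and outstanding requests are fields of the app,
    # so they travel with it from the old queue to the new queue.
    source.apps.remove(app)
    target.apps.append(app)
```

 All checks run before any state changes, so a rejected move leaves both 
 queues exactly as they were.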



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-11 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898568#comment-13898568 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

bq. From an API point of view, there should be a way for the application at 
run-time/registration time to find out what features are supported or not 
supported by the currently configured scheduler in the RM.
That makes total sense to me.

bq. Now, for moving apps across schedulers - given that it is a client-only 
feature and there are no changes required in an application, my previous 
comment's argument does not hold for this feature.
Great.  Will go forward with the merge to branch-2.



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-11 Thread Hitesh Shah (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898601#comment-13898601 ]

Hitesh Shah commented on YARN-1495:
-----------------------------------

[~sandyr] Just to be clear, I have not reviewed the code as such so please do 
not consider my comment as a +1 if that is required for the merge. I am 
assuming there are others who are more familiar with these changes and have 
reviewed/+1'ed it for the merge. 



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-11 Thread Karthik Kambatla (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898666#comment-13898666 ]

Karthik Kambatla commented on YARN-1495:
----------------------------------------

None of the patches introduce any Public-Stable APIs. I think it is reasonable 
to merge them to branch-2. +1. Thanks Sandy. 



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-10 Thread Hitesh Shah (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897421#comment-13897421 ]

Hitesh Shah commented on YARN-1495:
-----------------------------------

The way I see it - it is and should be ok for different schedulers to support a 
different set of features. The behavior should be the same across all 
schedulers if the feature is supported. 

@Karthik, I don't believe it is right to take a half-baked approach, regardless 
of which scheduler builds the feature first. The main concern is for an app 
developer and how a new feature or the lack of it affects someone writing their 
own application. 

From an API point of view, there should be a way for the application at 
run-time/registration time to find out what features are supported or not 
supported by the currently configured scheduler in the RM. This allows for 
applications to be written correctly and to make the necessary changes in the 
calls to the RM to work around advanced vs primitive schedulers. If schedulers 
are going to differ in terms of feature support, then an API to find out 
whether a feature is supported or not should be considered a blocker for a 
release. I believe this only holds for APIs affecting application masters for 
now but there may be situations where a client could be affected too.

Now, for moving apps across schedulers - given that it is a client-only feature 
and there are no changes required in an application, my previous comment's 
argument does not hold for this feature. (I assume that Fifo and CS will both 
throw an appropriate UnsupportedOperationException on a move call?)
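
To make the two ideas above concrete, here is a rough sketch in Python 
(not the RM's actual API). It shows a feature query that an application or 
client could call up front, and a base scheduler whose move call fails with an 
unsupported-operation error unless a subclass overrides it. The class names, 
the supports() method, and the "move_application" feature string are all 
hypothetical.

```python
class UnsupportedOperation(Exception):
    """Stand-in for Java's UnsupportedOperationException in this sketch."""


class Scheduler:
    # Features advertised by this scheduler; empty by default.
    supported_features = frozenset()

    def supports(self, feature):
        # Lets a caller discover support before attempting the call.
        return feature in self.supported_features

    def move_application(self, app_id, target_queue):
        # Default behaviour for schedulers that do not implement moves.
        raise UnsupportedOperation(
            f"{type(self).__name__} does not support moving applications"
        )


class FairLikeScheduler(Scheduler):
    supported_features = frozenset({"move_application"})

    def __init__(self):
        self.queue_of = {}

    def move_application(self, app_id, target_queue):
        # Record the new queue and return its name.
        self.queue_of[app_id] = target_queue
        return target_queue


class FifoLikeScheduler(Scheduler):
    """Inherits the default, so any move call raises UnsupportedOperation."""
```

A caller can either check supports("move_application") first or simply catch 
the error, which matches the "just another reason a move can fail" view.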




[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-07 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894826#comment-13894826 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

With the bulk of this implemented and tested, I'm planning to merge this to 
branch-2. Will do so tomorrow unless there are objections.



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-07 Thread Hitesh Shah (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894832#comment-13894832 ]

Hitesh Shah commented on YARN-1495:
-----------------------------------

[~sandyr] Are the changes for the Capacity Scheduler also done? If not, I am not 
sure how we can add a feature that the default scheduler does not support.



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-07 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894908#comment-13894908 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

The changes for the Capacity Scheduler aren't yet done.  In the past, we've 
never used this as a basis for not including a feature.  For example, 
preemption was supported in YARN and the Fair Scheduler long before it reached 
the Capacity Scheduler.  The Capacity Scheduler is default because we can't 
choose both as the default, but the Fair Scheduler is recommended / first class 
as well.  Even if we decided to change our stance there, I don't think that 
would be a basis for not merging what we have so far into branch-2.



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-02-07 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895112#comment-13895112 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

While I would argue that preemption is actually more impactful to users than 
the option to move apps, as they need to make their applications resilient to 
it, it's not the only example.  Strict locality and preemption warnings went 
into the Fair Scheduler before the Capacity Scheduler; blacklisting went into 
the Capacity Scheduler first.  The users for moving applications between queues 
are cluster administrators, who already need to be aware of the operational 
differences between different schedulers.  There are many reasons why moving an 
application between queues may fail, some of them internal to the scheduler, 
such as a violation of resource configurations, some of them external, such as 
an application being in a particular state.  Using a scheduler that doesn't 
support it is just another example.

While having a consistent experience across schedulers is nice, and we should 
be very careful to keep the semantics the same when multiple schedulers support 
it, I think blocking it in one scheduler because the other doesn't support it 
is an unnecessary drag on the pace of development.  





[jira] [Commented] (YARN-1495) Allow moving apps between queues

2014-01-03 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861864#comment-13861864 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

Good point, Bikas.  Filed YARN-1558 for this.



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-15 Thread Bikas Saha (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848587#comment-13848587 ]

Bikas Saha commented on YARN-1495:
----------------------------------

The app submission context saved in the store would need to be updated with the 
new queue information, after the scheduler has accepted the move but before the 
user gets notified about move completion.
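
The ordering described above can be illustrated with a small Python sketch. 
It is not RM code: handle_move, the dict used as a stand-in for the state 
store, and the scheduler_accepts and notify callbacks are all hypothetical 
names chosen for the example.

```python
def handle_move(app_id, target_queue, scheduler_accepts, store, notify):
    """Apply a move in three ordered steps (illustrative only)."""
    # Step 1: the scheduler decides whether the move is acceptable.
    if not scheduler_accepts(app_id, target_queue):
        notify(app_id, "REJECTED")
        return False
    # Step 2: persist the new queue in the saved submission context
    # before anyone is told the move has completed.
    store[app_id] = {**store[app_id], "queue": target_queue}
    # Step 3: only now report completion to the user.
    notify(app_id, "MOVED")
    return True
```

With this ordering, a restart that reloads the store after the user has seen 
"MOVED" would find the app recorded in its new queue.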



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847769#comment-13847769 ]

Vinod Kumar Vavilapalli commented on YARN-1495:
-----------------------------------------------

Thanks for laying out the use-case.

bq. I don't see a reason that we shouldn't be able to move an app that has been 
submitted, but not accepted, or that is very close to completion.
There is a race condition when the scheduler is in the process of accepting an 
app to a queue and a corresponding queue-move request comes in. Like you said, 
we just need to be careful.

Ha, your first question is related to the above.

We have to touch RMApp etc. before hitting the scheduler, as state in the RM is 
partitioned inside and outside the scheduler. So we may not be able to go 
directly to the scheduler. Typically, we don't block RPCs either - that is bad 
in multi-tenant clusters. The paradigm followed is a multi-phase request - 
submitApp and poll for its status, kill-app and poll for its status 
(YARN-1446). You could do something like that here too.

The other race condition that just occurred to me involves apps and 
app-attempts. The RM may be in the process of creating a new app-attempt while 
the move request comes in. The scheduler only knows about app-attempts today - 
that could be a hard issue. Jian is trying to fix it via YARN-1493. You may 
need to wait for that to avoid those race conditions.
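
As a rough illustration of the multi-phase (request, then poll) paradigm 
mentioned above, here is a toy Python model. It is not the RM protocol: 
ToyResourceManager, request_move, get_queue, process_events, and wait_for_move 
are hypothetical names, and the move always succeeds in this sketch.

```python
class ToyResourceManager:
    def __init__(self, app_queues):
        self._queue_of = dict(app_queues)
        self._pending_moves = []

    def request_move(self, app_id, target_queue):
        # Phase 1: record the request and return at once, without blocking.
        self._pending_moves.append((app_id, target_queue))

    def get_queue(self, app_id):
        # Phase 2: the client polls the app's current queue.
        return self._queue_of[app_id]

    def process_events(self):
        # Stand-in for the RM's internal, asynchronous event handling.
        for app_id, target_queue in self._pending_moves:
            self._queue_of[app_id] = target_queue
        self._pending_moves.clear()


def wait_for_move(rm, app_id, target_queue, max_polls=10,
                  between_polls=lambda: None):
    # Poll until the app shows up in the target queue or we give up.
    for _ in range(max_polls):
        if rm.get_queue(app_id) == target_queue:
            return True
        between_polls()  # a real client would sleep here
    return False
```

Note that polling only reveals whether the queue changed; it carries no 
failure reason unless the status being polled is extended to include one.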



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-13 Thread Sandy Ryza (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848006#comment-13848006 ]

Sandy Ryza commented on YARN-1495:
----------------------------------

bq. We have to touch RMApp etc before hitting scheduler as state in RM is 
partitioned inside and outside scheduler.
Sorry, I wasn't clear - definitely agree we need to go through RM app; I was 
just wondering whether to do it with events or synchronously.  Thanks for the 
heads up on the race condition - will watch out for that.

bq. The paradigm followed is a multi-phase request
An issue with doing a multi-phase request is that, if the move fails, we would 
like to return an appropriate error message with the reason to the client, and 
the reason can go as far down as the scheduler.  We could give the client a 
request ID that they could come back with to find the result, but that kind of 
seems like overkill to me.  While async/multi-phase requests 100% make sense to 
me in situations like the AMRM protocol where requests come in all the time, 
moves will normally be human-initiated requests that come with very low 
frequency.  I'll write the code with events, which will allow us to take either 
the blocking (with a Future) or non-blocking approach.
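
A minimal sketch of the "events plus a Future" idea, written in Python rather 
than the RM's Java and using only the standard library. MoveEvent, dispatcher, 
and toy_scheduler_move are hypothetical names. The point is that the event 
carries a Future, so the caller can either block on it or attach a callback, 
and a failure reason raised deep in the scheduler reaches the caller intact.

```python
import queue
import threading
from concurrent.futures import Future


class MoveEvent:
    def __init__(self, app_id, target_queue):
        self.app_id = app_id
        self.target_queue = target_queue
        self.result = Future()  # completed by the dispatcher thread


def toy_scheduler_move(app_id, target_queue):
    # Stand-in scheduler: rejects moves into a queue named "full".
    if target_queue == "full":
        raise RuntimeError("queue 'full' would exceed maxRunningApps")
    return target_queue


def dispatcher(events, scheduler_move):
    # Handle events one at a time until a None sentinel arrives.
    while True:
        event = events.get()
        if event is None:
            break
        try:
            event.result.set_result(
                scheduler_move(event.app_id, event.target_queue))
        except Exception as exc:
            # Propagate the scheduler's reason to whoever holds the Future.
            event.result.set_exception(exc)


def start_dispatcher(scheduler_move):
    events = queue.Queue()
    thread = threading.Thread(
        target=dispatcher, args=(events, scheduler_move), daemon=True)
    thread.start()
    return events, thread
```

A blocking caller would put a MoveEvent on the queue and call 
event.result.result(timeout=...); a non-blocking caller would use 
event.result.add_done_callback(...) instead.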



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847081#comment-13847081
 ] 

Vinod Kumar Vavilapalli commented on YARN-1495:
---

Hi Sandy, some questions and quick thoughts on this ticket:
 - Any specific use-case? Example where it can be used? To justify this isn't 
feature creep.
 - What happens when scheduling-constraints are violated? The client will just 
get an error? It kind of depends on the type of scheduling constraint.
 - Who initiates the move: any regular user, or just admins? Given your 
description of ACLs, it seems like anyone.
 - Only running apps can be moved? There are races w.r.t apps that are 
submitted but not accepted and close-to-completion apps.
 - The ACLs choice seems straightforward and makes sense.

There is some non-trivial stuff that needs ironing out, outside of schedulers.
 - While the move happens,
-- Apps may be in the process of submitting new requests. What happens to 
them? I guess queue-move and new-requests should be synchronized.
-- Preemption monitors will need to be notified, as they know a lot about 
schedulers but sit outside of them.
-- There will potentially be a wild change in the headroom for the 
application.
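
One way to picture the synchronization between a queue move and new 
requests is a per-application lock that both paths take.  This is only a 
sketch under that assumption; AppSchedulingState and its methods are 
hypothetical names, not classes from the RM or any scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-application state guarded by a single lock.
final class AppSchedulingState {
    private final Object lock = new Object();
    private String queue;
    private final List<String> outstandingRequests = new ArrayList<>();

    AppSchedulingState(String queue) {
        this.queue = queue;
    }

    // Request path: records the request and returns the queue it is charged to.
    String addRequest(String request) {
        synchronized (lock) {
            outstandingRequests.add(request);
            return queue;
        }
    }

    // Move path: switches the queue while no request update is in flight.
    // Returns how many outstanding requests moved along with the app.
    int moveTo(String newQueue) {
        synchronized (lock) {
            queue = newQueue;
            return outstandingRequests.size();
        }
    }
}
```

With this arrangement a request is charged either entirely to the old queue 
or entirely to the new one, never split across the move.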



[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847108#comment-13847108
 ] 

Sandy Ryza commented on YARN-1495:
--

Thanks for taking a look Vinod.

bq. Any specific use-case? Example where it can be used? To justify this isn't 
feature creep.
Yeah, we've seen requests for this a few times.  I think the most common 
scenario is that someone's job is running slowly because of the queue that it's 
in and the job needs to be placed in a queue where it can complete more 
quickly.  This can occur because it's taking longer than expected and a 
deadline is approaching, the original queue is fuller than expected, the job 
was submitted incorrectly in the first place but has made some progress, or for 
a number of other reasons.

bq. What happens when scheduling-constraints are violated? The client will just 
get an error? It kind of depends on the type of scheduling constraint.
Not sure how this should play out for the Capacity Scheduler, but for the Fair 
Scheduler constraints I mentioned in the description I think the client should 
get an error. I suppose another option would be to kill containers until the 
constraints would be satisfied, but I think this is a lot more work and not 
clearly better behavior.
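
A rough sketch of the "reject with an error" behaviour for the two Fair 
Scheduler constraints named in the description, maxRunningApps and 
maxResources.  It assumes maxResources can be modelled as memory only, and 
QueueLimits and MoveConstraintCheck are hypothetical types rather than the 
Fair Scheduler's own classes.

```java
import java.util.Optional;

// Hypothetical snapshot of a target queue's limits and current usage.
final class QueueLimits {
    final int maxRunningApps;
    final long maxMemoryMb;   // assumption: maxResources reduced to memory only
    final int runningApps;
    final long usedMemoryMb;

    QueueLimits(int maxRunningApps, long maxMemoryMb, int runningApps, long usedMemoryMb) {
        this.maxRunningApps = maxRunningApps;
        this.maxMemoryMb = maxMemoryMb;
        this.runningApps = runningApps;
        this.usedMemoryMb = usedMemoryMb;
    }
}

final class MoveConstraintCheck {
    // Returns an empty Optional if the move is allowed, otherwise the reason
    // that would be reported back to the client.
    static Optional<String> verifyMove(QueueLimits target, long appMemoryMb) {
        if (target.runningApps + 1 > target.maxRunningApps) {
            return Optional.of("target queue would exceed maxRunningApps");
        }
        if (target.usedMemoryMb + appMemoryMb > target.maxMemoryMb) {
            return Optional.of("target queue would exceed maxResources");
        }
        return Optional.empty();
    }
}
```

The check runs before anything is transferred, so a rejected move leaves the 
app and its containers untouched in the original queue.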

bq. Who initiates the move: any regular user, or just admins?
My opinion is any regular user, within ACLs.  I.e. if I could kill my job and 
resubmit it to a different queue, I should be able to move it.
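
The ACL rule from the issue description can be written down as a small 
predicate.  This is a sketch only: MoveAcls and its boolean parameters are 
hypothetical, and in practice each flag would come from the existing 
application and queue ACL checks.

```java
// Hypothetical helper expressing the proposed move ACLs.
final class MoveAcls {
    static boolean canMove(boolean canModifyApp,
                           boolean canAdministerSourceQueue,
                           boolean canSubmitToTargetQueue,
                           boolean canAdministerTargetQueue) {
        // Leaving a queue: modify access on the app, or administer access on that queue.
        boolean mayLeaveSource = canModifyApp || canAdministerSourceQueue;
        // Entering a queue: submit access or administer access on that queue.
        boolean mayEnterTarget = canSubmitToTargetQueue || canAdministerTargetQueue;
        return mayLeaveSource && mayEnterTarget;
    }
}
```

A user who could kill their own app (modify access) and resubmit it to the 
target queue (submit access) passes both halves, which matches the intent 
stated above.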

bq. Only running apps can be moved?
I don't see a reason that we shouldn't be able to move an app that has been 
submitted, but not accepted, or that is very close to completion.  In some 
cases we may not need to touch the scheduler.  There are definitely race 
conditions we need to be careful of here.

bq. Apps may be in the process of submitting new requests. What happens to 
them? I guess queue-move and new-requests should be synchronized.
Right.




[jira] [Commented] (YARN-1495) Allow moving apps between queues

2013-12-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847134#comment-13847134
 ] 

Sandy Ryza commented on YARN-1495:
--

Also, a coding question that maybe you can provide some guidance on:

Ideally, we would like to return the RPC with whether or not the operation 
succeeded.  However, we need to go down through the app, app attempt, and 
finally, scheduler to determine this.   We could achieve this in a couple of 
ways:
* Use an async event at each level as is the convention (e.g. as is done for 
killing an application).  Have the call in ClientRMService block and wait for 
things to get sorted out lower down before returning.  Not entirely sure what 
we would wait for because the ClientRMService itself doesn't receive events.  
A Future might be clean.
* Bypass events and go synchronously through to the scheduler.
Is one of these preferred?  Is there a third path I'm missing?
