[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460.patch

trunk patch.  After thinking about this more and talking to Arun a bit about 
it, I have made this patch contain just the check for stopped; it returns an 
EMPTY_ALLOCATION, similar to what allocate() did before when the application 
was null.  I will file a follow-up jira to investigate whether the AMResponse 
should have another field so that the RM could send the AM useful error 
information other than just reboot.
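
A minimal sketch of the behaviour described above, not the attached patch: a stopped application is answered with the same shared empty result as an unknown (null) one. All class names, the container strings, and the `isStopped()` accessor are hypothetical stand-ins for illustration.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the scheduler's allocation result. */
final class AllocationSketch {
    /** Shared immutable "nothing granted" result, analogous to EMPTY_ALLOCATION. */
    static final AllocationSketch EMPTY_ALLOCATION =
            new AllocationSketch(Collections.<String>emptyList());

    private final List<String> containers;

    AllocationSketch(List<String> containers) {
        this.containers = containers;
    }

    List<String> getContainers() {
        return containers;
    }
}

/** Hypothetical application record holding only the state the check needs. */
final class AppRecordSketch {
    private volatile boolean stopped;

    void stop() {
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }
}

final class AllocateGuardSketch {
    static AllocationSketch allocate(AppRecordSketch app) {
        // Unknown application: the pre-existing behaviour mentioned in the comment.
        if (app == null) {
            return AllocationSketch.EMPTY_ALLOCATION;
        }
        // Stopped application: answered the same way, so nothing is granted.
        if (app.isStopped()) {
            return AllocationSketch.EMPTY_ALLOCATION;
        }
        // Placeholder for real scheduling work (assumption for the sketch).
        return new AllocationSketch(Collections.singletonList("container-1"));
    }
}
```

The AM cannot tell from an empty result why nothing was granted, which is the gap the proposed follow-up jira on AMResponse would look into.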

> CS user left in list of active users for the queue even when application 
> finished
> -
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Blocker
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
> YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, 
> YARN-460.patch
>
>
> We have seen a user get left in the queue's list of active users even though 
> the application was removed. This can cause everyone else in the queue to get 
> fewer resources if using the minimum user limit percent config.
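
To see why one stale entry matters, here is a simplified sketch of the user-limit arithmetic. It assumes each active user's cap is the larger of an even share of the queue and the minimum-user-limit-percent share. The class and method names are hypothetical, and the real scheduler applies further factors (for example the user limit factor and maximum capacity) that are left out here.

```java
/** Simplified, hypothetical model of a per-user cap in a queue. */
final class UserLimitSketch {

    /** Integer division rounded up; both arguments must be positive. */
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /**
     * Cap per active user: the larger of an even split among active users
     * and the configured minimum percentage of the queue.
     */
    static int userLimit(int queueCapacity, int activeUsers, int minUserLimitPercent) {
        int evenShare = ceilDiv(queueCapacity, activeUsers);
        int guaranteed = ceilDiv(queueCapacity * minUserLimitPercent, 100);
        return Math.max(evenShare, guaranteed);
    }

    public static void main(String[] args) {
        int capacity = 100;      // arbitrary resource units (assumption)
        int minPercent = 25;     // example minimum user limit percent (assumption)

        // Two users really running applications.
        System.out.println("2 active users: " + userLimit(capacity, 2, minPercent));
        // Same two users plus one stale entry left behind by a finished application.
        System.out.println("3 active users: " + userLimit(capacity, 3, minPercent));
    }
}
```

With these example numbers the cap per user drops from 50 to 34 units once the stale user is counted, even though only two users are actually running anything.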

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460-branch-0.23.patch

branch-0.23 patch with just the stopped check.




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-13 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Priority: Blocker  (was: Critical)




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-12 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460-branch-0.23.patch

Updated branch-0.23 patch: Allocation now returns an error, and 
ApplicationMasterService then sends the reboot command to the AM.




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-12 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460.patch

trunk and branch-2 patch.  Unfortunately I couldn't easily come up with a unit 
test that hits the application-stopped condition (without hitting the null 
check), because the data structures involved are private.




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460-branch-0.23.patch




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460.patch

updated patch for trunk/branch-2.  




[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Attachment: YARN-460.patch

So I think we can simply track whether the application gets stopped and then 
check that in the allocate() call before really processing it.  

All the stopping/removing of the application happens in CS.doneApplication, and 
the race is really between the calls in that function and the fact that 
allocate() isn't synchronized. No other paths I could find should cause issues, 
since most of the other functions in CS are synchronized and wouldn't run 
while doneApplication is happening. 

Here is a preliminary patch that I am going to do some more testing on.  The 
checks for stopped in the SchedulerApp are extra; I was just being paranoid.  
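
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: every class and method name is a hypothetical stand-in, and locking on the application object is just one way to make the stopped check and the queue update atomic with respect to doneApplication.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-application state with a stopped flag. */
final class StoppableAppSketch {
    private final String user;
    private boolean stopped;

    StoppableAppSketch(String user) {
        this.user = user;
    }

    String getUser() {
        return user;
    }

    synchronized void stop() {
        stopped = true;
    }

    synchronized boolean isStopped() {
        return stopped;
    }
}

/** Hypothetical scheduler tracking which users are active in one queue. */
final class RaceFreeSchedulerSketch {
    private final Set<String> activeUsers = new HashSet<>();

    /** Not synchronized on the scheduler, like the allocate() discussed above. */
    boolean allocate(StoppableAppSketch app) {
        synchronized (app) {
            // Refuse to touch queue state once the application is stopped;
            // otherwise the user could be re-added after being removed.
            if (app.isStopped()) {
                return false;
            }
            synchronized (activeUsers) {
                activeUsers.add(app.getUser());
            }
            return true;
        }
    }

    /** Synchronized on the scheduler, like the other scheduler entry points. */
    synchronized void doneApplication(StoppableAppSketch app) {
        // Mark stopped first, so any later allocate() call bails out early.
        app.stop();
        synchronized (activeUsers) {
            activeUsers.remove(app.getUser());
        }
    }

    int activeUserCount() {
        synchronized (activeUsers) {
            return activeUsers.size();
        }
    }
}
```

Without the stopped check, an allocate() call that arrives after doneApplication has removed the user could add the user back, which is the stale entry reported in this issue.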

