[jira] [Updated] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-07-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2175:


Description: 
There are no timeouts that can be used to limit the time taken by various
container startup operations. Localization, for example, could take a long
time, and there is no automated way to kill a task if it is stuck in these
states. These delays may have nothing to do with the task itself and could be
an issue within the platform.

Ideally there should be configurable limits on how long a container can spend
in various states within the NodeManager. The RM does not care about most of
these; they are only between the AM and the NM. We can start by making these
global configurable defaults, and in the future we can make it fancier by
letting the AM override them in the start container request.

This jira will be used to limit localization time and we can open others if we 
feel we need to limit other operations.

  was:
There are no timeouts that can be used to limit the time taken by various
container startup operations. Localization, for example, could take a long
time, and there is no automated way to kill a task if it is stuck in these
states. These delays may have nothing to do with the task itself and could be
an issue within the platform.

Ideally there should be configurable limits on how long a container can spend
in various states within the NodeManager. The RM does not care about most of
these; they are only between the AM and the NM. We can start by making these
global configurable defaults, and in the future we can make it fancier by
letting the AM override them in the start container request.

This jira will be used to limit localization time and we open others if we feel 
we need to limit other operations.


 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various
 container startup operations. Localization, for example, could take a long
 time, and there is no automated way to kill a task if it is stuck in these
 states. These delays may have nothing to do with the task itself and could
 be an issue within the platform.
 Ideally there should be configurable limits on how long a container can
 spend in various states within the NodeManager. The RM does not care about
 most of these; they are only between the AM and the NM. We can start by
 making these global configurable defaults, and in the future we can make it
 fancier by letting the AM override them in the start container request.
 This jira will be used to limit localization time and we can open others if 
 we feel we need to limit other operations.
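
 As a rough illustration of the kind of limit described above, the sketch
 below bounds a single localization step with a timeout and cancels it when
 the limit is exceeded. It is only a minimal sketch, not the NodeManager's
 actual code: the class LocalizationTimeoutSketch, the Outcome enum, the
 runWithTimeout method and the timeoutMs parameter are all hypothetical names,
 and timeoutMs merely stands in for a global configurable default that does
 not exist in YARN today.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in Hadoop YARN.
class LocalizationTimeoutSketch {

    /** Result of one bounded localization attempt. */
    enum Outcome { LOCALIZED, TIMED_OUT, FAILED }

    /**
     * Runs a localization step on a worker thread and gives up after
     * timeoutMs milliseconds. timeoutMs is an assumed stand-in for a
     * NodeManager-wide configurable default.
     */
    static Outcome runWithTimeout(Callable<Void> localization, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(localization);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.LOCALIZED;
        } catch (TimeoutException e) {
            // The step is stuck: interrupt it so the container can be failed.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            // The caller was interrupted while waiting; restore the flag.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Outcome.FAILED;
        } catch (ExecutionException e) {
            // The localization step itself threw an exception.
            return Outcome.FAILED;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A step that finishes immediately, and one that simulates a hang.
        Outcome fast = runWithTimeout(() -> null, 1000);
        Outcome stuck = runWithTimeout(() -> {
            Thread.sleep(60_000);
            return null;
        }, 100);
        System.out.println(fast + " " + stuck);
    }
}
```

 Cancelling with interruption only helps if the stuck work responds to
 interrupts; a download blocked in non-interruptible I/O would need its own
 socket or read timeout as well.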



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-07-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2175:


Description: 
There are no timeouts that can be used to limit the time taken by various
container startup operations. Localization, for example, could take a long
time, and there is no automated way to kill a task if it is stuck in these
states. These delays may have nothing to do with the task itself and could be
an issue within the platform.

Ideally there should be configurable limits on how long a container can spend
in various states within the NodeManager. The RM does not care about most of
these; they are only between the AM and the NM. We can start by making these
global configurable defaults, and in the future we can make it fancier by
letting the AM override them in the start container request.

This jira will be used to limit localization time and we open others if we feel 
we need to limit other operations.

  was:
There are no timeouts that can be used to limit the time taken by various
container startup operations. Localization, for example, could take a long
time, and there is no way to kill a task if it is stuck in these states.
These delays may have nothing to do with the task itself and could be an
issue within the platform.

Ideally there should be configurable limits on how long a container can spend
in various states within the NodeManager. The RM does not care about most of
these; they are only between the AM and the NM. We can start by making these
global configurable defaults, and in the future we can make it fancier by
letting the AM override them in the start container request.

This jira will be used to limit localization time and we open others if we feel 
we need to limit other operations.


 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various
 container startup operations. Localization, for example, could take a long
 time, and there is no automated way to kill a task if it is stuck in these
 states. These delays may have nothing to do with the task itself and could
 be an issue within the platform.
 Ideally there should be configurable limits on how long a container can
 spend in various states within the NodeManager. The RM does not care about
 most of these; they are only between the AM and the NM. We can start by
 making these global configurable defaults, and in the future we can make it
 fancier by letting the AM override them in the start container request.
 This jira will be used to limit localization time and we open others if we 
 feel we need to limit other operations.





[jira] [Updated] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

2014-06-17 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2175:


Affects Version/s: 2.4.0

 Container localization has no timeouts and tasks can be stuck there for a 
 long time
 ---

 Key: YARN-2175
 URL: https://issues.apache.org/jira/browse/YARN-2175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot

 There are no timeouts that can be used to limit the time taken by various
 container startup operations. Localization, for example, could take a long
 time, and there is no way to kill a task if it is stuck in these states.
 These delays may have nothing to do with the task itself and could be an
 issue within the platform.
 Ideally there should be configurable limits on how long a container can
 spend in various states within the NodeManager. The RM does not care about
 most of these; they are only between the AM and the NM. We can start by
 making these global configurable defaults, and in the future we can make it
 fancier by letting the AM override them in the start container request.
 This jira will be used to limit localization time and we open others if we 
 feel we need to limit other operations.


