[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-01-23 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880029#comment-13880029
 ] 

Akira AJISAKA commented on YARN-1480:
-

Thanks for updating the patch!
I built the documentation and it looks good. I also confirmed the CLI behavior.
Would you add a test case for an invalid finalStatus, like the one for invalid appStates?

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480.patch


 Currently, the RM web services getApps() accepts many more filters than the 
 ApplicationCLI list command, which only accepts state and type. IMHO, 
 different interfaces should ideally provide consistent functionality. Would 
 it be better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1628) TestContainerManagerSecurity fails on trunk

2014-01-23 Thread Mit Desai (JIRA)
Mit Desai created YARN-1628:
---

 Summary: TestContainerManagerSecurity fails on trunk
 Key: YARN-1628
 URL: https://issues.apache.org/jira/browse/YARN-1628
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai


The test fails with the following error:

{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}





[jira] [Updated] (YARN-1628) TestContainerManagerSecurity fails on trunk

2014-01-23 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1628:


Attachment: YARN-1628.patch

Attaching the patch.
The argument InvalidHost was not a valid argument, which resulted in an 
UnknownHostException being thrown.

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-1628
 URL: https://issues.apache.org/jira/browse/YARN-1628
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1628.patch


 The test fails with the following error:
 {noformat}
 java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
 {noformat}





[jira] [Created] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1629:


 Summary: IndexOutOfBoundsException in Fair Scheduler 
MaxRunningAppsEnforcer
 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza


This can occur when the second-to-last app in a queue's pending app list is 
made runnable.  The app is pulled out from under the iterator. 
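
As a rough illustration of this class of failure (not the actual MaxRunningAppsEnforcer code), the sketch below removes an element from a list while a loop with a stale bound is still walking it, then shows one common way to avoid the problem. The class name, method names, and app identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these names come from the YARN code base.
public class PendingAppsSketch {

    // Unsafe: the loop bound is captured once, so after a removal a later
    // get() can run past the end of the shortened list.
    static List<String> promoteUnsafe(List<String> pending, String runnableNow) {
        List<String> visited = new ArrayList<>();
        int size = pending.size(); // stale as soon as an element is removed
        for (int i = 0; i < size; i++) {
            String app = pending.get(i); // may throw IndexOutOfBoundsException
            visited.add(app);
            if (app.equals(runnableNow)) {
                pending.remove(i); // element pulled out from under the loop
            }
        }
        return visited;
    }

    // Safer: remove through the iterator so its position stays consistent.
    static List<String> promoteSafe(List<String> pending, String runnableNow) {
        List<String> visited = new ArrayList<>();
        for (Iterator<String> it = pending.iterator(); it.hasNext(); ) {
            String app = it.next();
            visited.add(app);
            if (app.equals(runnableNow)) {
                it.remove();
            }
        }
        return visited;
    }
}
```

Removing through the iterator is only one option; iterating over a copy of the list, or collecting the apps to remove and applying the removals after the loop, would serve the same purpose.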





[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-23 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: diff.txt

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.





[jira] [Created] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-23 Thread Aditya Acharya (JIRA)
Aditya Acharya created YARN-1630:


 Summary: Unbounded waiting for response in YarnClientImpl.java 
causes thread to hang forever
 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it 
programmatically using a YarnClient. The app was killed, but the client hung 
forever. The message that I saw, which spammed the logs, was Watiting for 
application application_1389036507624_0018 to be killed.

The RM log indicated that the app had indeed transitioned from RUNNING to 
KILLED, but for some reason future responses to the RPC to kill the application 
did not indicate that the app had been terminated.

I tracked this down to YarnClientImpl.java, and though I was unable to 
reproduce the bug, I wrote a patch to introduce a bound on the number of times 
that YarnClientImpl retries the RPC before giving up.
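
The idea of bounding the retries can be sketched roughly as follows. This is not the attached diff.txt and not the real YarnClientImpl code; the class, method, and parameter names are hypothetical, and the status check is abstracted as a BooleanSupplier standing in for the RPC that reports whether the application has been killed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a bounded polling loop; not the YarnClientImpl patch.
public class BoundedKillWaitSketch {

    // Polls isKilled at most maxAttempts times, sleeping pollIntervalMs
    // between attempts. Returns true if the condition was observed, and
    // false if the bound was reached or the thread was interrupted.
    static boolean waitUntilKilled(BooleanSupplier isKilled,
                                   int maxAttempts,
                                   long pollIntervalMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isKilled.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pollIntervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // gave up instead of hanging forever
    }
}
```

A caller would treat a false return as "could not confirm the kill" and surface an error or a warning. A deadline based on elapsed time instead of an attempt count would be an equally reasonable bound.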





[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1629:
-

Attachment: YARN-1629.patch

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 





[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1629:
-

Attachment: YARN-1629-1.patch

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629-1.patch, YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 





[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1629:
-

Attachment: YARN-1629-2.patch

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 





[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880466#comment-13880466
 ] 

Alejandro Abdelnur commented on YARN-1629:
--

+1

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 





[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880725#comment-13880725
 ] 

Vinod Kumar Vavilapalli commented on YARN-1618:
---

Sorry Jian is away and I'm tied up with other things.

The point about saving app before scheduler acknowledges is a known issue. If 
that is the only issue, we can close as a duplicate of YARN-1507 which already 
exists.

I'll need to look carefully, but here is why I think we did this: because the 
save happens asynchronously, an app may get killed while it is being persisted 
but before the save is acknowledged. That's the reason why it moves to the 
FINAL_SAVING state first. Obviously, that also means that there are cases like 
this one, where the original save request hasn't reached the state-store yet.

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.





[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880745#comment-13880745
 ] 

Bikas Saha commented on YARN-1618:
--

An app goes from NEW -> NEW_SAVING upon receiving START. It goes from 
NEW_SAVING -> SUBMITTED after the app is saved. It goes from 
NEW_SAVING -> FINAL_SAVING if killed while saving. All of these work fine. 
Given the above transitions, the app should go from NEW -> KILLED if it's killed 
before receiving the START event. The START event should be ignored in the 
KILLED state (currently it is not ignored). So if START comes after KILL, it's 
a no-op. If START comes before KILL, the state store is fine since the app will 
first be saved and then updated.
It's interesting that we caught the race such that KILL came before START. The 
START should come almost immediately after the RMAppImpl object is created in 
the NEW state during regular app submission. Karthik, are we sure that this 
happened? This should not happen during recovery, since the RMAppImpl moves 
from NEW -> NEXT_STATE after receiving the RECOVER event. RPC servers should 
not be running during recovery. Vinod, is it still the case that RPC servers 
are started after recovery is complete?
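
The transitions proposed in this comment could be written down as a small table-driven state machine, sketched below. This is an illustration only and not RMAppImpl's actual state machine: the class name and the APP_SAVED event name are assumptions, and only the states and the START and KILL events are taken from the comment.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the proposed transitions; not the RMAppImpl code.
public class AppStateSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, FINAL_SAVING, KILLED }

    // APP_SAVED is an assumed name for "the app has been saved".
    enum Event { START, APP_SAVED, KILL }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<>(State.class);

    private static void add(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    static {
        add(State.NEW, Event.START, State.NEW_SAVING);
        add(State.NEW_SAVING, Event.APP_SAVED, State.SUBMITTED);
        add(State.NEW_SAVING, Event.KILL, State.FINAL_SAVING);
        // Proposed: killed before START, so nothing was ever saved.
        add(State.NEW, Event.KILL, State.KILLED);
        // Proposed: a late START after KILL is ignored (no-op).
        add(State.KILLED, Event.START, State.KILLED);
    }

    // Returns the next state, or throws if the event is not valid in this state.
    static State next(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException(
                    "Invalid event " + event + " in state " + current);
        }
        return row.get(event);
    }
}
```

With this table, KILL arriving in NEW never reaches FINAL_SAVING, so no update is attempted for an entry that was never written to the state-store.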

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.


