[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589296#comment-13589296 ] Bikas Saha commented on YARN-417: -
I think if ContainerExitCodes needs to be added then it should be its own jira, because it's an addition to the YARN API and should be kept distinct from this jira. This jira could be marked dependent on that jira. It's also missing out-of-memory and preemption, from what I see in the patch. ContainerRequest is something that's tightly coupled with the AMRMClient, and hence I had put it inside AMRMClient. It's expected to be used in other places, and that's why it's public.
The helper function would have helped because containers contain information set by two entities - the RM and the NM. And its "status" is a combination of containerState and containerExitCode. E.g. the state could be running, in which case exit codes don't matter. The state could be completed, in which case the exit code can tell us whether it was killed or not. The exit code may not be enough because the RM could preempt a container before it's launched and hence may not have a real exit code. Exit codes are also not portable across platforms (e.g. Windows and Linux). The helper function lets the library hide all this and present a single status value for the user to look at: whether the container is allocated, running, completed_with_success, killed, preempted, out of memory, etc. At some point this could move into YARN, but as it evolves, the library might be a good place to house it. Does that help clarify its utility?
Why is client.start() being called in init() when client.stop() is being called in stop()?
{code}
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    client.init(conf);
+    client.start();
+  }
{code}
Not waiting for the thread to join()? Why interrupt()? The thread needs to be stopped first so that it stops calling into the client; otherwise it can call into a client that has already stopped.
{code}
+  @Override
+  public void stop() {
+    client.stop();
+    keepRunning = false;
+    thread.interrupt();
+  }
{code}
I am wary of calling back on the heartbeat thread itself. If you notice the interface patch I had uploaded, I left some comments on moving this to its own thread. This is important because the callback code can be arbitrary and may not complete in time for our heartbeat, especially with thousands of containers. We cannot let our heartbeat rate be dependent on app code performance.
> Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, > YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
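To illustrate the lifecycle and threading points above, here is a minimal sketch (hypothetical field names, not taken from the posted patch): user callbacks run on a dedicated handler thread fed by a queue, and stop() halts and joins both threads before stopping the underlying client, so nothing can call into a client that has already stopped.
{code}
// Minimal sketch (hypothetical names, not from the posted patch).
// The heartbeat thread only enqueues allocate responses; user callbacks
// run on a dedicated handler thread that drains the queue.
private volatile boolean keepRunning = true;
private Thread heartbeatThread;
private Thread handlerThread;
private final BlockingQueue<AllocateResponse> responseQueue =
    new LinkedBlockingQueue<AllocateResponse>();

@Override
public void stop() {
  // Halt our own threads first so nothing can call into a stopped client.
  keepRunning = false;
  heartbeatThread.interrupt();
  handlerThread.interrupt();
  try {
    heartbeatThread.join();
    handlerThread.join();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
  // Only now is it safe to stop the underlying AMRMClient and the service.
  client.stop();
  super.stop();
}
{code}
With this split, a slow callback only backs up the handler thread's queue; it cannot delay the heartbeat itself.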
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589185#comment-13589185 ] Hadoop QA commented on YARN-417: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571354/YARN-417-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/445//console This message is automatically generated. > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, > YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-323) Yarn CLI commands prints classpath
[ https://issues.apache.org/jira/browse/YARN-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kapoor updated YARN-323: - Priority: Trivial (was: Minor) > Yarn CLI commands prints classpath > -- > > Key: YARN-323 > URL: https://issues.apache.org/jira/browse/YARN-323 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Nishan Shetty >Priority: Trivial > > Execute ./yarn commands. It will print the classpath to the console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589182#comment-13589182 ] Sandy Ryza commented on YARN-417: - Uploaded a second cut with what was discussed above. One more thought: would it make sense to take ContainerRequest out of AMRMClient, as it's now used in places where AMRMClient is not? > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, > YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-323) Yarn CLI commands prints classpath
[ https://issues.apache.org/jira/browse/YARN-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589183#comment-13589183 ] Abhishek Kapoor commented on YARN-323: -- I don't see the classpath being printed on the console. Please confirm, or close the issue. Thanks Abhishek > Yarn CLI commands prints classpath > -- > > Key: YARN-323 > URL: https://issues.apache.org/jira/browse/YARN-323 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Nishan Shetty >Priority: Minor > > Execute ./yarn commands. It will print the classpath to the console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-417: Attachment: YARN-417-1.patch > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, > YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589177#comment-13589177 ] Sandy Ryza commented on YARN-417: - That sounds right, Chris. Will include that in the class's doc. I've thought a little more about the ContainerCompletionReason, and I'm not sure it's necessary, as there are already constants in YarnConfiguration for the special exit codes, and there are only two: ABORTED_CONTAINER_EXIT_STATUS and DISK_FAILED. As these don't really have to do with configuration, it might make sense to move them to a ContainerExitCodes class, and just point to that class in the doc for ContainerStatus#getExitCode and CallbackHandler#onContainersCompleted. > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
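As a rough illustration of the suggestion above, such a holder class might look like the following (a sketch only - the constant values shown are illustrative placeholders, not confirmed by the patch):
{code}
// Sketch of the proposed ContainerExitCodes holder class. Values are
// placeholders for illustration; the real constants live in
// YarnConfiguration today.
public class ContainerExitCodes {
  public static final int SUCCESS = 0;
  // Container was aborted by the framework, e.g. released or preempted
  // before it completed.
  public static final int ABORTED = -100;
  // Container failed because the disks backing it failed.
  public static final int DISKS_FAILED = -101;
}
{code}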
[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589175#comment-13589175 ] Hadoop QA commented on YARN-376: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571301/YARN-376.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/444//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/444//console This message is automatically generated. > Apps that have completed can appear as RUNNING on the NM UI > --- > > Key: YARN-376 > URL: https://issues.apache.org/jira/browse/YARN-376 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-376.patch, YARN-376.patch, YARN-376.patch > > > On a busy cluster we've noticed a growing number of applications appear as > RUNNING on nodemanager web pages but the applications have long since > finished. Looking at the NM logs, it appears the RM never told the > nodemanager that the application had finished. This is also reflected in a > jstack of the NM process, since many more log aggregation threads are running > than one would expect from the number of actively running applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589136#comment-13589136 ] Jason Lowe commented on YARN-376: - The eclipse failure appears to be unrelated, as it builds fine for me locally. Also, I can't see how this change would affect the eclipse:eclipse build, which is failing in hadoop-common. > Apps that have completed can appear as RUNNING on the NM UI > --- > > Key: YARN-376 > URL: https://issues.apache.org/jira/browse/YARN-376 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-376.patch, YARN-376.patch, YARN-376.patch > > > On a busy cluster we've noticed a growing number of applications appear as > RUNNING on nodemanager web pages but the applications have long since > finished. Looking at the NM logs, it appears the RM never told the > nodemanager that the application had finished. This is also reflected in a > jstack of the NM process, since many more log aggregation threads are running > than one would expect from the number of actively running applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588936#comment-13588936 ] Hadoop QA commented on YARN-376: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571301/YARN-376.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/443//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/443//console This message is automatically generated. > Apps that have completed can appear as RUNNING on the NM UI > --- > > Key: YARN-376 > URL: https://issues.apache.org/jira/browse/YARN-376 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-376.patch, YARN-376.patch, YARN-376.patch > > > On a busy cluster we've noticed a growing number of applications appear as > RUNNING on nodemanager web pages but the applications have long since > finished. Looking at the NM logs, it appears the RM never told the > nodemanager that the application had finished. This is also reflected in a > jstack of the NM process, since many more log aggregation threads are running > than one would expect from the number of actively running applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-380) yarn node -status prints Last-Last-Health-Update
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588930#comment-13588930 ] Hadoop QA commented on YARN-380: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571303/issues-yarn-380.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/442//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/442//console This message is automatically generated. > yarn node -status prints Last-Last-Health-Update > > > Key: YARN-380 > URL: https://issues.apache.org/jira/browse/YARN-380 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: omkar vinit joshi > Labels: usability > Attachments: issues-yarn-380.patch > > > I assume the Last-Last-Health-Update is a typo and it should just be > Last-Health-Update. > $ yarn node -status foo.com:8041 > Node Report : > Node-Id : foo.com:8041 > Rack : /10.10.10.0 > Node-State : RUNNING > Node-Http-Address : foo.com:8042 > Health-Status(isNodeHealthy) : true > Last-Last-Health-Update : 1360118400219 > Health-Report : > Containers : 0 > Memory-Used : 0M > Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-380) yarn node -status prints Last-Last-Health-Update
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] omkar vinit joshi updated YARN-380: --- Attachment: (was: issue-yarn-380.patch) > yarn node -status prints Last-Last-Health-Update > > > Key: YARN-380 > URL: https://issues.apache.org/jira/browse/YARN-380 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: omkar vinit joshi > Labels: usability > Attachments: issues-yarn-380.patch > > > I assume the Last-Last-Health-Update is a typo and it should just be > Last-Health-Update. > $ yarn node -status foo.com:8041 > Node Report : > Node-Id : foo.com:8041 > Rack : /10.10.10.0 > Node-State : RUNNING > Node-Http-Address : foo.com:8042 > Health-Status(isNodeHealthy) : true > Last-Last-Health-Update : 1360118400219 > Health-Report : > Containers : 0 > Memory-Used : 0M > Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-380) yarn node -status prints Last-Last-Health-Update
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588907#comment-13588907 ] omkar vinit joshi commented on YARN-380: Yes - that output is ISO-8601 format, where the 'T' separates the date and time and the trailing -08:00 is the timezone offset (PST). I am making the output more readable. > yarn node -status prints Last-Last-Health-Update > > > Key: YARN-380 > URL: https://issues.apache.org/jira/browse/YARN-380 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: omkar vinit joshi > Labels: usability > Attachments: issues-yarn-380.patch > > > I assume the Last-Last-Health-Update is a typo and it should just be > Last-Health-Update. > $ yarn node -status foo.com:8041 > Node Report : > Node-Id : foo.com:8041 > Rack : /10.10.10.0 > Node-State : RUNNING > Node-Http-Address : foo.com:8042 > Health-Status(isNodeHealthy) : true > Last-Last-Health-Update : 1360118400219 > Health-Report : > Containers : 0 > Memory-Used : 0M > Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
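For example, one readable rendering of the raw millisecond timestamp could look like this (illustrative only - the pattern below is an example, not necessarily what the patch uses):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative formatting of the health-update timestamp.
long lastHealthUpdateMillis = 1360118400219L; // raw value from the report
SimpleDateFormat fmt = new SimpleDateFormat("EEE dd/MMM/yy hh:mm:ss a zzz");
String readable = fmt.format(new Date(lastHealthUpdateMillis));
// readable -> "Tue 05/Feb/13 04:00:00 PM PST" (in the Pacific timezone)
{code}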
[jira] [Updated] (YARN-380) yarn node -status prints Last-Last-Health-Update
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] omkar vinit joshi updated YARN-380: --- Attachment: issues-yarn-380.patch > yarn node -status prints Last-Last-Health-Update > > > Key: YARN-380 > URL: https://issues.apache.org/jira/browse/YARN-380 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: omkar vinit joshi > Labels: usability > Attachments: issues-yarn-380.patch > > > I assume the Last-Last-Health-Update is a typo and it should just be > Last-Health-Update. > $ yarn node -status foo.com:8041 > Node Report : > Node-Id : foo.com:8041 > Rack : /10.10.10.0 > Node-State : RUNNING > Node-Http-Address : foo.com:8042 > Health-Status(isNodeHealthy) : true > Last-Last-Health-Update : 1360118400219 > Health-Report : > Containers : 0 > Memory-Used : 0M > Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-376: Attachment: YARN-376.patch Thanks for the review, Sidd. I originally had it update the heartbeat since the RMNode interface already knew about the heartbeat type, and it's more efficient (no extra copy of the app list is needed, and the write lock is grabbed only once instead of twice). Updated to change get*ToCleanup to pull*ToCleanup; the test no longer needs the heartbeat response since it no longer updates it directly. > Apps that have completed can appear as RUNNING on the NM UI > --- > > Key: YARN-376 > URL: https://issues.apache.org/jira/browse/YARN-376 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-376.patch, YARN-376.patch, YARN-376.patch > > > On a busy cluster we've noticed a growing number of applications appear as > RUNNING on nodemanager web pages but the applications have long since > finished. Looking at the NM logs, it appears the RM never told the > nodemanager that the application had finished. This is also reflected in a > jstack of the NM process, since many more log aggregation threads are running > than one would expect from the number of actively running applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
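The get-to-pull rename suggests destructive-read semantics: the caller drains the pending list and each entry is delivered exactly once. A minimal sketch of that pattern (hypothetical field names, not the actual patch code):
{code}
// Sketch of a "pull" accessor: atomically copy and clear the pending
// cleanup list under the write lock so each entry is seen only once.
public List<ApplicationId> pullAppsToCleanup() {
  writeLock.lock();
  try {
    List<ApplicationId> apps = new ArrayList<ApplicationId>(appsToCleanup);
    appsToCleanup.clear();
    return apps;
  } finally {
    writeLock.unlock();
  }
}
{code}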
[jira] [Created] (YARN-433) When RM is catching up with node updates then it should not expire acquired containers
Bikas Saha created YARN-433: --- Summary: When RM is catching up with node updates then it should not expire acquired containers Key: YARN-433 URL: https://issues.apache.org/jira/browse/YARN-433 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong The RM expires containers that are not launched within some time of being allocated; the default is 10 minutes. When the RM is not keeping up with node updates, it may not be aware of newly launched containers. If the expiry thread fires for such containers, the RM can expire them even though they may already have launched. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1351#comment-1351 ] Hadoop QA commented on YARN-237: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571286/YARN-237.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/441//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/441//console This message is automatically generated. > Refreshing the RM page forgets how many rows I had in my Datatables > --- > > Key: YARN-237 > URL: https://issues.apache.org/jira/browse/YARN-237 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0 >Reporter: Ravi Prakash >Assignee: jian he > Labels: usability > Attachments: YARN-237.patch > > > If I choose 100 rows, and then refresh the page, DataTables goes back to > showing me 20 rows. > This user preference should be stored in a cookie. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-237: - Attachment: YARN-237.patch Fix the problem of state saving on the RM page. > Refreshing the RM page forgets how many rows I had in my Datatables > --- > > Key: YARN-237 > URL: https://issues.apache.org/jira/browse/YARN-237 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0 >Reporter: Ravi Prakash >Assignee: jian he > Labels: usability > Attachments: YARN-237.patch > > > If I choose 100 rows, and then refresh the page, DataTables goes back to > showing me 20 rows. > This user preference should be stored in a cookie. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588861#comment-13588861 ] Xuan Gong commented on YARN-365: bq. Do we need to worry about there being overlap between the 2 lists, i.e. a newlyLaunchedContainer also got completed by the time the slow RM handled the NM updates? Thanks for the comments. I think we are fine here. The way we handle newlyLaunchedContainers is to submit a LAUNCHED event to RMContainerImpl, and RMContainerImpl will unregister (remove) this container from the containerAllocationExpirer list. That is how we handle the newlyLaunchedContainers; it does not actually launch the container, it just tells the RM that this container is being used right now. > Each NM heartbeat should not generate an event for the Scheduler > > > Key: YARN-365 > URL: https://issues.apache.org/jira/browse/YARN-365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 0.23.5 >Reporter: Siddharth Seth >Assignee: Xuan Gong > Fix For: 2.0.4-beta > > Attachments: Prototype2.txt, Prototype3.txt, YARN-365.10.patch, > YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, > YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch, > YARN-365.9.patch > > > Follow up from YARN-275 > https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
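A simplified sketch of the behavior described (assuming a BaseTransition helper like the one in RMContainerImpl; the real transition differs in detail):
{code}
// Simplified sketch: on a LAUNCHED event the container is unregistered
// from the allocation expirer, so the RM no longer counts it as
// "allocated but never launched". Not the literal RMContainerImpl code.
private static final class LaunchedTransition extends BaseTransition {
  @Override
  public void transition(RMContainerImpl container, RMContainerEvent event) {
    container.containerAllocationExpirer.unregister(container.getContainerId());
  }
}
{code}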
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588856#comment-13588856 ] Hadoop QA commented on YARN-198: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571280/YARN-198.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/440//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/440//console This message is automatically generated. > If we are navigating to Nodemanager UI from Resourcemanager,then there is not > link to navigate back to Resource manager > --- > > Key: YARN-198 > URL: https://issues.apache.org/jira/browse/YARN-198 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Ramgopal N >Assignee: jian he >Priority: Minor > Labels: usability > Attachments: YARN-198.patch > > > If we are navigating to Nodemanager by clicking on the node link in RM, there > is no link provided on the NM to navigate back to RM. > If there is a link to navigate back to RM, it would be good -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-430) Add HDFS based store for RM
[ https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-430: Assignee: jian he (was: Bikas Saha) > Add HDFS based store for RM > --- > > Key: YARN-430 > URL: https://issues.apache.org/jira/browse/YARN-430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: jian he > > There is a generic FileSystem store but it does not take advantage of HDFS > features like directories, replication, DFSClient advanced settings for HA, > retries etc. Writing a store that's optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-380) yarn node -status prints Last-Last-Health-Update
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588840#comment-13588840 ] Vinod Kumar Vavilapalli commented on YARN-380: -- Looked at the patch. One comment: - When I tried to print the output by modifying the test itself, it says "Last-Health-Update : 1969-12-31T16:00:00-08:00", not sure if you are seeing the extraneous T character or not. Please verify. If it is indeed like that, we will need to fix it. > yarn node -status prints Last-Last-Health-Update > > > Key: YARN-380 > URL: https://issues.apache.org/jira/browse/YARN-380 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: omkar vinit joshi > Labels: usability > Attachments: issue-yarn-380.patch > > > I assume the Last-Last-Health-Update is a typo and it should just be > Last-Health-Update. > $ yarn node -status foo.com:8041 > Node Report : > Node-Id : foo.com:8041 > Rack : /10.10.10.0 > Node-State : RUNNING > Node-Http-Address : foo.com:8042 > Health-Status(isNodeHealthy) : true > Last-Last-Health-Update : 1360118400219 > Health-Report : > Containers : 0 > Memory-Used : 0M > Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-198: - Attachment: YARN-198.patch Add a link on the NM page to navigate back to the RM page. > If we are navigating to Nodemanager UI from Resourcemanager,then there is not > link to navigate back to Resource manager > --- > > Key: YARN-198 > URL: https://issues.apache.org/jira/browse/YARN-198 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Ramgopal N >Assignee: jian he >Priority: Minor > Labels: usability > Attachments: YARN-198.patch > > > If we are navigating to Nodemanager by clicking on the node link in RM, there > is no link provided on the NM to navigate back to RM. > If there is a link to navigate back to RM, it would be good -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588818#comment-13588818 ] Bikas Saha commented on YARN-365: -
Do we need to worry about there being overlap between the 2 lists, i.e. a newlyLaunchedContainer also got completed by the time the slow RM handled the NM updates?
{code}
+  private synchronized void nodeUpdate(RMNode nm) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("nodeUpdate: " + nm + " clusterResources: " + clusterResource);
     }
-
-    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    List<UpdatedContainerInfo> containerInfoList = nm.pullContainerUpdates();
+    List<ContainerStatus> newlyLaunchedContainers = new ArrayList<ContainerStatus>();
+    List<ContainerStatus> completedContainers = new ArrayList<ContainerStatus>();
+    for (UpdatedContainerInfo containerInfo : containerInfoList) {
+      newlyLaunchedContainers.addAll(containerInfo.getNewlyLaunchedContainers());
+      completedContainers.addAll(containerInfo.getCompletedContainers());
+    }
+
{code}
Note that this problem (if it is a problem) exists regardless of this change, because a container may start and complete within the NM heartbeat interval. However, chances of hitting it are low before this change, because the heartbeat interval is short and so the RM never sees a node update in which the same container both launches and completes. After this change, with a slow RM, this can easily happen, especially because we are simply concatenating both sub-lists.
> Each NM heartbeat should not generate an event for the Scheduler > > > Key: YARN-365 > URL: https://issues.apache.org/jira/browse/YARN-365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 0.23.5 >Reporter: Siddharth Seth >Assignee: Xuan Gong > Fix For: 2.0.4-beta > > Attachments: Prototype2.txt, Prototype3.txt, YARN-365.10.patch, > YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, > YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch, > YARN-365.9.patch > > > Follow up from YARN-275 > https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
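One way to guard against the overlap Bikas raises would be to drop the launch notification for any container that also appears in the completed list of the same update - a sketch only, not taken from any posted patch:
{code}
// Sketch: filter out containers that both launched and completed within
// the window covered by this node update, so the scheduler never
// processes a LAUNCHED event for an already-finished container.
// (Uses java.util.HashSet/Iterator and the YARN ContainerStatus record.)
Set<ContainerId> completedIds = new HashSet<ContainerId>();
for (ContainerStatus status : completedContainers) {
  completedIds.add(status.getContainerId());
}
Iterator<ContainerStatus> it = newlyLaunchedContainers.iterator();
while (it.hasNext()) {
  if (completedIds.contains(it.next().getContainerId())) {
    it.remove();
  }
}
{code}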
[jira] [Updated] (YARN-432) Documentation for Log Aggregation and log retrieval.
[ https://issues.apache.org/jira/browse/YARN-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-432: - Issue Type: Sub-task (was: Bug) Parent: YARN-431 > Documentation for Log Aggregation and log retrieval. > > > Key: YARN-432 > URL: https://issues.apache.org/jira/browse/YARN-432 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Mahadev konar >Assignee: Siddharth Seth > > Retrieving logs in 0.23 is very different from what 0.20.* does. This is a > very new feature which will require good documentation for users to get used > to it. Let's make sure we have some solid documentation for this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-432) Documentation for Log Aggregation and log retrieval.
[ https://issues.apache.org/jira/browse/YARN-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli moved MAPREDUCE-3743 to YARN-432: - Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-432 (was: MAPREDUCE-3743) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Documentation for Log Aggregation and log retrieval. > > > Key: YARN-432 > URL: https://issues.apache.org/jira/browse/YARN-432 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mahadev konar >Assignee: Siddharth Seth > > Retrieving logs in 0.23 is very different from what 0.20.* does. This is a > very new feature which will require good documentation for users to get used > to it. Let's make sure we have some solid documentation for this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-395) RM should have a way to disable scheduling to a set of nodes
[ https://issues.apache.org/jira/browse/YARN-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-395: - Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > RM should have a way to disable scheduling to a set of nodes > > > Key: YARN-395 > URL: https://issues.apache.org/jira/browse/YARN-395 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Arun C Murthy > > There should be a way to say schedule to A, B and C but never to D. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-173) Page navigation support for container logs page
[ https://issues.apache.org/jira/browse/YARN-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-173: - Labels: usability (was: ) > Page navigation support for container logs page > --- > > Key: YARN-173 > URL: https://issues.apache.org/jira/browse/YARN-173 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.2-alpha, 0.23.3 >Reporter: Jason Lowe > Labels: usability > > ContainerLogsPage and AggregatedLogsBlock both support {{start}} and {{end}} > parameters which are a big help when trying to sift through a huge log. > However it's annoying to have to manually edit the URL to go through a giant > log page-by-page. It would be very handy if the web page also provided page > navigation links so flipping to the next/previous/first/last chunk of log is > a simple click away. Bonus points for providing a way to easily change the > size of the log chunk shown per page. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-200: - Issue Type: Sub-task (was: Bug) Parent: YARN-431 > yarn log does not output all needed information, and is in a binary format > -- > > Key: YARN-200 > URL: https://issues.apache.org/jira/browse/YARN-200 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 0.23.5 >Reporter: Robert Joseph Evans > Labels: usability > > yarn logs does not output attemptid, nodename, or container-id. Missing > these makes it very difficult to look through the logs for failed containers > and tie them back to actual tasks and task attempts. > Also the output currently includes several binary characters. This is OK for > being machine readable, but difficult for being human readable, or even for > using standard tools like grep. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-200: - Labels: usability (was: ) > yarn log does not output all needed information, and is in a binary format > -- > > Key: YARN-200 > URL: https://issues.apache.org/jira/browse/YARN-200 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 0.23.5 >Reporter: Robert Joseph Evans > Labels: usability > > yarn logs does not output attemptid, nodename, or container-id. Missing > these makes it very difficult to look through the logs for failed containers > and tie them back to actual tasks and task attempts. > Also the output currently includes several binary characters. This is OK for > being machine readable, but difficult for being human readable, or even for > using standard tools like grep. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588696#comment-13588696 ] Hitesh Shah commented on YARN-196: -- Also, description in yarn-default.xml should mention the value is specified in seconds. > Nodemanager if started before starting Resource manager is getting > shutdown.But if both RM and NM are started and then after if RM is going > down,NM is retrying for the RM. > --- > > Key: YARN-196 > URL: https://issues.apache.org/jira/browse/YARN-196 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Ramgopal N >Assignee: Xuan Gong > Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, > YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch > > > If NM is started before starting the RM ,NM is shutting down with the > following error > {code} > ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting > services org.apache.hadoop.yarn.server.nodemanager.NodeManager > org.apache.avro.AvroRuntimeException: > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) > at > org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) > Caused by: java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) > ... 3 more > Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: > Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at > org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) > at $Proxy23.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) > ... 5 more > Caused by: java.net.ConnectException: Call From > HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection > exception: java.net.ConnectException: Connection refused; For more details > see: http://wiki.apache.org/hadoop/ConnectionRefused > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) > at org.apache.hadoop.ipc.Client.call(Client.java:1141) > at org.apache.hadoop.ipc.Client.call(Client.java:1100) > at > org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) > ... 
7 more > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563) > at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247) > at org.apache.hadoop.ipc.Client.call(Client.java:1117) > ... 9 more > 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: > AsyncDispatcher thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) > at java.l
[jira] [Resolved] (YARN-324) Provide way to preserve container directories
[ https://issues.apache.org/jira/browse/YARN-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-324. -- Resolution: Invalid Lohit, as Jason mentioned, yarn.nodemanager.delete.debug-delay-sec should work for you. Please reopen this ticket if you disagree. > Provide way to preserve container directories > - > > Key: YARN-324 > URL: https://issues.apache.org/jira/browse/YARN-324 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu > > There should be a way to preserve container directories (along with > filecache/appcache) for offline debugging. As of today, if a container > completes (either success or failure) it would get cleaned up. In case of > failure it becomes very hard to debug to find out what the cause of the > failure is. Having the ability to preserve container directories will enable > one to log into the machine and debug further for failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
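For reference, the property Jason and Vinod point to is set in yarn-site.xml; a sample entry (the 3600 below is just an example value):
{code}
<!-- Keep finished containers' local directories around so they can be
     inspected offline. Value is in seconds; 3600 (one hour) is only an
     example. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>3600</value>
</property>
{code}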
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588694#comment-13588694 ] Hitesh Shah commented on YARN-196: --
{code}
+  public static final int DEFAULT_RESOURCEMANAGER_CONNECT_WAIT_SECS =
+      15*60*1000;
+  public static final long DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS =
+      30*1000;
{code}
The variable says seconds but the value is still in milliseconds? Likewise for yarn-default.xml.
{code}
+    long rmConnectWaitMS =
+        conf.getInt(
+            YarnConfiguration.RESOURCEMANAGER_CONNECT_WAIT_SECS,
+            YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_WAIT_SECS);
+    long rmConnectionRetryIntervalMS =
+        conf.getLong(
+            YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS,
+            YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS);
{code}
The above variables could be set using *1000 to keep the code clean. Special handling is needed for -1.
> Nodemanager if started before starting Resource manager is getting > shutdown.But if both RM and NM are started and then after if RM is going > down,NM is retrying for the RM. > --- > > Key: YARN-196 > URL: https://issues.apache.org/jira/browse/YARN-196 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Ramgopal N >Assignee: Xuan Gong > Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, > YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch > > > If NM is started before starting the RM ,NM is shutting down with the > following error > {code} > ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting > services org.apache.hadoop.yarn.server.nodemanager.NodeManager > org.apache.avro.AvroRuntimeException: > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) > at > org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) > Caused by: java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) > ... 3 more > Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: > Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at > org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) > at $Proxy23.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) > ... 
5 more > Caused by: java.net.ConnectException: Call From > HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection > exception: java.net.ConnectException: Connection refused; For more details > see: http://wiki.apache.org/hadoop/ConnectionRefused > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) > at org.apache.hadoop.ipc.Client.call(Client.java:1141) > at org.apache.hadoop.ipc.Client.call(Client.java:1100) > at > org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) > ... 7 more > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563) > at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1
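To make the units consistent, the settings could be read in seconds and converted once, with -1 treated as "retry forever" - a sketch only, reusing the constant names from Hitesh's snippet above:
{code}
// Sketch: read both settings in seconds and convert to milliseconds in
// one place; -1 means "no overall wait limit" and must not be multiplied.
long waitSecs = conf.getLong(
    YarnConfiguration.RESOURCEMANAGER_CONNECT_WAIT_SECS,
    YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_WAIT_SECS);
long rmConnectWaitMS = (waitSecs < 0) ? -1 : waitSecs * 1000;
long rmConnectionRetryIntervalMS = conf.getLong(
    YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS,
    YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS) * 1000;
{code}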
[jira] [Updated] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-226: - Issue Type: Sub-task (was: Bug) Parent: YARN-431 > Log aggregation should not assume an AppMaster will have containerId 1 > -- > > Key: YARN-226 > URL: https://issues.apache.org/jira/browse/YARN-226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth > > In case of reservations, etc., AppMasters may not get container id 1. We > likely need additional info in the CLC / tokens indicating whether a > container is an AM or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-126: - Labels: usability (was: ) > yarn rmadmin help message contains reference to hadoop cli and JT > - > > Key: YARN-126 > URL: https://issues.apache.org/jira/browse/YARN-126 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Reporter: Thomas Graves > Labels: usability >
> It has an option to specify a job tracker, and the last line of the general command line syntax reads "bin/hadoop command [genericOptions] [commandOptions]".
> Ran "yarn rmadmin" to get the usage:
> RMAdmin
> Usage: java RMAdmin
>   [-refreshQueues]
>   [-refreshNodes]
>   [-refreshUserToGroupsMappings]
>   [-refreshSuperUserGroupsConfiguration]
>   [-refreshAdminAcls]
>   [-refreshServiceAcl]
>   [-help [cmd]]
> Generic options supported are
>   -conf specify an application configuration file
>   -D use value for given property
>   -fs specify a namenode
>   -jt specify a job tracker
>   -files specify comma separated files to be copied to the map reduce cluster
>   -libjars specify comma separated jar files to include in the classpath.
>   -archives specify comma separated archives to be unarchived on the compute machines.
> The general command line syntax is
> bin/hadoop command [genericOptions] [commandOptions] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-171) NodeManager should serve logs directly if log-aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-171: - Issue Type: Sub-task (was: Bug) Parent: YARN-431 > NodeManager should serve logs directly if log-aggregation is not enabled > > > Key: YARN-171 > URL: https://issues.apache.org/jira/browse/YARN-171 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 0.23.3 >Reporter: Vinod Kumar Vavilapalli >Assignee: Siddharth Seth > Attachments: YARN171_WIP.txt > > > NodeManagers never serve logs for completed applications. If log-aggregation > is not enabled, in the interim, due to bugs like YARN-162, this is a serious > problem for users, as logs are simply not available. > We should let nodes serve logs directly if > YarnConfiguration.LOG_AGGREGATION_ENABLED is not set. This should be okay as > NonAggregatingLogHandler can retain logs up to > YarnConfiguration.NM_LOG_RETAIN_SECONDS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-431) [Umbrella] Complete/Stabilize YARN application log-aggregation
Vinod Kumar Vavilapalli created YARN-431: Summary: [Umbrella] Complete/Stabilize YARN application log-aggregation Key: YARN-431 URL: https://issues.apache.org/jira/browse/YARN-431 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-85) Allow per job log aggregation configuration
[ https://issues.apache.org/jira/browse/YARN-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-85: Issue Type: Sub-task (was: Improvement) Parent: YARN-431 > Allow per job log aggregation configuration > --- > > Key: YARN-85 > URL: https://issues.apache.org/jira/browse/YARN-85 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > > Currently, if log aggregation is enabled for a cluster, logs for all jobs > will be aggregated, leading to a whole bunch of files on HDFS that users > may not want. > Users should be able to control this along with the aggregation policy: > failed only, all, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-239) Make link in "Aggregation is not enabled. Try the nodemanager at"
[ https://issues.apache.org/jira/browse/YARN-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-239: - Labels: usability (was: ) > Make link in "Aggregation is not enabled. Try the nodemanager at" > - > > Key: YARN-239 > URL: https://issues.apache.org/jira/browse/YARN-239 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.0-alpha >Reporter: Radim Kolar >Priority: Trivial > Labels: usability > > If log aggregation is disabled, this message is displayed: > *Aggregation is not enabled. Try the nodemanager at reavers.com:9006* > It would be helpful to make the link to the nodemanager clickable. > This message is located in > /hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java, > but I could not figure out how to make a link in the hamlet framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588671#comment-13588671 ] Chris Riccomini commented on YARN-417: -- Hey Guys, I also agree with the comments and nits that Sandy/Karthik pointed out. Just so I'm clear, the way this would be used from a main thread would be:
1. instantiate
2. call init
3. call start
4. call register
5. make the initial container request
6. wait until containers complete
I think what I'd probably end up doing for #6 is just using a countdown latch that I wait on, with the callback decrementing it whenever a container completes (a minimal sketch follows this message). Probably good enough. Cheers, Chris > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
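[Editor's note] A minimal sketch of the latch pattern Chris describes for step 6, assuming a hypothetical listener interface with an onContainersCompleted callback; the names are illustrative stand-ins, not the AMRMClientAsync API attached to this JIRA:
{code}
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Illustrative only: block the AM's main thread until all requested
// containers have completed. CallbackHandler and ContainerStatus are
// hypothetical stand-ins for the async client's listener and the YARN
// record type.
public class WaitForContainers {

  interface CallbackHandler {
    void onContainersCompleted(List<ContainerStatus> statuses);
  }

  static class ContainerStatus {} // stand-in for the YARN record

  public static void main(String[] args) throws InterruptedException {
    final int numContainers = 10; // containers requested in step 5
    final CountDownLatch done = new CountDownLatch(numContainers);

    // Step 6: the callback decrements the latch once per completed container.
    CallbackHandler handler = statuses -> {
      for (ContainerStatus status : statuses) {
        done.countDown();
      }
    };

    // ... instantiate/init/start/register the async client with `handler`
    // and submit the container requests (steps 1-5), then block:
    done.await();
  }
}
{code}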
[jira] [Assigned] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-198: Assignee: jian he (was: Senthil V Kumar) Hey Senthil, I have a patch for this, mind if I take this over? > If we are navigating to Nodemanager UI from Resourcemanager, then there is no > link to navigate back to Resource manager > --- > > Key: YARN-198 > URL: https://issues.apache.org/jira/browse/YARN-198 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Ramgopal N >Assignee: jian he >Priority: Minor > Labels: usability > > If we navigate to the Nodemanager by clicking on the node link in the RM, there > is no link provided on the NM to navigate back to the RM. > A link to navigate back to the RM would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-237: Assignee: jian he > Refreshing the RM page forgets how many rows I had in my Datatables > --- > > Key: YARN-237 > URL: https://issues.apache.org/jira/browse/YARN-237 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0 >Reporter: Ravi Prakash >Assignee: jian he > Labels: usability > > If I choose 100 rows and then refresh the page, DataTables goes back to > showing me 20 rows. > This user preference should be stored in a cookie. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-430) Add HDFS based store for RM
Bikas Saha created YARN-430: --- Summary: Add HDFS based store for RM Key: YARN-430 URL: https://issues.apache.org/jira/browse/YARN-430 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha There is a generic FileSystem store, but it does not take advantage of HDFS features like directories, replication, and DFSClient advanced settings for HA, retries, etc. Writing a store that's optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-417: Attachment: AMRMClientAsync-1.java > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, > YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588552#comment-13588552 ] Sandy Ryza commented on YARN-417: - Thanks for the interface proposal, Bikas. It looks good to me. Having a separate method invocation for each completed container shouldn't have a significant performance impact, as Java inlines even virtual methods when it needs to (http://www.quora.com/How-many-CPU-instructions-are-typical-for-Java-method-call-overhead), but the single call with the list doesn't seem any worse to me. Nits:
* missing an onReboot method
* ContainerCompletionStatus describes a cause more than a state, so a name like ContainerCompletionReason fits a little better to me
* agree with Karthik that it would be much more intuitive for getContainerCompletionStatus to be in ContainerStatus. Is there a strong reason against this?
Attaching an updated proposal (a rough sketch of the interface under discussion follows this message). > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync.java, YARN-417.patch, > YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
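[Editor's note] For readers following the thread, a rough sketch of the kind of callback surface being debated, folding in the nits above (an onReboot method, the ContainerCompletionReason naming). This is an assumption-laden approximation, not the AMRMClientAsync.java attachment:
{code}
import java.util.List;

// Rough sketch only: approximates the listener interface under discussion,
// not the attached proposal. Container and ContainerStatus are stand-ins
// for org.apache.hadoop.yarn.api.records types.
public interface AMCallbackHandler {

  /** A cause rather than a state, per the naming nit above. */
  enum ContainerCompletionReason {
    SUCCEEDED, FAILED, KILLED, PREEMPTED
  }

  /** Called with all containers newly allocated on a heartbeat. */
  void onContainersAllocated(List<Container> containers);

  /** Called with all containers completed since the last heartbeat. */
  void onContainersCompleted(List<ContainerStatus> statuses);

  /** The missing callback noted above: the RM asked the AM to reboot/resync. */
  void onReboot();

  // Hypothetical stand-ins so the sketch is self-contained:
  final class Container {}
  final class ContainerStatus {}
}
{code}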
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588501#comment-13588501 ] Karthik Kambatla commented on YARN-417: --- Thanks Bikas, the interface looks good. One comment though - shouldn't we move {{ContainerCompletionStatus}} and {{getContainerCompletionStatus}} to {{ContainerStatus}}? > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync.java, YARN-417.patch, > YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-426) Failure to download a public resource on a node prevents further downloads of the resource from that node
[ https://issues.apache.org/jira/browse/YARN-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588437#comment-13588437 ] Hudson commented on YARN-426: - Integrated in Hadoop-trunk-Commit #3390 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3390/]) YARN-426. Failure to download a public resource prevents further downloads (Jason Lowe via bobby) (Revision 1450807) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450807 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > Failure to download a public resource on a node prevents further downloads of > the resource from that node > - > > Key: YARN-426 > URL: https://issues.apache.org/jira/browse/YARN-426 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.0.0, 0.23.7, 2.0.4-beta > > Attachments: YARN-426.patch > > > If the NM encounters an error while downloading a public resource, it fails > to empty the list of request events corresponding to the resource request in > {{attempts}}. If the same public resource is subsequently requested on that > node, {{PublicLocalizer.addResource}} will skip the download since it will > mistakenly believe a download of that resource is already in progress. At > that point any container that requests the public resource will just hang in > the {{LOCALIZING}} state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
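[Editor's note] The description above boils down to a stale entry in a pending-request map. A generic, simplified sketch of the failure mode and the fix (illustrative names; this is not the actual ResourceLocalizationService code):
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic illustration of the YARN-426 failure mode: if a failed download
// never clears its pending entry, every later request for the same resource
// sees "already in progress" and waits forever.
public class PendingDownloads<K, R> {

  private final Map<K, List<R>> attempts = new HashMap<>();

  /** Returns true if the caller should start the download now. */
  public synchronized boolean addResource(K resource, R request) {
    List<R> pending = attempts.get(resource);
    if (pending != null) {
      pending.add(request); // a download appears to be in progress already
      return false;
    }
    List<R> fresh = new ArrayList<>();
    fresh.add(request);
    attempts.put(resource, fresh);
    return true;
  }

  /** The fix: on failure, drop the entry so future requests can retry. */
  public synchronized List<R> downloadFailed(K resource) {
    return attempts.remove(resource);
  }
}
{code}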
[jira] [Commented] (YARN-426) Failure to download a public resource on a node prevents further downloads of the resource from that node
[ https://issues.apache.org/jira/browse/YARN-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588424#comment-13588424 ] Robert Joseph Evans commented on YARN-426: -- The patch looks good to me. +1 I'll check it in. > Failure to download a public resource on a node prevents further downloads of > the resource from that node > - > > Key: YARN-426 > URL: https://issues.apache.org/jira/browse/YARN-426 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-426.patch > > > If the NM encounters an error while downloading a public resource, it fails > to empty the list of request events corresponding to the resource request in > {{attempts}}. If the same public resource is subsequently requested on that > node, {{PublicLocalizer.addResource}} will skip the download since it will > mistakenly believe a download of that resource is already in progress. At > that point any container that requests the public resource will just hang in > the {{LOCALIZING}} state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-417: Attachment: AMRMClientAsync.java Attaching an interface proposal > Add a poller that allows the AM to receive notifications when it is assigned > containers > --- > > Key: YARN-417 > URL: https://issues.apache.org/jira/browse/YARN-417 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: AMRMClientAsync.java, YARN-417.patch, > YarnAppMaster.java, YarnAppMasterListener.java > > > Writing AMs would be easier for some if they did not have to handle > heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira