[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619551#comment-13619551 ]

Hadoop QA commented on YARN-447:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576484/YARN-447-trunk.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/644//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/644//console

This message is automatically generated.

applicationComparator improvement for CS

Key: YARN-447
URL: https://issues.apache.org/jira/browse/YARN-447
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor
Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch

Currently the compare code is:
return a1.getApplicationId().getId() - a2.getApplicationId().getId();
It will be replaced with:
return a1.getApplicationId().compareTo(a2.getApplicationId());
This brings two benefits:
1. The applicationId comparison logic is left to the ApplicationId class.
2. In a future HA mode the cluster timestamp may change, and the ApplicationId class already takes care of this condition.
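The change quoted above is small, but worth seeing in context. Below is a minimal, illustrative sketch of a comparator that delegates to ApplicationId.compareTo; the App holder class is hypothetical and only stands in for whatever application type the scheduler actually orders, so this is not the exact CapacityScheduler code.

{code}
import java.util.Comparator;

import org.apache.hadoop.yarn.api.records.ApplicationId;

public class ApplicationIdOrdering {

  // Hypothetical stand-in for the scheduler's application object.
  static class App {
    private final ApplicationId appId;
    App(ApplicationId appId) { this.appId = appId; }
    ApplicationId getApplicationId() { return appId; }
  }

  // Delegating to ApplicationId.compareTo keeps the ordering logic inside
  // ApplicationId (cluster timestamp first, then sequence number), so it
  // stays correct if the timestamp changes across RM restarts and avoids
  // the overflow-prone "a - b" int subtraction idiom.
  static final Comparator<App> APPLICATION_COMPARATOR = new Comparator<App>() {
    @Override
    public int compare(App a1, App a2) {
      return a1.getApplicationId().compareTo(a2.getApplicationId());
    }
  };
}
{code}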
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619573#comment-13619573 ]

Vinod Kumar Vavilapalli commented on YARN-447:

Latest patch looks good. +1. Checking this in.
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619578#comment-13619578 ]

Bikas Saha commented on YARN-392:

Sorry, I did not look at that patch carefully and assumed that it does what is suggested in https://issues.apache.org/jira/browse/YARN-392?focusedCommentId=13583713&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13583713 whereas it actually implements the proposal in YARN-398. The typical use case for blacklisting is to disable a set of nodes globally, e.g. never give me nodes A and B even when I ask for resources at *. Having to implement blacklisting on a per-priority basis will make the common case painful to work with. So I am not in favor of such a proposal unless there is a strong use case for blacklisting on specific priorities. Arun, Vinod and I had an offline discussion where we agreed that we are better off creating an API for blacklisting a set of nodes.

Make it possible to schedule to specific nodes without dropping locality

Key: YARN-392
URL: https://issues.apache.org/jira/browse/YARN-392
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
Attachments: YARN-392-1.patch, YARN-392.patch

Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
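The "API for blacklisting a set of nodes" mentioned above is only being discussed at this point, so the shape below is purely hypothetical; it is meant only to make concrete that the request is global rather than per-priority.

{code}
import java.util.List;

// Hypothetical sketch of a global blacklist request; the real API shape is
// still under discussion in this JIRA and in YARN-398.
public interface BlacklistRequest {

  // Node (or rack) names the RM should stop allocating on, at any priority.
  List<String> getBlacklistAdditions();

  // Names previously blacklisted that the application wants to use again.
  List<String> getBlacklistRemovals();
}
{code}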
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619580#comment-13619580 ]

Hitesh Shah commented on YARN-193:

{code}
+and will get capped to this value. When it is set to -1, checking against the
+maximum allocation should be disable.</description>
{code}

I am not sure if we should allow disabling of the max memory and max vcores setting. Was it supported earlier or does the patch introduce this support?

Spelling mistake: alloacated

{code}
+LOG.info("Resource request was not able to be alloacated for"
+    + " application attempt " + appAttemptId + " because it"
+    + " failed to pass the validation. " + e.getMessage());
{code}

The above could be made simpler and briefer. For example: LOG.warn("Invalid resource ask by application " + appAttemptId, e); Also, please use LOG.level(message, throwable) when trying to log an exception.

{code}
+RPCUtil.getRemoteException(e);
{code}

The above is missing a throw. Likewise, in the handling of submitApplication, please change the log level to warn and also use the correct log function instead of using e.getMessage().

{code}
if (globalMaxAppAttempts <= 0) {
  throw new YarnException("The global max attempts should be a positive integer.");
}
{code}

Unrelated to this patch, but when throwing/logging errors related to configs, we should always point to the configuration property to let the user know which property needs to be changed. Please file a separate jira for the above. With respect to this, it may be useful to point to the property when throwing exceptions for invalid min/max memory/vcores.

Unnecessary import in RMAppAttemptImpl:

{code}
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
{code}

For InvalidResourceRequestException, the javadoc class description is missing.

Question - should normalization of resource requests be done inside the scheduler or in the ApplicationMasterService itself, which handles the allocate call?

If maxMemory or maxVcores is set to -1, what will happen when normalize() is called?

I think there are missing tests related to the use of DISABLE_RESOURCELIMIT_CHECK in both the validate and normalize functions that should have caught this error. In any case, the main question is whether DISABLE_RESOURCELIMIT_CHECK should actually be allowed.

Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

Key: YARN-193
URL: https://issues.apache.org/jira/browse/YARN-193
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Zhijie Shen
Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
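To make the review comments about logging and the missing throw concrete, here is a small sketch of the pattern being asked for. SchedulerUtils, InvalidResourceRequestException and RPCUtil are the classes already named in this review, but the exact method signatures and surrounding context here are assumptions, not settled API.

{code}
// Sketch only: validate an ask, log at WARN with the throwable attached,
// and actually throw the remote exception instead of just constructing it.
private void checkAsk(ResourceRequest req, Resource maximumAllocation,
    ApplicationAttemptId appAttemptId) throws YarnRemoteException {
  try {
    // Validation helper as proposed by the patch; signature is an assumption.
    SchedulerUtils.validateResourceRequest(req, maximumAllocation);
  } catch (InvalidResourceRequestException e) {
    // Pass the throwable to the logger so the stack trace is preserved,
    // rather than concatenating e.getMessage() into the message string.
    LOG.warn("Invalid resource ask by application " + appAttemptId, e);
    // getRemoteException only builds the exception; it still has to be thrown.
    throw RPCUtil.getRemoteException(e);
  }
}
{code}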
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619581#comment-13619581 ]

Hudson commented on YARN-447:

Integrated in Hadoop-trunk-Commit #3547 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3547/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java

applicationComparator improvement for CS

Key: YARN-447
URL: https://issues.apache.org/jira/browse/YARN-447
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor
Fix For: 2.0.5-beta
Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
[jira] [Commented] (YARN-444) Move special container exit codes from YarnConfiguration to API
[ https://issues.apache.org/jira/browse/YARN-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619584#comment-13619584 ]

Bikas Saha commented on YARN-444:

IMO, when the container exits because YARN took some specific action on it, e.g. killed due to preemption or killed due to memory, then YARN should assign an action-specific exit status using new values defined inside ContainerExitStatus. Currently, the NM kills the container and assigns its real exit code to the exit status, so at the AM it's hard to tell why the container exited. Of course, not as part of this jira.

Move special container exit codes from YarnConfiguration to API

Key: YARN-444
URL: https://issues.apache.org/jira/browse/YARN-444
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications/distributed-shell
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-444-1.patch, YARN-444.patch

YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.
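As a rough illustration of the "special handling" an AM writer might want once the constants live in a user-facing class, here is a self-contained sketch. The class and constant names are placeholders (the values are the ones quoted in the description above); the eventual API may differ.

{code}
// Placeholder constants mirroring the values quoted in the issue
// description; the real user-facing class may name these differently.
final class ExitCodes {
  static final int INVALID = -1000;
  static final int ABORTED = -100;
  static final int DISKS_FAILED = -101;
  private ExitCodes() {}
}

class CompletedContainerHandler {
  // Decide whether a completed container should count against the
  // application's failure budget, based on the special exit statuses.
  boolean countsAsFailure(int exitStatus) {
    if (exitStatus == ExitCodes.ABORTED || exitStatus == ExitCodes.DISKS_FAILED) {
      // The framework, not the application code, ended the container:
      // typically the work is just rescheduled elsewhere.
      return false;
    }
    return exitStatus != 0;
  }
}
{code}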
[jira] [Created] (YARN-527) Local filecache mkdir fails
Knut O. Hellan created YARN-527:

Summary: Local filecache mkdir fails
Key: YARN-527
URL: https://issues.apache.org/jira/browse/YARN-527
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.0.0-alpha
Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes.
Reporter: Knut O. Hellan
Priority: Minor

Jobs failed with no other explanation than this stack trace:

2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
[jira] [Updated] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut O. Hellan updated YARN-527:

Attachment: yarn-site.xml
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619690#comment-13619690 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java

Make RM provide heartbeat interval to NM

Key: YARN-309
URL: https://issues.apache.org/jira/browse/YARN-309
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Fix For: 2.0.5-beta
Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
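The shape of the change on the NM side is roughly the loop below. This is only a sketch inferred from the commit message and the touched files listed above; the getter on the heartbeat response and the resource-tracker wrapper are local hypothetical types, not verified YARN signatures.

{code}
// Sketch of the NM status-updater loop after YARN-309; names are assumptions.
interface HeartbeatResponse {
  long getNextHeartBeatInterval();   // interval dictated by the RM, in ms
}

interface ResourceTrackerClient {
  HeartbeatResponse nodeHeartbeat() throws Exception;
}

class StatusUpdaterLoop {
  private volatile boolean stopped;

  void run(ResourceTrackerClient tracker, long defaultIntervalMs) throws Exception {
    long nextIntervalMs = defaultIntervalMs;
    while (!stopped) {
      HeartbeatResponse response = tracker.nodeHeartbeat();
      // Instead of a fixed NM-side setting, the RM now tells the NM how long
      // to wait before the next heartbeat.
      nextIntervalMs = response.getNextHeartBeatInterval();
      Thread.sleep(nextIntervalMs);
    }
  }
}
{code}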
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619694#comment-13619694 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java

TestContainerLocalizer.testContainerLocalizerMain is failing

Key: YARN-516
URL: https://issues.apache.org/jira/browse/YARN-516
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Andrew Wang
Fix For: 2.0.5-beta
Attachments: YARN-516.txt
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619695#comment-13619695 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java

TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

Key: YARN-524
URL: https://issues.apache.org/jira/browse/YARN-524
Project: Hadoop YARN
Issue Type: Bug
Components: api
Affects Versions: 3.0.0
Environment: OS/X with branch off github
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
Fix For: 3.0.0
Attachments: YARN-524.patch

{{TestYarnVersionInfo}} fails if the {{YarnVersionInfo.getUrl()}} call returns {{Unknown}}, when that is the value inserted into the property file.
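A sketch of the kind of tolerant assertion the fix implies, assuming JUnit 4. This is not the actual YARN-524 patch, just an illustration of checking the version info without insisting on an SVN URL being present.

{code}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.yarn.util.YarnVersionInfo;
import org.junit.Test;

public class TestYarnVersionInfoSketch {

  @Test
  public void testGetUrlTolerantOfUnknown() {
    String url = YarnVersionInfo.getUrl();
    assertTrue("getUrl returned empty", url != null && !url.isEmpty());
    // "Unknown" is what the generated properties contain for builds made
    // outside an SVN checkout, so only check for a real URL when one exists.
    if (!"Unknown".equals(url)) {
      assertTrue("unexpected SCM URL: " + url, url.contains("hadoop"));
    }
  }
}
{code}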
[jira] [Updated] (YARN-525) make CS node-locality-delay refreshable
[ https://issues.apache.org/jira/browse/YARN-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-525:

Issue Type: Improvement (was: Bug)
Summary: make CS node-locality-delay refreshable (was: yarn.scheduler.capacity.node-locality-delay doesn't change with rmadmin -refreshQueues)

make CS node-locality-delay refreshable

Key: YARN-525
URL: https://issues.apache.org/jira/browse/YARN-525
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.7
Reporter: Thomas Graves

The config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity-scheduler.xml and then run yarn rmadmin -refreshQueues.
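A minimal sketch of what "refreshable" would mean here, assuming the scheduler re-reads the property inside the reinitialization path that yarn rmadmin -refreshQueues triggers. The class and field names are illustrative, not the actual patch.

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: re-read node-locality-delay on every queue refresh
// instead of caching it once at scheduler start-up.
class LocalityDelayHolder {
  private volatile int nodeLocalityDelay;

  // Called at start-up and again from the refresh path.
  void reinitialize(Configuration conf) {
    this.nodeLocalityDelay =
        conf.getInt("yarn.scheduler.capacity.node-locality-delay", -1);
  }

  int getNodeLocalityDelay() {
    return nodeLocalityDelay;
  }
}
{code}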
[jira] [Created] (YARN-528) Make IDs read only
Robert Joseph Evans created YARN-528:

Summary: Make IDs read only
Key: YARN-528
URL: https://issues.apache.org/jira/browse/YARN-528
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Robert Joseph Evans

I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed.
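For readers outside the patch, a bare-bones sketch of the "read-only ID" idea: final fields set once in the constructor, getters only, value-based equality. This is illustrative, not the actual YARN-528 code; the field names are only borrowed from ApplicationId for familiarity.

{code}
// Illustrative immutable ID, not the actual YARN-528 patch.
public final class ReadOnlyAppId implements Comparable<ReadOnlyAppId> {
  private final long clusterTimestamp;
  private final int id;

  public ReadOnlyAppId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  public long getClusterTimestamp() { return clusterTimestamp; }
  public int getId() { return id; }

  @Override
  public int compareTo(ReadOnlyAppId other) {
    if (clusterTimestamp != other.clusterTimestamp) {
      return clusterTimestamp < other.clusterTimestamp ? -1 : 1;
    }
    return Integer.compare(id, other.id);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ReadOnlyAppId)) {
      return false;
    }
    ReadOnlyAppId other = (ReadOnlyAppId) o;
    return clusterTimestamp == other.clusterTimestamp && id == other.id;
  }

  @Override
  public int hashCode() {
    return 31 * (int) (clusterTimestamp ^ (clusterTimestamp >>> 32)) + id;
  }
}
{code}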
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated YARN-528:

Attachment: YARN-528.txt

This patch contains changes to both Map/Reduce IDs as well as YARN APIs. I don't really want to split them up right now, but I am happy to file a separate JIRA for tracking purposes if the community decides this is a direction we want to go in.
[jira] [Assigned] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans reassigned YARN-528:

Assignee: Robert Joseph Evans
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619750#comment-13619750 ]

Hadoop QA commented on YARN-528:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576553/YARN-528.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 49 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/645//console

This message is automatically generated.
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619761#comment-13619761 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619765#comment-13619765 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619766#comment-13619766 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = FAILURE
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619768#comment-13619768 ]

Hudson commented on YARN-447:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619778#comment-13619778 ]

Thomas Graves commented on YARN-392:

Bikas, when you say "creating an API for blacklisting a set of nodes", are you basically referring to YARN-398 or something else?
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619797#comment-13619797 ]

Knut O. Hellan commented on YARN-527:

Digging through the code, it looks to me like the native Java File.mkdirs is used to actually create the directory, and it will not give information about why it failed. If that is the case, then I guess this issue is actually a feature request: YARN should be better at cleaning up old file caches so that this situation will not happen.
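File.mkdirs() indeed only returns a boolean, so any extra detail has to be gathered by the caller. A small, self-contained sketch of that idea (not the actual FSDownload code) could look like this:

{code}
import java.io.File;
import java.io.IOException;

final class MkdirDiagnostics {

  // Try to create a directory and, on failure, report the most likely cause
  // instead of a bare "mkdir ... failed".
  static void mkdirsWithDiagnostics(File dir) throws IOException {
    if (dir.mkdirs() || dir.isDirectory()) {
      return;
    }
    StringBuilder msg = new StringBuilder("mkdir of " + dir + " failed");
    File parent = dir.getParentFile();
    if (dir.exists()) {
      msg.append(": path exists but is not a directory");
    } else if (parent != null && !parent.exists()) {
      msg.append(": parent directory does not exist");
    } else if (parent != null && !parent.canWrite()) {
      msg.append(": parent directory is not writable");
    }
    throw new IOException(msg.toString());
  }

  private MkdirDiagnostics() {}
}
{code}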
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619817#comment-13619817 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619821#comment-13619821 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619822#comment-13619822 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619825#comment-13619825 ]

Hudson commented on YARN-475:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-475. Remove a unused constant in the public API - ApplicationConstants.AM_APP_ATTEMPT_ID_ENV. Contributed by Hitesh Shah. (Revision 1463033)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463033
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java

Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

Key: YARN-475
URL: https://issues.apache.org/jira/browse/YARN-475
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Fix For: 2.0.5-beta
Attachments: YARN-475.1.patch

AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id.
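To illustrate the derivation the description above recommends, here is a short sketch. ApplicationConstants.AM_CONTAINER_ID_ENV and ConverterUtils.toContainerId are YARN API of this era, but treat the exact names and signatures as assumptions if your version differs.

{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromContainerId {
  public static void main(String[] args) {
    // The NM exports the AM container id in the environment; the attempt id
    // is derived from it instead of reading a dedicated env variable.
    String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
    System.out.println("Application attempt: " + attemptId);
  }
}
{code}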
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619824#comment-13619824 ]

Hudson commented on YARN-447:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619911#comment-13619911 ]

Robert Joseph Evans commented on YARN-528:

The build failed because it needs to be upmerged, again :(
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated YARN-528:

Attachment: YARN-528.txt

Upmerged
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619957#comment-13619957 ] Zhijie Shen commented on YARN-193: -- {quote} I am not sure if we should allow disabling of the max memory and max vcores setting. Was it supported earlier or does the patch introduce this support? {quote} Yes, the patch introduces the support; it was already there in your previous patch. I inherited it and added some description in yarn-default.xml. I'm fine either way on whether the function needs to be supported. One risk I can imagine if the function is supported is that AM memory can exceed yarn.nodemanager.resource.memory-mb when DISABLE_RESOURCELIMIT_CHECK is set. Then, the problem described in YARN-389 will occur. {quote} Question - should normalization of resource requests be done inside the scheduler or in the ApplicationMasterService itself which handles the allocate call? {quote} I think it is better to do normalization outside allocate, because allocate is not only called from ApplicationMasterService, and normalize does not need to be called every time allocate is called. For example, RMAppAttemptImpl#ScheduleTransition#transition doesn't need to do normalization because the resource has already been validated during the submission stage. For another example, RMAppAttemptImpl#AMContainerAllocatedTransition#transition supplies an empty ask. {quote} Unrelated to this patch but when throwing/logging errors related to configs, we should always point to the configuration property to let the user know which property needs to be changed. Please file a separate jira for the above. {quote} I'll do that, and also fix the log message where the exception is thrown in this patch. {quote} For InvalidResourceRequestException, missing javadocs for class description. {quote} I'll add the description. {quote} If maxMemory or maxVcores is set to -1, what will happen when normalize() is called? {quote} The normalized value has no upper bound. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
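For context, a rough sketch of the normalization being discussed, not the patch itself: a request is rounded up to a multiple of the scheduler minimum and, unless the cap is disabled (-1 below, standing in for DISABLE_RESOURCELIMIT_CHECK), clamped to the maximum allocation.

{code}
final class NormalizeSketch {
  static final int DISABLED = -1;

  static int normalizeMemory(int requested, int minMB, int maxMB) {
    // Round up to the nearest multiple of the scheduler minimum.
    int normalized = ((Math.max(requested, minMB) + minMB - 1) / minMB) * minMB;
    if (maxMB == DISABLED) {
      return normalized;                 // no upper bound when the check is disabled
    }
    return Math.min(normalized, maxMB);  // otherwise clamp to the maximum allocation
  }

  public static void main(String[] args) {
    System.out.println(normalizeMemory(1500, 1024, 8192));      // 2048
    System.out.println(normalizeMemory(9000, 1024, 8192));      // 8192 (clamped)
    System.out.println(normalizeMemory(9000, 1024, DISABLED));  // 9216 (no cap)
  }
}
{code}

The last case illustrates the risk mentioned above: with the cap disabled, a normalized AM request can exceed what any NodeManager offers.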
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619989#comment-13619989 ] Hadoop QA commented on YARN-528: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576592/YARN-528.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 50 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/646//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/646//console This message is automatically generated. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Improvement Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620008#comment-13620008 ] Bikas Saha commented on YARN-392: - Yes, YARN-398, but not the proposal currently in there. The alternative proposal is to have a new method in the AM-RM protocol using which the AM can blacklist nodes globally for all tasks (at all priorities) for that app. Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
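The alternative proposal is only described in words above; as a purely hypothetical illustration (none of these class or method names exist in the protocol at this point), a global blacklist could travel with the allocate call roughly like this:

{code}
// Hypothetical sketch of the alternative proposal; all names below are invented
// for this example and are not part of the AM-RM protocol.
import java.util.ArrayList;
import java.util.List;

class BlacklistSketch {
  static class AllocateRequestSketch {
    private final List<String> blacklistAdditions = new ArrayList<String>();
    private final List<String> blacklistRemovals = new ArrayList<String>();

    // Nodes added here would be excluded for all of the app's requests,
    // at every priority, until removed again.
    void addToBlacklist(String node) { blacklistAdditions.add(node); }
    void removeFromBlacklist(String node) { blacklistRemovals.add(node); }

    List<String> getBlacklistAdditions() { return blacklistAdditions; }
    List<String> getBlacklistRemovals() { return blacklistRemovals; }
  }

  public static void main(String[] args) {
    AllocateRequestSketch request = new AllocateRequestSketch();
    request.addToBlacklist("badnode.example.com");
    System.out.println(request.getBlacklistAdditions());
  }
}
{code}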
[jira] [Updated] (YARN-122) CompositeService should clone the Configurations it passes to children
[ https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-122: Priority: Minor (was: Major) CompositeService should clone the Configurations it passes to children -- Key: YARN-122 URL: https://issues.apache.org/jira/browse/YARN-122 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Priority: Minor Original Estimate: 0.5h Remaining Estimate: 0.5h {{CompositeService.init(Configuration)}} saves the configuration passed in *and* passes the same instance down to all managed services. This means a change in the configuration of one child could propagate to all the others. Unless this is desired, the configuration should be cloned for each child. Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
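As a rough sketch of the fix being suggested (not the actual CompositeService code), each child would receive its own copy of the configuration rather than the shared instance:

{code}
// Sketch only: a composite service that hands each child a copy of the
// configuration, so one child's changes cannot leak into another's.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

class CloningCompositeServiceSketch {
  interface ChildService {
    void init(Configuration conf);
  }

  private final List<ChildService> children = new ArrayList<ChildService>();

  void addService(ChildService child) {
    children.add(child);
  }

  void init(Configuration conf) {
    for (ChildService child : children) {
      // new Configuration(conf) copies the properties, isolating each child.
      child.init(new Configuration(conf));
    }
  }
}
{code}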
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Summary: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded (was: IF RM rebooted when MR job succeeded ) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Description: MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-529: Assignee: jian he MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
Steve Loughran created YARN-530: --- Summary: Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-530: --- Assignee: Steve Loughran Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-121) Yarn services to throw a YarnException on invalid state changes
[ https://issues.apache.org/jira/browse/YARN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-121. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 Yarn services to throw a YarnException on invalid state changes -- Key: YARN-121 URL: https://issues.apache.org/jira/browse/YARN-121 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 3.0.0 Original Estimate: 0.5h Remaining Estimate: 0.5h The {{EnsureCurrentState()}} checks of services throw an {{IllegalStateException}} if the state is wrong. If this were changed to {{YarnException}}, wrapper services such as CompositeService could relay it directly, instead of wrapping it in their own. The implementation time is mainly in changing the lifecycle test cases of the MAPREDUCE-3939 subtasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
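A small sketch of the idea in this (now duplicate) ticket: fail invalid lifecycle transitions with a YARN-specific exception that wrapping services can relay directly. {{ServiceStateException}} is an illustrative name here, not the class the superseding work necessarily introduces.

{code}
class ServiceStateException extends RuntimeException {
  ServiceStateException(String message) {
    super(message);
  }
}

class LifecycleSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  void ensureCurrentState(State expected) {
    if (state != expected) {
      // Previously an IllegalStateException; a dedicated type lets callers
      // distinguish lifecycle misuse from arbitrary programming errors.
      throw new ServiceStateException("Expected " + expected + " but was " + state);
    }
  }

  void start() {
    ensureCurrentState(State.INITED);
    state = State.STARTED;
  }
}
{code}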
[jira] [Resolved] (YARN-120) Make yarn-common services robust
[ https://issues.apache.org/jira/browse/YARN-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-120. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 Make yarn-common services robust Key: YARN-120 URL: https://issues.apache.org/jira/browse/YARN-120 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Labels: yarn Fix For: 3.0.0 Attachments: MAPREDUCE-4014.patch Review the yarn common services ({{CompositeService}}, {{AbstractLivelinessMonitor}}) and make their service startup _and especially shutdown_ more robust against out-of-lifecycle invocation and partially complete initialization. Write tests for these where possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620057#comment-13620057 ] Bikas Saha commented on YARN-382: - +1 looks good to me. SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
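The diff quoted in the description is easier to follow with a toy version of the aliasing it works around; {{Res}} and {{Ask}} below are stand-ins for the YARN Resource and ResourceRequest records, not the real classes.

{code}
// Toy illustration of why mutating the existing capability object differs from
// replacing it with a new, normalized one.
class Res {
  private int memory;
  Res(int memory) { this.memory = memory; }
  int getMemory() { return memory; }
  void setMemory(int memory) { this.memory = memory; }
}

class Ask {
  private Res capability;
  Ask(Res capability) { this.capability = capability; }
  Res getCapability() { return capability; }
  void setCapability(Res capability) { this.capability = capability; }
}

class NormalizeAliasingDemo {
  public static void main(String[] args) {
    Res original = new Res(1500);   // also referenced by the submission context
    Ask ask = new Ask(original);

    // Replacing the object leaves "original" (and anything else holding it) un-normalized:
    ask.setCapability(new Res(2048));
    System.out.println(original.getMemory()); // still 1500

    // Mutating the existing object updates every holder of the reference,
    // which is what the earlier change relied on:
    ask.setCapability(original);
    ask.getCapability().setMemory(2048);
    System.out.println(original.getMemory()); // 2048
  }
}
{code}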
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-117changes.pdf this is an overview of the changes, with explanations Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530.patch This is the subset of YARN-117 for yarn-common Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620066#comment-13620066 ] Bikas Saha commented on YARN-193: - Can we check that we are getting the expected exception and not some other one? {code} +try { + rmService.submitApplication(submitRequest); + Assert.fail(Application submission should fail because); +} catch (YarnRemoteException e) { + // Exception is expected +} + } {code} Setting the same config twice? In second set, why not use a -ve value instead of the DISABLE value? Its not clear whether we want to disable check or set a -ve value. same for others. {code} +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0); +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, +ResourceCalculator.DISABLE_RESOURCELIMIT_CHECK); +try { + resourceManager.init(conf); + fail(Exception is expected because the min memory allocation is + + non-positive.); +} catch (YarnException e) { + // Exception is expected. {code} Lets also add a test for case when memory is more than max. Normalize should always reduce that to max. Same for DRF {code} +// max is not a multiple of min +maxResource = Resources.createResource(maxMemory - 10, 0); +ask.setCapability(Resources.createResource(maxMemory - 100)); +// multiple of minMemory maxMemory, then reduce to maxMemory +SchedulerUtils.normalizeRequest(ask, resourceCalculator, null, +minResource, maxResource); +assertEquals(maxResource.getMemory(), ask.getCapability().getMemory()); } {code} Rename testAppSubmitError() to show that its testing invalid resource request? TestAMRMClient. Why is this change needed? {code} +amResource.setMemory( +YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB); +amContainer.setResource(amResource); {code} Dont we need to throw? {code} + } catch (InvalidResourceRequestException e) { +LOG.info(Resource request was not able to be alloacated for + + application attempt + appAttemptId + because it + + failed to pass the validation. + e.getMessage()); +RPCUtil.getRemoteException(e); + } {code} typo {code} +// validate scheduler vcors allocation setting {code} This will need to be rebased after YARN-382 which I am going to commit shortly. I am fine with requiring that a max allocation limit be set. We should also make sure that max allocation from config can be matched by at least 1 machine in the cluster. That should be a different jira. IMO, Normalization should be called only inside the scheduler. It is an artifact of the scheduler logic. Nothing in the RM requires resources to be normalized to a multiple of min. Only the scheduler needs it to makes its life easier and it could choose to not do so. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
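One of the review points above is "check that we are getting the expected exception and not some other one". A self-contained sketch of that assertion style follows; the exception class below is a local stand-in so the example compiles on its own, whereas the patch would assert the real InvalidResourceRequestException.

{code}
class InvalidResourceRequestSketchException extends RuntimeException {
  InvalidResourceRequestSketchException(String message) { super(message); }
}

class SubmissionTestSketch {
  static void submitOversizedApp() {
    // Pretend submission path: a request above the max allocation is rejected.
    throw new InvalidResourceRequestSketchException("requested 16384 MB > max 8192 MB");
  }

  public static void main(String[] args) {
    try {
      submitOversizedApp();
      throw new AssertionError("Submission should have failed");
    } catch (InvalidResourceRequestSketchException expected) {
      // Catching the specific type (rather than Exception) means an unrelated
      // failure, e.g. a NullPointerException, would still fail the test.
      System.out.println("Rejected as expected: " + expected.getMessage());
    }
  }
}
{code}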
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.patch This is the across-all-yarn-projects patch (plus HADOOP-9447) just to show what the combined patch looks and tests like. YARN-530 contains the changes to yarn-common which should be the first step. (This patch contains those) Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers. h2. AbstractService state change doesn't defend against race conditions. There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph of lifecycle operations Helper methods to move things through lifecycles. init-start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. 
These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. *Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch
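The proposals above are easier to picture with a compressed sketch. This is not the attached patch, just an outline of final lifecycle methods delegating to protected inner ones, with {{stop()}} tolerated from any state; names and details are assumptions.

{code}
abstract class ServiceSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  // Final entry points own the state checks, so subclasses cannot perform the
  // work before (or without) the transition being validated.
  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  // stop() is valid from every state and must cope with partially initialized
  // fields, so resources acquired in init()/start() are always released.
  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return; // idempotent: duplicate stop requests are ignored
    }
    try {
      innerStop();
    } finally {
      state = State.STOPPED;
    }
  }

  protected void innerStart() {}
  protected void innerStop() {}
}
{code}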
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620091#comment-13620091 ] Bikas Saha commented on YARN-193: - Also, why are there so many normalize functions and why are we creating a new Resource object every time we normalize? We should fix this in a different jira though. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620092#comment-13620092 ] jian he commented on YARN-529: -- Several solutions: 1. Let the RM accept old attempts. In the current case, the RM will raise an exception because of the unrecognized attempt and consider the job unsuccessful. 2. Only clean the staging dir after the AM successfully unregisters with the RM. We can use a flag to indicate this, or modify the state machine so that on receiving JOB_AM_REBOOT it transitions from SUCCEEDED to REBOOT. The potential problem is that when the job transitions to the SUCCEEDED state, some job-success metrics have already been triggered. MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
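A minimal sketch of option 2 above, with invented names (shouldCleanStagingDir, unregisterFromRM), just to show the ordering: the staging dir is removed only once the RM has acknowledged the successful finish.

{code}
// Sketch of option 2 with invented names; not the MR AppMaster code.
class StagingCleanupSketch {
  private volatile boolean safeToCleanStagingDir = false;

  void shutDownJob() {
    try {
      unregisterFromRM();            // finishApplicationMaster()
      safeToCleanStagingDir = true;  // RM accepted the final status
    } catch (Exception e) {
      // Unregistration failed: keep the staging dir so a further attempt
      // launched by the RM can still recover the job.
    }
    if (safeToCleanStagingDir) {
      deleteStagingDir();
    }
  }

  void unregisterFromRM() throws Exception { /* RPC to the RM, omitted */ }
  void deleteStagingDir() { /* remove the job's staging directory, omitted */ }
}
{code}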
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620096#comment-13620096 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576617/YARN-530.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/647//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/647//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/647//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620097#comment-13620097 ] Hudson commented on YARN-382: - Integrated in Hadoop-trunk-Commit #3549 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3549/]) YARN-382. SchedulerUtils improve way normalizeRequest sets the resource capabilities (Zhijie Shen via bikas) (Revision 1463653) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1463653 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-382: - Fix Version/s: 2.0.5-beta SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-442) The ID classes should be immutable
[ https://issues.apache.org/jira/browse/YARN-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-442. -- Resolution: Duplicate Assignee: (was: Xuan Gong) YARN-528 is fixing this, closing as duplicate. The ID classes should be immutable -- Key: YARN-442 URL: https://issues.apache.org/jira/browse/YARN-442 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth ApplicationId, ApplicationAttemptId, ContainerId should be immutable. That should allow for a simpler implementation as well as remove synchronization. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620163#comment-13620163 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576620/YARN-117.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle org.apache.hadoop.mapred.TestNetworkedJob org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapred.TestBlockLimits org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestMerge org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapred.TestClusterMapReduceTestCase 
org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings:
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620170#comment-13620170 ] Vinod Kumar Vavilapalli commented on YARN-528: -- bq. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. We have to make a call on this, don't think we explicitly took that decision yet. That said, I am inclined to throw it away but there were a couple of reasons why we put this (like being able to pass through unindentified fields for e.g. from new RM to new NM via old AM). I would like a day or two to dig into those with knowledgeable folks offline. Thanks for your patience. Oh, and let's separate the tickets into MR and YARN only changes please - there isn't any pain as they are all orthogonal changes. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620230#comment-13620230 ] Bikas Saha commented on YARN-529: - By 1) you mean let RM accept finishApplicationAttempt() from the last attempt? MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded MR job is retried by RM if finishApplicationMaster() call fails (was: Succeeded RM job is retried by RM if finishApplicationMaster() call fails) Succeeded MR job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded RM job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded RM job is retried by RM if finishApplicationMaster() call fails (was: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded) Succeeded RM job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-128) Succeeded MR job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620239#comment-13620239 ] Vinod Kumar Vavilapalli commented on YARN-527: -- Is there any difference in how NodeManager tried to create the dir and your manual creation? Like the user running NM and user who manually created the dir? Can you reproduce this? If we can find out exactly why NM couldn't create it automatically, then we can do something about it. Local filecache mkdir fails --- Key: YARN-527 URL: https://issues.apache.org/jira/browse/YARN-527 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes. Reporter: Knut O. Hellan Priority: Minor Attachments: yarn-site.xml Jobs failed with no other explanation than this stack trace: 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag nostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893 55400878397 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620261#comment-13620261 ] Xuan Gong commented on YARN-101: 1.Use YarnServerBuilderUtils for constructing node-heartbeat response 2.User BuilderUtils to create ApplicationId, ContainerId, ContainerStatus, etc 3.Recreated the test case as last comment suggested If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. 
LOG.error(Caught exception in status-updater, e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; ListContainerStatus containersStatuses = new ArrayListContainerStatus(); for (IteratorEntryContainerId, Container i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { EntryContainerId, Container e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info(Sending out status for container: + containerStatus); {color:red} // Here is the part that removes the completed containers. if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove
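The defect highlighted in the pasted snippet is that completed containers are dropped from the NM's map before the heartbeat is known to have succeeded. A hedged sketch of the usual remedy (all names invented, not the patch) is to hold completed statuses until the RM's response arrives and re-send them otherwise:

{code}
import java.util.ArrayList;
import java.util.List;

class HeartbeatSketch {
  private final List<String> pendingCompletedContainers = new ArrayList<String>();

  synchronized List<String> snapshotForHeartbeat(List<String> justCompleted) {
    pendingCompletedContainers.addAll(justCompleted);
    // Send a copy; keep the originals until the RM acknowledges them.
    return new ArrayList<String>(pendingCompletedContainers);
  }

  synchronized void onHeartbeatSucceeded(List<String> acknowledged) {
    pendingCompletedContainers.removeAll(acknowledged);
  }

  synchronized void onHeartbeatFailed() {
    // Nothing to do: the statuses are still pending and will be re-sent on the
    // next heartbeat instead of being silently dropped.
  }
}
{code}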
[jira] [Assigned] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-486: -- Assignee: Xuan Gong (was: Bikas Saha) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land - Key: YARN-486 URL: https://issues.apache.org/jira/browse/YARN-486 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Currently, id, resource request etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. Also it leads to duplication of information (such as Resource from CLC and Resource from Container and Container.tokens). Sending Container directly to startContainer solves these problems. It also makes CLC clean by only having stuff in it that is set by the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
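The shape of the proposed change, sketched with hypothetical interfaces; the real YARN types and signatures differ, and this is only meant to show where the duplication goes away:
{code:java}
// Hypothetical sketch of the API shape discussed in YARN-486. The names mirror
// YARN concepts but are illustrative, not the real org.apache.hadoop.yarn API.
interface ContainerLaunchContextSketch {
  // holds only what the client/AM itself sets: commands, env, local resources ...
}

interface ContainerSketch {
  String getId();
  int getMemoryMb();
  byte[] getContainerToken();
}

interface NodeManagerClientSketch {
  // Before: the caller copied id/resource/tokens from Container into the CLC,
  // which could drift out of sync with what the RM actually allocated.
  void startContainer(ContainerLaunchContextSketch clcWithCopiedFields);

  // After: pass the RM-issued Container alongside a CLC that carries only
  // client-provided data, so nothing needs to be duplicated.
  void startContainer(ContainerSketch container, ContainerLaunchContextSketch clc);
}
{code}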
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620275#comment-13620275 ] Siddharth Seth commented on YARN-528: - Yep, we'll likely only support a single serialization, which at this point is PB. What the current approach was supposed to be good at: 1. Handling unknown fields (which proto already supports), which could make rolling upgrades etc. easier. 2. Wrapping the object which came over the wire - with a goal of creating fewer objects. I don't think the second point was really achieved, with the implementation getting complicated because of the interfaces being mutable, lists, and supporting chained sets (clc.getResource().setMemory()). I think point one should continue to be maintained. Do we want *Proto references in the APIs (client library versus Java protocol definition)? At the moment, these are only referenced in the PBImpls - and hidden by the abstraction layer. What I don't like about the patch is Protos leaking into the object constructors. Instead, I think we could just use simple Java objects, with conversion at the RPC layer (I believe this is similar to the HDFS model). Unknown fields can be handled via byte[] arrays. I'm guessing very few of the interfaces actually need to be mutable - so in that sense, yes, this needs to be done before beta. OTOH, changing the PBImpl itself can be done at a later point if required. (Earlier is of course better, and I'd be happy to help with this. I was planning on working on YARN-442 before you started this work.) Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
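A rough sketch of the alternative described in the comment, i.e. plain immutable Java objects with protobuf conversion confined to the RPC layer and unknown fields carried as opaque bytes; the names are illustrative and not from any YARN patch:
{code:java}
import java.util.Arrays;

// Sketch only: an immutable ID object with no protobuf types in its API.
// The RPC layer would build the wire proto from these fields (re-attaching
// unknownFields) immediately before serialization.
final class AppIdSketch {
  private final long clusterTimestamp;
  private final int id;
  private final byte[] unknownFields; // opaque bytes carried through upgrades

  AppIdSketch(long clusterTimestamp, int id, byte[] unknownFields) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
    this.unknownFields = unknownFields == null
        ? new byte[0] : Arrays.copyOf(unknownFields, unknownFields.length);
  }

  long getClusterTimestamp() { return clusterTimestamp; }
  int getId() { return id; }
  byte[] getUnknownFields() { return Arrays.copyOf(unknownFields, unknownFields.length); }
  // No setters: the object is read only, as proposed in this JIRA.
}
{code}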
[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-479: - Attachment: YARN-479.5.patch NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620284#comment-13620284 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576650/YARN-101.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/649//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/649//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
          {color:red}
          // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus();
          {color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
          {color:red}
          // But if the nodeHeartbeat fails, we have already removed the
          // completed containers, so the RM never gets to know about them.
          // We aren't handling the nodeHeartbeat failure case here.
          HeartbeatResponse response =
              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
          {color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup =
              response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620291#comment-13620291 ] Hadoop QA commented on YARN-479: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console This message is automatically generated. NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620326#comment-13620326 ] Robert Joseph Evans commented on YARN-528: -- I am fine with splitting the MR changes from the YARN change. Like I said, I put this out here more as a question of how we want to go about implementing these changes, and the test was more of a prototype example. I personally lean more towards using the *Proto classes directly. Why have something else wrapping it if we don't need it, even if it is a small and simple layer? The only reason I did not go that route here is because of toString(). With the IDs we rely on having ID.toString() turn into something very specific that can be parsed and turned back into an instance of the object. If I had the time I would trace down all places where we call toString on them and replace it with something else. I may just scale back the scope of the patch to look at ApplicationID to begin with and try to see if I can accomplish this. bq. Wrapping the object which came over the wire - with a goal of creating fewer objects. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
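The toString() dependency mentioned above is that YARN IDs print in a fixed, parseable form (an ApplicationId renders roughly as application_&lt;clusterTimestamp&gt;_&lt;sequence&gt;), and various places parse that string back. A small illustrative re-implementation of that round trip, not the actual YARN code:
{code:java}
// Illustrative only: the round trip the comment relies on.
final class AppIdFormat {
  static String format(long clusterTimestamp, int id) {
    // e.g. application_1363456789_0001
    return String.format("application_%d_%04d", clusterTimestamp, id);
  }

  static long[] parse(String s) {
    String[] parts = s.split("_");
    if (parts.length != 3 || !"application".equals(parts[0])) {
      throw new IllegalArgumentException("Not an application id: " + s);
    }
    // returns {clusterTimestamp, sequence}
    return new long[] { Long.parseLong(parts[1]), Long.parseLong(parts[2]) };
  }
}
{code}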
[jira] [Created] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
Siddharth Seth created YARN-532: --- Summary: RMAdminProtocolPBClientImpl should implement Closeable Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-532: Attachment: YARN-532.txt Trivial fix. RMAdminProtocolPBClientImpl should implement Closeable -- Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-532.txt Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
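The kind of change involved, sketched with placeholder names: the real class wraps a protobuf RPC proxy and its close() would hand that proxy to RPC.stopProxy. This is not the actual RMAdminProtocolPBClientImpl:
{code:java}
import java.io.Closeable;
import java.io.IOException;

// Sketch only: a PB client implementation that owns an RPC proxy implements
// Closeable so callers (and RPC.stopProxy) have a well-defined way to release it.
class PbClientSketch implements Closeable {
  private Object proxy; // stands in for the protobuf RPC proxy

  @Override
  public void close() throws IOException {
    if (proxy != null) {
      // In Hadoop this is where the proxy would be stopped; shown here only
      // as the single place where the underlying connection is released.
      proxy = null;
    }
  }
}
{code}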
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620348#comment-13620348 ] Siddharth Seth commented on YARN-528: - bq. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. Not compared to using Protos directly (which wasn't really an option), but compared to the alternative of converting only at the RPC layer. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620362#comment-13620362 ] Hadoop QA commented on YARN-532: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576674/YARN-532.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/651//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/651//console This message is automatically generated. RMAdminProtocolPBClientImpl should implement Closeable -- Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-532.txt Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.12.patch 1. Remove the DISABLE_RESOURCELIMIT_CHECK feature, and its related test cases. 2. Rewrite the log messages, and output them through LOG.warn. 3. Add javadocs for InvalidResourceRequestException. 4. Check whether thrown exception is InvalidResourceRequestException in TestClientRMService. 5. Add the test case of ask max in TestSchedulerUtils. 6. Fixed other minor issues commented by Bikas and Hitesh (e.g., typo, unnecessary import). 7. Rebase with YARN-382. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
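A minimal sketch of the validation described in the comment, i.e. rejecting requests above the maximum allocation with an exception whose message names the offending values; the class and method names here are illustrative rather than the actual YARN-193 code:
{code:java}
// Sketch only: fail fast instead of silently normalizing an over-sized request.
class InvalidResourceRequestExceptionSketch extends Exception {
  InvalidResourceRequestExceptionSketch(String msg) { super(msg); }
}

final class SchedulerUtilsSketch {
  static void validateResourceRequest(int requestedMb, int maxMb)
      throws InvalidResourceRequestExceptionSketch {
    if (requestedMb > maxMb) {
      throw new InvalidResourceRequestExceptionSketch(
          "Invalid resource request, requested memory " + requestedMb
          + " MB is larger than the maximum allocation of " + maxMb + " MB");
    }
  }
}
{code}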
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.patch Fixing below issues 1) all the formatting issues 2) adding one additional test case for checking Directory state transition from FULL-NON_FULL-FULL 3) javadoc warnings Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620384#comment-13620384 ] Hadoop QA commented on YARN-467: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576681/yarn-467-20130402.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/652//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/652//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620385#comment-13620385 ] Hadoop QA commented on YARN-193: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576680/YARN-193.12.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/653//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/653//console This message is automatically generated. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.1.patch fixing test issue... that check is no longer valid. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620412#comment-13620412 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576688/yarn-467-20130402.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/654//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/654//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-533) Pointing to the config property when throwing/logging the config-related exception
Zhijie Shen created YARN-533: Summary: Pointing to the config property when throwing/logging the config-related exception Key: YARN-533 URL: https://issues.apache.org/jira/browse/YARN-533 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen When throwing/logging errors related to configuration, we should always point to the configuration property to let users know which property needs to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
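A small sketch of the convention being proposed; the property name and the particular check are only examples:
{code:java}
// Sketch only: when a configuration value is invalid, name the property in the
// message so the user knows exactly what to change.
final class ConfigCheckSketch {
  static int requirePositive(String propertyName, int value) {
    if (value <= 0) {
      // e.g. propertyName = "yarn.nodemanager.resource.memory-mb" (example only)
      throw new IllegalArgumentException(
          "Invalid value " + value + " for " + propertyName
          + "; it must be a positive integer");
    }
    return value;
  }
}
{code}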
[jira] [Updated] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-495: - Attachment: YARN-495.2.patch Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620443#comment-13620443 ] Jian He commented on YARN-495: -- Uploaded a patch that changes the NM behavior from REBOOT to RESYNC when the RM is restarted. Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
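A hedged sketch of the intended behavior, namely terminating running containers before the NM shuts down or resyncs with a restarted RM; the names below are illustrative and this is not the YARN-495 patch:
{code:java}
// Illustrative only: on a RESYNC (or reboot) signal, first ask the container
// manager to kill whatever is still running, then rejoin the RM.
interface ContainerCleaner {
  void cleanupContainersOnResync(); // e.g. dispatch a "kill all containers" event
}

final class ResyncHandlerSketch {
  private final ContainerCleaner cleaner;

  ResyncHandlerSketch(ContainerCleaner cleaner) { this.cleaner = cleaner; }

  void onResync(Runnable reregisterWithRm) {
    cleaner.cleanupContainersOnResync(); // terminate containers first
    reregisterWithRm.run();              // then re-register with the restarted RM
  }
}
{code}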
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620448#comment-13620448 ] Hadoop QA commented on YARN-495: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576695/YARN-495.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/655//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/655//console This message is automatically generated. Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts
Jian He created YARN-534: Summary: AM max attempts is not checked when RM restart and try to recover attempts Key: YARN-534 URL: https://issues.apache.org/jira/browse/YARN-534 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempt fails, it will not clean the state store; when the RM comes back, it will retry the attempt again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
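A sketch of the missing check described above: on recovery, compare the number of attempts already stored for an application against the configured maximum before launching another one. The names are illustrative, not the eventual patch:
{code:java}
// Sketch only: guard against re-launching attempts for an application that had
// already exhausted its attempts before the RM restart.
final class RecoverySketch {
  static boolean shouldLaunchNewAttempt(int recoveredAttempts, int maxAppAttempts) {
    // Without this guard the restarted RM keeps retrying indefinitely.
    return recoveredAttempts < maxAppAttempts;
  }
}
{code}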
[jira] [Updated] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Attachment: YARN-458.patch Resource manager address must be placed in four different configs - Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620489#comment-13620489 ] Sandy Ryza commented on YARN-458: - Uploaded a patch that adds yarn.resourcemanager.hostname and yarn.nodemanager.hostname properties, and changes all the other configs to use ${yarn.resourcemanager.hostname} and ${yarn.nodemanager.hostname}. Resource manager address must be placed in four different configs - Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
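How a single hostname property can drive the other addresses through Hadoop Configuration's ${...} variable expansion; the values below are examples, not the exact yarn-default.xml entries added by the patch:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class RmAddressExample {
  public static void main(String[] args) {
    // Empty Configuration so the example is self-contained.
    Configuration conf = new Configuration(false);
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    // A default such as "${yarn.resourcemanager.hostname}:8030" is expanded
    // when the property is read, so only the hostname needs to be set.
    conf.set("yarn.resourcemanager.scheduler.address",
        "${yarn.resourcemanager.hostname}:8030");
    System.out.println(conf.get("yarn.resourcemanager.scheduler.address"));
    // prints rm.example.com:8030
  }
}
{code}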
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Summary: YARN daemon addresses must be placed in many different configs (was: Resource manager address must be placed in four different configs) YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Description: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. was: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Affects Version/s: 2.0.3-alpha YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Component/s: resourcemanager nodemanager YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620501#comment-13620501 ] Hadoop QA commented on YARN-458: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576699/YARN-458.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/656//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/656//console This message is automatically generated. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620535#comment-13620535 ] Omkar Vinit Joshi commented on YARN-467: I have tested this code for the below scenarios:
* I used 4 local-dirs to see that localization gets distributed across them and that LocalCacheDirectoryManager manages each of them separately.
* I tested various values of yarn.nodemanager.local-cache.max-files-per-directory (36, 37, 40 and much larger).
* I modified the cache cleanup interval and the cache target size in MB to see older files getting removed from the cache and LocalCacheDirectoryManager's sub-directories getting reused.
* I verified that we never run into a situation where any local-directory contains more files or sub-directories than what is specified in the configuration.
Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
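For illustration, a minimal sketch of the idea behind spreading cache entries over a bounded directory hierarchy; this is not the LocalCacheDirectoryManager implementation, and the layout and names are assumptions:
{code:java}
// Sketch only: map an increasing counter to a relative sub-path so that any
// single directory only ever receives a bounded number of entries
// (cf. yarn.nodemanager.local-cache.max-files-per-directory).
final class CacheDirSketch {
  private final int maxPerDir;
  private long counter = 0;

  CacheDirSketch(int maxPerDir) { this.maxPerDir = maxPerDir; }

  // The first maxPerDir entries stay in the root (""); after that each block of
  // maxPerDir entries shares one directory, and directory indices are encoded
  // in bijective base-maxPerDir so every path component stays in [0, maxPerDir-1].
  synchronized String nextSubDir() {
    long dirIndex = counter++ / maxPerDir;
    StringBuilder path = new StringBuilder();
    while (dirIndex > 0) {
      dirIndex--;
      path.insert(0, "/" + (dirIndex % maxPerDir));
      dirIndex /= maxPerDir;
    }
    return path.length() == 0 ? "" : path.substring(1);
  }
}
{code}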
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.2.patch Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620546#comment-13620546 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576705/yarn-467-20130402.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/657//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/657//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.6.patch recreate test case to verify status of all containers in every heartbeat If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error(Caught exception in status-updater, e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; ListContainerStatus containersStatuses = new ArrayListContainerStatus(); for (IteratorEntryContainerId, Container i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { EntryContainerId, Container e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info(Sending out status for container: + containerStatus); {color:red} // Here is the part that removes the completed containers. 
if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info("Removed completed container " + containerId); } }
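The gist of the fix being iterated on here, shown only as a hedged sketch rather than the actual patch, is to stop dropping completed containers at the moment the node status is built and instead purge them only after the ResourceManager has acknowledged the heartbeat that reported them. The helper below uses simple stand-in types; the real code works with ContainerId/ContainerStatus inside NodeStatusUpdaterImpl.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompletedContainerBuffer {
  /** Hypothetical stand-in for a container's last reported status. */
  static class Status {
    final String containerId;
    final boolean complete;
    Status(String containerId, boolean complete) {
      this.containerId = containerId;
      this.complete = complete;
    }
  }

  private final Map<String, Status> containers = new HashMap<String, Status>();
  private final List<String> pendingRemoval = new ArrayList<String>();

  /** Build statuses for the next heartbeat; completed containers are remembered, not removed. */
  List<Status> statusesForHeartbeat() {
    List<Status> out = new ArrayList<Status>(containers.values());
    pendingRemoval.clear();
    for (Status s : out) {
      if (s.complete) {
        pendingRemoval.add(s.containerId);   // keep until the RM has seen it
      }
    }
    return out;
  }

  /** Call only after resourceTracker.nodeHeartbeat(...) has returned successfully. */
  void heartbeatSucceeded() {
    for (String id : pendingRemoval) {
      containers.remove(id);
    }
    pendingRemoval.clear();
  }
}
{code}
If the heartbeat RPC throws, heartbeatSucceeded() is simply never called, so the completed containers are reported again in the next heartbeat instead of being lost.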
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620608#comment-13620608 ] Vinod Kumar Vavilapalli commented on YARN-467: -- Perfect, the latest patch looks good. Checking it in. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620610#comment-13620610 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576714/YARN-101.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/658//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/658//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. 
HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620617#comment-13620617 ] Hudson commented on YARN-467: - Integrated in Hadoop-trunk-Commit #3552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3552/]) YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1463823 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620638#comment-13620638 ] Bikas Saha commented on YARN-193: - The default value of 32 for max-vcores might be too high. Why is the conf being set twice for each value? Same for vcores. {code} +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048); +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 1024); +try { + resourceManager.init(conf); + fail("Exception is expected because the min memory allocation is " + "larger than the max memory allocation."); +} catch (YarnException e) { + // Exception is expected. +} {code} Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
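For clarity, here is what the quoted test presumably intended (a sketch of the likely correction, not the actual follow-up patch): the second call should target the maximum allocation, so that min > max and ResourceManager.init() is expected to reject the configuration. The maximum-allocation constant name is assumed here by analogy with the minimum one used in the snippet.
{code}
conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048);
conf.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 1024);  // max < min, intentionally invalid
try {
  resourceManager.init(conf);
  fail("Exception is expected because the min memory allocation is "
      + "larger than the max memory allocation.");
} catch (YarnException e) {
  // Expected: an invalid min/max pair must be rejected at init time.
}
{code}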
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620651#comment-13620651 ] Zhijie Shen commented on YARN-193: -- {quote} The default value of 32 for max-vcores might be too high. {quote} Why was 32 chosen originally? In http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/, it says: 2012 – 16+ cores, 48-96GB of RAM, 12x2TB or 12x3TB of disk. How about choosing 16? {quote} Why is the conf being set twice for each value? Same for vcores. {quote} I'll fix the bug. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
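To tie the discussion back to the title of the issue, a minimal sketch of memory normalization with a maximum cap follows. This is only an assumed shape of the behaviour (the actual patch may instead reject requests that exceed the maximum rather than clamp them): round the request up to a multiple of the minimum allocation, then keep it within [min, max].
{code}
public final class NormalizeSketch {
  /** Round requestedMb up to a multiple of minMb, then clamp into [minMb, maxMb]. */
  static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
    int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;  // round up to the next increment
    return Math.min(Math.max(rounded, minMb), maxMb);           // never below min, never above max
  }

  public static void main(String[] args) {
    // With min = 1024 MB and max = 4096 MB:
    System.out.println(normalizeMemory(100, 1024, 4096));   // 1024 (rounded up to the minimum)
    System.out.println(normalizeMemory(5000, 1024, 4096));  // 4096 (capped at the maximum)
  }
}
{code}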