[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078954#comment-14078954 ] Leitao Guo commented on YARN-2368: -- Thanks [~ozawa] for your comments. I deployed hadoop-2.3.0-cdh5.1.0 with a 22-queue FairScheduler on my 20-node cluster. Two ResourceManagers are deployed exclusively on 10.153.80.8 and 10.153.80.18. Jobs are submitted from gridmix: {code} sudo -u mapred hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar -Dgridmix.min.file.size=10485760 -Dgridmix.job-submission.use-queue-in-trace=true -Dgridmix.distributed-cache-emulation.enable=false -generate 34816m hdfs:///user/mapred/foo/ hdfs:///tmp/job-trace.json {code} job-trace.json is generated by Rumen and contains 6,000 jobs; the average #maptasks per job is 320 and the average #reducetasks is 25. In 3 of the runs (gridmix was run more than 3 times) the ResourceManager failed while handling a STATE_STORE_OP_FAILED event. At the same time, ZooKeeper threw a 'Len error' IOException: {code} ... ... 2014-07-24 21:00:51,170 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x247678daa88001a at /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x247678daa88001a 2014-07-24 21:00:51,171 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x247678daa88001a with negotiated timeout 1 for client /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.8:47135 2014-07-24 21:00:51,172 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.8:47135 2014-07-24 21:00:51,186 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247678daa88001a due to java.io.IOException: Len error 1813411 2014-07-24 21:00:51,186 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.8:47135 which had sessionid 0x247678daa88001a ... ...
2014-07-25 22:10:08,919 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.8:50480 2014-07-25 22:10:08,921 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x247684586e70006 at /10.153.80.8:50480 2014-07-25 22:10:08,922 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:50480 2014-07-25 22:10:08,922 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.8:50480 2014-07-25 22:10:08,923 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.8:50480 2014-07-25 22:10:08,934 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747 2014-07-25 22:10:08,934 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.8:50480 which had sessionid 0x247684586e70006 ... ... 2014-07-26 02:22:59,627 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.18:60588 2014-07-26 02:22:59,629 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x2476de7c1af0002 at /10.153.80.18:60588 2014-07-26 02:22:59,629 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x2476de7c1af0002 with negotiated timeout 1 for client /10.153.80.18:60588 2014-07-26 02:22:59,630 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.18:60588 2014-07-26 02:22:59,630 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.18:60588 2014-07-26 02:22:59,648 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x2476de7c1af0002 due to java.io.IOException: Len error 1649043 2014-07-26 02:22:59,648 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.18:60588 which had sessionid
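For context, the ZooKeeper server closes a connection with a 'Len error' IOException whenever a request packet exceeds jute.maxbuffer (roughly 1 MB by default), which matches the sizes reported above (1,813,411, 1,530,747 and 1,649,043 bytes): the application state the ZKRMStateStore tries to write is larger than the znode limit. As a rough workaround sketch only (not the fix tracked by this JIRA), the limit can be raised via the jute.maxbuffer system property; it must be raised on the ZooKeeper server JVMs as well as on the client, and the 4 MB value below is an arbitrary example:
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkMaxBufferSketch {
  public static void main(String[] args) throws Exception {
    // Must be set before the ZooKeeper client classes are first used;
    // the servers need an equivalent -Djute.maxbuffer=... JVM option.
    System.setProperty("jute.maxbuffer", String.valueOf(4 * 1024 * 1024));

    ZooKeeper zk = new ZooKeeper("10.153.80.8:2181", 15000, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // connection state changes are ignored in this sketch
      }
    });
    System.out.println("session state: " + zk.getState());
    zk.close();
  }
}
{code}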
[jira] [Created] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running
Hong Zhiguo created YARN-2371: - Summary: Wrong NMToken is issued when NM preserving restart with containers running Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo When an application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true and the NM is restarted with containers running, a wrong NMToken is issued to the AM through RegisterApplicationMasterResponse. See the NM log: {code} 2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02 {code} The reason is in the code below: {code} createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) { .. Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter); .. } {code} appAttemptId, instead of container.getId().getApplicationAttemptId(), should be passed to createNMToken (see the sketch after this message). -- This message was sent by Atlassian JIRA (v6.2#6252)
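A minimal sketch of the correction described above, with the method body abbreviated the same way as in the snippet (this is the proposed fix as stated in the description, not a committed patch):
{code}
createAndGetNMToken(String applicationSubmitter,
    ApplicationAttemptId appAttemptId, Container container) {
  ..
  // Use the attempt that is registering (appAttemptId), not the attempt
  // that originally started the container, so the issued NMToken matches
  // the current application attempt.
  Token token = createNMToken(appAttemptId, container.getNodeId(),
      applicationSubmitter);
  ..
}
{code}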
[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-18: --- Attachment: YARN-18-v8-1.patch Synced up with the latest changes on trunk, based on [~djp]'s latest patch. Thanks a lot, Junping. Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18-v8-1.patch, YARN-18.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079016#comment-14079016 ] Hadoop QA commented on YARN-18: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658610/YARN-18-v8-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4479//console This message is automatically generated. Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18-v8-1.patch, YARN-18.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-18: --- Attachment: (was: YARN-18-v8-1.patch) Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-18: --- Attachment: YARN-18-v8.0.patch Pass the right file Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18-v8.0.patch, YARN-18.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079041#comment-14079041 ] duanfa commented on YARN-1149: -- I get this exception also; waiting to update to 2.2.0 NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.2.0 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch When the nodemanager receives a kill signal after an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079064#comment-14079064 ] Hadoop QA commented on YARN-18: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658611/YARN-18-v8.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4480//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4480//console This message is automatically generated. Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18-v8.0.patch, YARN-18.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-18) Configurable Hierarchical Topology for YARN
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-18: --- Attachment: YARN-18.v8.1.patch YARN-18.v8.1.patch fix the minor comment issue. Configurable Hierarchical Topology for YARN --- Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, HierachicalTopologyForYARNr1.pdf, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, MAPREDUCE-4309.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch, YARN-18-v8.0.patch, YARN-18.patch, YARN-18.v8.1.patch Per discussion in the design lounge of Hadoop Summit 2013, we agreed to change the design of “Pluggable topologies with NodeGroup for YARN” to support a configurable hierarchical topology that makes adding additional locality layers simple. Please refer attached doc HierachicalTopologyForYARNr1.pdf for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079104#comment-14079104 ] Milan Potocnik commented on YARN-1994: -- I agree it is tricky to hunt down all service endpoints and make sure they support a proper hostname. Also, when new endpoints are added, they have to be aware of this convention. I guess some logic could be added to RPC.Server in the long run. A few notes, though. We have experienced issues on Windows when the client and service are on the same machine. It turns out that in certain situations, when the client is resolving the connect address (which has the actual ('main') hostname), it does not go to the DNS server but rather resolves it locally, and in some cases might return an unwanted IP address (since the machine itself is aware of all of its network interfaces). If a special hostname is used ('hostname-IB' in my earlier example), the resolve will go to the DNS server and everything will work. The proposed approach is not unprecedented; HDFS has similar functionality, where you can specify custom hostnames for components (so that InetAddress.getLocalHost().getHostName() is never called). Please have a look at: - fs.defaultFS - you can specify the custom hostname that the namenode will use - dfs.namenode.rpc-bind-host - can be set to 0.0.0.0 in that case - dfs.datanode.hostname - can be used to specify a custom datanode hostname - dfs.datanode.address, dfs.datanode.ipc.address, etc... - can be set to 0.0.0.0 in that case - dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname also need to be set So I think it would make sense to have similar functionality available in YARN/MR as well. Thanks, Milan Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
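For readers unfamiliar with the HDFS precedent described in the comment above, here is a rough sketch of that style of multihomed setup using the keys it lists (the 'datanode-IB' hostname and the programmatic Configuration use are illustrative; these values would normally live in hdfs-site.xml):
{code}
import org.apache.hadoop.conf.Configuration;

public class MultihomedHdfsSketch {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("dfs.namenode.rpc-bind-host", "0.0.0.0");     // server binds all interfaces
    conf.set("dfs.datanode.hostname", "datanode-IB");      // hostname advertised to clients
    conf.set("dfs.datanode.address", "0.0.0.0:50010");     // wildcard bind for the DataNode
    conf.set("dfs.client.use.datanode.hostname", "true");  // clients resolve the advertised name
    conf.set("dfs.datanode.use.datanode.hostname", "true");
    return conf;
  }
}
{code}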
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Attachment: YARN-2371.patch Wrong NMToken is issued when NM preserving restart with containers running -- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and NM is restarted with containers running, wrong NMToken is issued to AM through RegisterApplicationMasterResponse. See the NM log: {code} 2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02 {code} The reason is in below code: {code} createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) { .. Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter); .. } {code} appAttemptId instead of container.getId().getApplicationAttemptId() should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Summary: Wrong NMToken is issued when NM preserving restarts with containers running (was: Wrong NMToken is issued when NM preserving restart with containers running) Wrong NMToken is issued when NM preserving restarts with containers running --- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and NM is restarted with containers running, wrong NMToken is issued to AM through RegisterApplicationMasterResponse. See the NM log: {code} 2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02 {code} The reason is in below code: {code} createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) { .. Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter); .. } {code} appAttemptId instead of container.getId().getApplicationAttemptId() should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1489 Wrong NMToken is issued when NM preserving restarts with containers running --- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and NM is restarted with containers running, wrong NMToken is issued to AM through RegisterApplicationMasterResponse. See the NM log: {code} 2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02 {code} The reason is in below code: {code} createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) { .. Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter); .. } {code} appAttemptId instead of container.getId().getApplicationAttemptId() should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2372) There is Chinese Characters in the FairScheduler's document
Fengdong Yu created YARN-2372: - Summary: There is Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Summary: There are Chinese Characters in the FairScheduler's document (was: There is Chinese Characters in the FairScheduler's document) There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu reassigned YARN-2372: - Assignee: Fengdong Yu There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079162#comment-14079162 ] Hadoop QA commented on YARN-2372: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658625/YARN-2372.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4481//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4481//console This message is automatically generated. There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped
[ https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079196#comment-14079196 ] Hudson commented on YARN-2328: -- FAILURE: Integrated in Hadoop-Yarn-trunk #628 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/628/]) YARN-2328. FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614432) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped Key: YARN-2328 URL: https://issues.apache.org/jira/browse/YARN-2328 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch, yarn-2328-preview.patch FairScheduler threads can use a little cleanup and tests. To begin with, the update and continuous-scheduling threads should extend Thread and handle being interrupted. We should have tests for starting and stopping them as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
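As a rough illustration of the pattern this change calls for (an update thread that extends Thread and handles interruption cleanly), here is a self-contained sketch; the interval handling and the work body are placeholders, not the actual FairScheduler code:
{code}
public class UpdateLoopSketch extends Thread {
  private final long intervalMs;

  public UpdateLoopSketch(long intervalMs) {
    this.intervalMs = intervalMs;
    setName("FairSchedulerUpdateLoop-sketch");
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(intervalMs);
        // placeholder for FairScheduler.update(): recompute fair shares, etc.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // restore the flag and exit
        break;
      }
    }
  }
}
{code}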
[jira] [Commented] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079195#comment-14079195 ] Hudson commented on YARN-2354: -- FAILURE: Integrated in Hadoop-Yarn-trunk #628 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/628/]) YARN-2354. DistributedShell may allocate more containers than client specified after AM restarts. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614538) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSFailedAppMaster.java DistributedShell may allocate more containers than client specified after it restarts - Key: YARN-2354 URL: https://issues.apache.org/jira/browse/YARN-2354 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2354-072514.patch, YARN-2354-072814.patch, YARN-2354-072914.patch To reproduce, run distributed shell with the -num_containers option. In ApplicationMaster.java, the following code has an issue. {code} int numTotalContainersToRequest = numTotalContainers - previousAMRunningContainers.size(); for (int i = 0; i < numTotalContainersToRequest; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amRMClient.addContainerRequest(containerAsk); } numRequestedContainers.set(numTotalContainersToRequest); {code} numRequestedContainers doesn't account for the previous AM's requested containers, so numRequestedContainers should be set to numTotalContainers (see the sketch after this message). -- This message was sent by Atlassian JIRA (v6.2#6252)
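A sketch of the correction described above (illustrative, not the committed patch): the loop still asks only for the missing containers, but numRequestedContainers records the full total so later requests are not inflated after an AM restart:
{code}
int numTotalContainersToRequest =
    numTotalContainers - previousAMRunningContainers.size();
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
numRequestedContainers.set(numTotalContainers);  // was numTotalContainersToRequest
{code}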
[jira] [Commented] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079234#comment-14079234 ] Sunil G commented on YARN-2283: --- I tried to reproduce this and found that the AM memory is immediately released. Could you please try to reproduce this and give the exact steps? RM failed to release the AM container - Key: YARN-2283 URL: https://issues.apache.org/jira/browse/YARN-2283 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Environment: NM1: AM running NM2: Map task running mapreduce.map.maxattempts=1 Reporter: Nishan Shetty Priority: Critical During a container stability test I faced this problem. While the job was running, a map task got killed. Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container {code} 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 available, release resources=true 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=memory:2048, vCores:1 numContainers=1 user=testos user-resources=memory:2048, vCores:1 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, vCores:8 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, 
absoluteUsedCapacity=0.25, numApps=1, numContainers=1 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for
[jira] [Commented] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079290#comment-14079290 ] Hudson commented on YARN-2354: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1821 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1821/]) YARN-2354. DistributedShell may allocate more containers than client specified after AM restarts. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614538) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSFailedAppMaster.java DistributedShell may allocate more containers than client specified after it restarts - Key: YARN-2354 URL: https://issues.apache.org/jira/browse/YARN-2354 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2354-072514.patch, YARN-2354-072814.patch, YARN-2354-072914.patch To reproduce, run distributed shell with the -num_containers option. In ApplicationMaster.java, the following code has an issue. {code} int numTotalContainersToRequest = numTotalContainers - previousAMRunningContainers.size(); for (int i = 0; i < numTotalContainersToRequest; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amRMClient.addContainerRequest(containerAsk); } numRequestedContainers.set(numTotalContainersToRequest); {code} numRequestedContainers doesn't account for the previous AM's requested containers, so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: YARN-2348.2.patch Please find the new patch for this issue in YARN-2348.2.patch. With this patch, the ResourceManager server will format the Start/Finish Time dates itself, instead of rendering the dates in the browser. ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
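As a rough sketch of the approach described in the comment above (formatting the timestamp on the ResourceManager side, in the server's local time zone, instead of converting it in the browser); this is illustrative and not the content of YARN-2348.2.patch:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ServerSideTimeSketch {
  // Formats an epoch-millis start/finish time using the RM server's default
  // time zone, so the web UI shows the same wall-clock time as the server.
  public static String format(long epochMillis) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    return fmt.format(new Date(epochMillis));
  }
}
{code}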
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 2.after-change.jpg) ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 1.before-change.jpg) ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: 4.after-patch.JPG 3.before-patch.JPG Here are the new snapshots of Web UI of my cluster before/after the patch. ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Summary: ResourceManager web UI should display server-side time instead of UTC time (was: ResourceManager web UI should display locale time instead of UTC time) ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Description: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. (was: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default.) ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Description: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. was:ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079299#comment-14079299 ] Leitao Guo commented on YARN-2348: -- Hi [~aw] [~tucu00] [~raviprak], thanks for your comments. I agree with you that the Web UI should display the same time as the server side. Please take a look at the new patch, thanks! ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2370) Made comment more accurate in AppSchedulingInfo.java
[ https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079314#comment-14079314 ] Wenwu Peng commented on YARN-2370: -- This just updates a comment; no test case is needed to cover it Made comment more accurate in AppSchedulingInfo.java Key: YARN-2370 URL: https://issues.apache.org/jira/browse/YARN-2370 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Priority: Trivial Attachments: YARN-2370.0.patch In the method allocateOffSwitch of AppSchedulingInfo.java, only the OffRack request is updated, so the comment should be "Update cloned OffRack requests for recovery", not "Update cloned RackLocal and OffRack requests for recovery" {code} // Update cloned RackLocal and OffRack requests for recovery resourceRequests.add(cloneResourceRequest(offSwitchRequest)); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
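For clarity, the comment as it would read after the change proposed in this issue:
{code}
// Update cloned OffRack requests for recovery
resourceRequests.add(cloneResourceRequest(offSwitchRequest));
{code}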
[jira] [Commented] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079339#comment-14079339 ] Hudson commented on YARN-2354: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1847 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1847/]) YARN-2354. DistributedShell may allocate more containers than client specified after AM restarts. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614538) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSFailedAppMaster.java DistributedShell may allocate more containers than client specified after it restarts - Key: YARN-2354 URL: https://issues.apache.org/jira/browse/YARN-2354 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2354-072514.patch, YARN-2354-072814.patch, YARN-2354-072914.patch To reproduce, run distributed shell with the -num_containers option. In ApplicationMaster.java, the following code has an issue. {code} int numTotalContainersToRequest = numTotalContainers - previousAMRunningContainers.size(); for (int i = 0; i < numTotalContainersToRequest; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amRMClient.addContainerRequest(containerAsk); } numRequestedContainers.set(numTotalContainersToRequest); {code} numRequestedContainers doesn't account for the previous AM's requested containers, so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079341#comment-14079341 ] Hadoop QA commented on YARN-2348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658650/4.after-patch.JPG against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4482//console This message is automatically generated. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped
[ https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079340#comment-14079340 ] Hudson commented on YARN-2328: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1847 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1847/]) YARN-2328. FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614432) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped Key: YARN-2328 URL: https://issues.apache.org/jira/browse/YARN-2328 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch, yarn-2328-preview.patch FairScheduler threads can use a little cleanup and tests. To begin with, the update and continuous-scheduling threads should extend Thread and handle being interrupted. We should have tests for starting and stopping them as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079371#comment-14079371 ] Craig Welch commented on YARN-1994: --- [~mipoto], it's unfortunate that there is that corner-case behavior in Windows; that doesn't sound correct. In any case, I can see how you would want this behavior (binding to 0.0.0.0 but still specifying a particular address) and could think of it as an extension of the single-bind address behavior (in which case the address is communicated around just by virtue of the listener binding only to it and therefore returning it when requested). I also don't want to have us commit something which doesn't meet your full set of needs. I was concerned that in some cases the original patch changed behavior when bind-host was not present; I think our final patch needs to carry forward changes to avoid this. I also think there is value in looking at folding this into ipc.Server/Configuration/YarnConfiguration so that the overall flow is as it was before and there isn't the re-calculation of the address all over (re-determination using the address, default address, default port); this is to your comment about adding it to RPC.Server. [~arpitagarwal] [~xgong] I have a .9 patch, which is the .7 patch with the additional tests and the change to RPCUtil.getSocketAddr, but which I never uploaded - you've actually reviewed the code already at different times; it's effectively a combination of the current patch and .7. I would like to take one more look at doing it the way I have outlined above; if that proves simple enough, I think it will be preferable over the long term and will support Milan's case - if not, we can go with the .9 and I'll upload it. Now that I understand it more, I can test it by specifying a hostname which is not the host's actual hostname in my test cluster... Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
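For readers following the bind-host discussion, the sketch below illustrates the general pattern under consideration: bind the listener to a wildcard (or separate bind-host) address while clients keep using the advertised address. The property name yarn.resourcemanager.bind-host and the resolution flow shown are illustrative assumptions, not the final patch:
{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BindHostSketch {
  /**
   * Returns {clientAddress, bindAddress}: clients keep the advertised
   * hostname, while the server may bind to a separate (e.g. wildcard) host.
   */
  static InetSocketAddress[] resolveAddresses(Configuration conf) {
    InetSocketAddress clientAddr = conf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT);
    // Optional bind-host (e.g. 0.0.0.0) affecting only the listening socket.
    String bindHost = conf.getTrimmed("yarn.resourcemanager.bind-host");
    InetSocketAddress bindAddr = (bindHost == null || bindHost.isEmpty())
        ? clientAddr
        : new InetSocketAddress(bindHost, clientAddr.getPort());
    return new InetSocketAddress[] { clientAddr, bindAddr };
  }
}
{code}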
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2348: Attachment: YARN-2348.2.patch Re-upload since the last build is against a JPG file... ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079423#comment-14079423 ] Leitao Guo commented on YARN-2348: -- [~chengbing.liu] thanks, Bing! ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079428#comment-14079428 ] wangmeng commented on YARN-2348: Very good, my leaders! -- Sent from NetEase Mail for Android, in reply to Leitao Guo (JIRA) on 2014-07-31 00:08. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079534#comment-14079534 ] Hadoop QA commented on YARN-2348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658662/YARN-2348.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4483//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4483//console This message is automatically generated. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.3.patch Added a better check for an invalid divisor and a bounds check for absoluteMaxAvail. CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch If there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, there is no actual space available. If we use the current method to get headroom, CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use a minimum of 20% of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (a minimum of 50% of its parent). When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that case, L2LeafQueue2 can actually only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
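Purely to illustrate the arithmetic in the example above (this is not CapacityScheduler code), the naive bound for L2LeafQueue2 ignores what its parent's sibling queue is already using:
{code}
public class QueueMaxCapSketch {
  public static void main(String[] args) {
    // Hierarchy from the example:
    //   rootQueue
    //     L1ParentQueue1 (up to 80% of root)
    //       L2LeafQueue1 (50% of its parent)
    //       L2LeafQueue2 (50% of its parent, minimum)
    //     L1ParentQueue2 (20% of root, minimum)
    float l1Max = 0.80f;        // L1ParentQueue1's max share of the cluster
    float l2Share = 0.50f;      // L2LeafQueue2's share of L1ParentQueue1
    float siblingUsed = 0.40f;  // what L1ParentQueue2 is currently using

    float naive = l1Max * l2Share;                                 // 0.40
    float parentReallyAvailable = Math.min(l1Max, 1.0f - siblingUsed);
    float effective = parentReallyAvailable * l2Share;             // 0.30

    System.out.printf("naive=%.2f effective=%.2f%n", naive, effective);
  }
}
{code}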
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079701#comment-14079701 ] Jason Lowe commented on YARN-2208: -- This appears to have broken backwards compatibility with the previous release, since the new RM cannot load an old AMRM token persisted in the state store. A sample exception where the new RM starts with old RM state: {noformat} 2014-07-30 11:09:17,041 FATAL [main] resourcemanager.ResourceManager (ResourceManager.java:main(1050)) - Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.EOFException at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:837) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:877) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:874) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:874) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:918) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1047) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.yarn.security.AMRMTokenIdentifier.readFields(AMRMTokenIdentifier.java:87) at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142) at org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.addPersistedPassword(AMRMTokenSecretManager.java:205) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:740) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:710) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:676) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1030) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:489) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) ... 10 more {noformat} I realize this is supposed to be fixed eventually under YARN-668, but in the interim token changes like this and YARN-2152 are routinely breaking the ability to do upgrades without wiping the YARN state stores of the cluster. Arguably this should either be marked as an incompatible change or the release note should state that the RM state store needs to be wiped when upgrading. 
AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.13.patch Reversion to earlier logic with updateConnectAddr, with added tests for the old behavior in non-bind-host cases. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079722#comment-14079722 ] Xuan Gong commented on YARN-2208: - After YARN-2211, the AMRMTokenManager state will be saved separately. We only save the currentMasterKey and NextMasterKey. And I will remove the AMRMToken from AppAttemptState. In that case, when we recover the attempt, we will not recover the AMRMToken directly from AppAttemptState. So, I think this issue for AMRMToken will be gone. AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079732#comment-14079732 ] Craig Welch commented on YARN-1994: --- I've uploaded the .9 patch (renamed to .13) referenced above, also planning to take forward the .12 patch with the refactored version which supports overriding the host with address entry in bind-host cases Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079756#comment-14079756 ] Jian He commented on YARN-2208: --- [~jlowe], thanks for noting this. YARN-2212 is an immediate follow-up patch and will fix this in the same release. AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
Larry McCay created YARN-2373: - Summary: WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
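For context on the pattern being referenced, a minimal sketch of reading an SSL password through Configuration.getPassword might look like the following; the helper name and the exact property key are illustrative, not the actual WebAppUtils change:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

public class SslPasswordSketch {
  /**
   * Prefer Configuration.getPassword(), which consults any configured
   * credential providers and falls back to the clear-text property, keeping
   * backward compatibility with passwords stored directly in ssl-server.xml.
   */
  static String getPassword(Configuration sslConf, String key) throws IOException {
    char[] pass = sslConf.getPassword(key);
    return pass != null ? new String(pass) : null;
  }

  public static void main(String[] args) throws IOException {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource("ssl-server.xml");
    // Property key shown for illustration.
    String keystorePass = getPassword(sslConf, "ssl.server.keystore.password");
    System.out.println(keystorePass != null ? "password resolved" : "not set");
  }
}
{code}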
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079785#comment-14079785 ] Hadoop QA commented on YARN-2008: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658692/YARN-2008.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4484//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4484//console This message is automatically generated. CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch If there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, there is no actual space available. If we use the current method to get headroom, CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use a minimum of 20% of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (a minimum of 50% of its parent). When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that case, L2LeafQueue2 can actually only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079871#comment-14079871 ] Hadoop QA commented on YARN-1994: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658703/YARN-1994.13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4485//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4485//console This message is automatically generated. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079876#comment-14079876 ] Jian He commented on YARN-2212: --- - AMS#registerApplicationMaster changes not needed. - We may not want to mark this as stable yet: {code} @Stable public abstract Token getAMRMToken(); {code} - ApplicationReport#getAMRMToken for unmanaged AM needs to be updated as well. - Can we move the AMRMToken creation from RMAppAttemptImpl to AMLauncher? - Use newInstance instead of: {code} BuilderUtils.newAMRMToken( amrmToken.getIdentifier(), amrmToken.getKind().toString(), amrmToken.getPassword(), amrmToken.getService().toString()) {code} - Test that AMRMClient automatically takes care of the new AMRMToken transfer. - Please also run on a real cluster with the roll-over interval set to a small value to make sure it actually works. ApplicationMaster needs to find a way to update the AMRMToken periodically -- Key: YARN-2212 URL: https://issues.apache.org/jira/browse/YARN-2212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2212.1.patch, YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
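For the newInstance suggestion above, the intent is presumably along these lines (a hedged sketch whose argument order simply mirrors the quoted snippet; the final patch may differ):
{code}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

public class AMRMTokenConversionSketch {
  /** Convert the server-side token into the YARN API record via its factory. */
  static org.apache.hadoop.yarn.api.records.Token toApiToken(
      Token<AMRMTokenIdentifier> amrmToken) {
    return org.apache.hadoop.yarn.api.records.Token.newInstance(
        amrmToken.getIdentifier(),
        amrmToken.getKind().toString(),
        amrmToken.getPassword(),
        amrmToken.getService().toString());
  }
}
{code}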
[jira] [Created] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
Varun Vasudev created YARN-2374: --- Summary: YARN trunk build failing TestDistributedShell.testDSShell Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080240#comment-14080240 ] Li Lu commented on YARN-2374: - Thanks [~vvasudev]! This problem has been bothering me in two issues, YARN-2354 and YARN-2295. It seems this is connected with the network settings of the server, causing the following lines to fail: {code} if (appReport.getHost().startsWith(hostName) && appReport.getRpcPort() == -1) { verified = true; } {code} If this check fails, verified will never be set to true, and hence the test will fail. I haven't looked into it in detail, but I think hostName may be something you'd like to start with. YARN trunk build failing TestDistributedShell.testDSShell - Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.14.patch Extension of the .12 patch with updateConnectAddr support for overriding the connection address in bind-host cases based on the service's address config, with the additional tests, and a refactor to overload the old methods in Configuration instead of moving them to RPCUtil. In all cases where there is no bind-host, behavior should be as it was before the change. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080297#comment-14080297 ] Craig Welch commented on YARN-1994: --- BTW, .14 patch tested successfully with host names specified to not be the value from InetAddress.getLocalHost().getHostName() Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2317: Attachment: YARN-2317-073014.patch Revised: fixed a few typos, added discussions on deprecated protocol based programming model, and added pointers to synchronous clients. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2317: Attachment: YARN-2317-073014-1.patch Fixed a typo. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080388#comment-14080388 ] Wangda Tan commented on YARN-2008: -- Hi [~cwelch], Thanks for uploading the patch; +1 for putting isInvalidDivisor into {{ResourceCalculator}}. I would suggest adding some resource usage to L2Q1 in {{testAbsoluteMaxAvailCapacityWithUse}} and checking whether L2Q2 gets the correct maxAbsoluteAvailableCapacity. Thanks, Wangda CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch If there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, there is no actual space available. If we use the current method to get headroom, CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use a minimum of 20% of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (a minimum of 50% of its parent). When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that case, L2LeafQueue2 can actually only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
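The isInvalidDivisor check mentioned above is essentially a guard against dividing by an empty resource. A rough sketch of how such a guard could be used (illustrative only; the real signatures in the patch may differ):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

public class DivisorGuardSketch {
  /**
   * Returns a capacity ratio, falling back to 0 when the divisor resource is
   * invalid (e.g. zero) instead of producing NaN or infinity.
   */
  static float safeRatio(ResourceCalculator rc, Resource clusterResource,
      Resource numerator, Resource divisor) {
    if (rc.isInvalidDivisor(divisor)) {
      return 0.0f;
    }
    return rc.divide(clusterResource, numerator, divisor);
  }
}
{code}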
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080402#comment-14080402 ] Hadoop QA commented on YARN-1994: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658793/YARN-1994.14.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4486//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4486//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4486//console This message is automatically generated. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.15.patch Fix javadoc Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080415#comment-14080415 ] Junping Du commented on YARN-1354: -- Thanks [~jlowe] for the additional explanation, which sounds good to me. Regarding changes for the writable credential, I think we already have a similar JIRA, YARN-668, which is about TokenIdentifier. One way is to wrap it as a PB object (keeping the writable fields as bytes), which could be something like below: {code} required bytes credential; optional new-field 1; optional new-field 2; {code} The other approach is to keep it as writable but do some extra work in readFields() to handle the case where a new field may be missing. Any preference? However, this is a little off-topic for this JIRA; maybe YARN-668 is a better place for that discussion. The patch looks good overall. Some trivial comments: {code} + public abstract RecoveredApplicationsState loadApplicationsState() + throws IOException; + + public abstract void storeApplication(ApplicationId appId, + ContainerManagerApplicationProto p) throws IOException; + + public abstract void finishApplication(ApplicationId appId) + throws IOException; + + public abstract void removeApplication(ApplicationId appId) + throws IOException; + {code} Shall we change the name of finishApplication() to storeFinishedApplication(), which describes the actual work in the store layer more precisely (just as we use storeApplication() instead of startApplication())? Recover applications upon nodemanager restart - Key: YARN-1354 URL: https://issues.apache.org/jira/browse/YARN-1354 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1354-v1.patch, YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, YARN-1354-v4.patch, YARN-1354-v5.patch The set of active applications in the nodemanager context need to be recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.2#6252)
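To make the second option above concrete, here is a minimal sketch of a version-tolerant readFields(); the class and field names are hypothetical, and this is only one way to handle a trailing field that old serialized data does not contain:
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical identifier, used only to illustrate version-tolerant reads.
public class VersionTolerantIdentifier implements Writable {
  private long appAttemptId;   // field that has always existed
  private int keyId = -1;      // field added by a newer release

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(appAttemptId);
    out.writeInt(keyId);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    appAttemptId = in.readLong();
    try {
      keyId = in.readInt();    // new trailing field, absent in old data
    } catch (EOFException e) {
      keyId = -1;              // old data: fall back to a default
    }
  }
}
{code}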
[jira] [Commented] (YARN-2370) Made comment more accurate in AppSchedulingInfo.java
[ https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080417#comment-14080417 ] Junping Du commented on YARN-2370: -- Nice catch, [~gujilangzi]! +1. Will commit it shortly. Made comment more accurate in AppSchedulingInfo.java Key: YARN-2370 URL: https://issues.apache.org/jira/browse/YARN-2370 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Priority: Trivial Labels: newbie Attachments: YARN-2370.0.patch In the allocateOffSwitch method of AppSchedulingInfo.java, only the OffRack request is updated, so the comment should be "Update cloned OffRack requests for recovery", not "Update cloned RackLocal and OffRack requests for recovery". {code} // Update cloned RackLocal and OffRack requests for recovery resourceRequests.add(cloneResourceRequest(offSwitchRequest)); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2370) Made comment more accurate in AppSchedulingInfo.java
[ https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2370: - Labels: newbie (was: ) Made comment more accurate in AppSchedulingInfo.java Key: YARN-2370 URL: https://issues.apache.org/jira/browse/YARN-2370 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Priority: Trivial Labels: newbie Attachments: YARN-2370.0.patch In the allocateOffSwitch method of AppSchedulingInfo.java, only the OffRack request is updated, so the comment should be "Update cloned OffRack requests for recovery", not "Update cloned RackLocal and OffRack requests for recovery". {code} // Update cloned RackLocal and OffRack requests for recovery resourceRequests.add(cloneResourceRequest(offSwitchRequest)); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080431#comment-14080431 ] Hadoop QA commented on YARN-2372: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658825/YARN-2372.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4488//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4488//console This message is automatically generated. There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080444#comment-14080444 ] Hadoop QA commented on YARN-2372: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658828/YARN-2372.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4489//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4489//console This message is automatically generated. There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080469#comment-14080469 ] Hadoop QA commented on YARN-1994: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658822/YARN-1994.15.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4487//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4487//console This message is automatically generated. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080480#comment-14080480 ] Nishan Shetty commented on YARN-2283:
-
I checked this issue; it does not occur on trunk. It is reproducible in 2.4.*.

RM failed to release the AM container
-
Key: YARN-2283
URL: https://issues.apache.org/jira/browse/YARN-2283
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Environment: NM1: AM running; NM2: Map task running; mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

During a container stability test I hit this problem: while the job was running, a map task got killed. Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.
{code}
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 available, release resources=true
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=memory:2048, vCores:1 numContainers=1 user=testos user-resources=memory:2048, vCores:1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for
{code}
[jira] [Updated] (YARN-19) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)
[ https://issues.apache.org/jira/browse/YARN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-19:
---
Attachment: YARN-19-v4.patch

Synced up with the latest changes on trunk, based on Junping's latest patch; this should work with YARN-18.v8.1.patch (on JIRA YARN-18).

4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)
-
Key: YARN-19
URL: https://issues.apache.org/jira/browse/YARN-19
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Junping Du
Assignee: Junping Du
Attachments: HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch, MAPREDUCE-4310-v1.patch, MAPREDUCE-4310.patch, YARN-19-v2.patch, YARN-19-v3-alpha.patch, YARN-19-v4.patch, YARN-19.patch

Several classes in YARN's container assignment and task scheduling algorithms relate to data locality, and they were updated to give preference to running a container on the same nodegroup. This section summarizes the changes in the patch, which provides a new implementation to support a four-layer hierarchy. When the ApplicationMaster makes a resource allocation request to the scheduler of the ResourceManager, it adds the nodegroup to the list of attributes in the ResourceRequest. The parameters of the resource request change from priority, (host, rack, *), memory, #containers to priority, (host, nodegroup, rack, *), memory, #containers. After receiving the ResourceRequest, the RM scheduler assigns containers to requests in the sequence data-local, nodegroup-local, rack-local, and off-switch. The ApplicationMaster then schedules tasks on the allocated containers in the same sequence: data-local, nodegroup-local, rack-local, and off-switch.
In terms of code changes to YARN task scheduling, we updated the class ContainerRequestEvent so that applications' container requests can include a nodegroup. In the RM schedulers, FifoScheduler and CapacityScheduler were updated. For the FifoScheduler, the changes are in the method assignContainers. For the CapacityScheduler, the method assignContainersOnNode in the LeafQueue class was updated. In both cases a new method, assignNodeGroupLocalContainers(), was added between the data-local and rack-local assignment steps (see the sketch after this message).
--
This message was sent by Atlassian JIRA (v6.2#6252)
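The following is a minimal, self-contained sketch of the four-level locality order described above (data-local, then nodegroup-local as the new level, then rack-local, then off-switch). It is illustrative only: apart from the ordering itself and the fact that the patch adds a nodegroup-local step, the class, record, and method names here are assumptions of this sketch, not the actual FifoScheduler or LeafQueue code.
{code}
// Illustrative sketch of the four-level locality order added by a nodegroup layer.
// None of these names come from the patch; they exist only to show the ordering.
public class NodeGroupAssignmentSketch {

  enum Locality { DATA_LOCAL, NODEGROUP_LOCAL, RACK_LOCAL, OFF_SWITCH, NONE }

  /** Hypothetical view of a node's topology: host, nodegroup, rack. */
  record NodeTopology(String host, String nodeGroup, String rack) {}

  /** Hypothetical resource request carrying the extra nodegroup attribute. */
  record Request(String host, String nodeGroup, String rack, boolean relaxLocality) {}

  static Locality classify(NodeTopology node, Request req) {
    if (req.host().equals(node.host())) {
      return Locality.DATA_LOCAL;            // same host as the requested data
    }
    if (req.nodeGroup().equals(node.nodeGroup())) {
      return Locality.NODEGROUP_LOCAL;       // new level between host and rack
    }
    if (req.rack().equals(node.rack())) {
      return Locality.RACK_LOCAL;            // same rack, different nodegroup
    }
    return req.relaxLocality() ? Locality.OFF_SWITCH : Locality.NONE;
  }

  public static void main(String[] args) {
    NodeTopology node = new NodeTopology("host2", "nodegroup1", "rack1");
    Request req = new Request("host1", "nodegroup1", "rack1", true);
    System.out.println(classify(node, req)); // prints NODEGROUP_LOCAL
  }
}
{code}
A scheduler following this order only falls through to a weaker locality level when no container can be assigned at the stronger one, which is why the new assignNodeGroupLocalContainers() step sits between the data-local and rack-local assignment attempts.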