[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339744#comment-14339744 ] Abin Shahab commented on YARN-2981: --- [~raviprak] [~vinodkv] [~vvasudev] [~ywskycn] please review DockerContainerExecutor must support a Cluster-wide default Docker image Key: YARN-2981 URL: https://issues.apache.org/jira/browse/YARN-2981 Project: Hadoop YARN Issue Type: Bug Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-2981.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339753#comment-14339753 ] Hadoop QA commented on YARN-3262: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701265/YARN-3262.4.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6773//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6773//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6773//console This message is automatically generated. Surface application outstanding resource requests table --- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, YARN-3262.4.patch, resource requests.png It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339780#comment-14339780 ] Gururaj Shetty commented on YARN-3168: -- Hi [~aw] All your comments are incorporated. Kindly review the latest patch attached. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339796#comment-14339796 ] Varun Saxena commented on YARN-2962: [~kasha] / [~ka...@cloudera.com], for this I will assume that the state store will be formatted before making the config change, right? Backward compatibility for running apps after the config change (on RM restart) will be difficult, as we may have to try all the possible appid formats. ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
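One way to limit the number of children under a single znode is to bucket application znodes by a prefix of the application id string. A minimal sketch of that idea follows; the helper name and the "split on the last few digits" rule are illustrative assumptions, not the committed YARN-2962 layout.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class HierarchicalAppZNodes {
  // Hypothetical helper: spread application znodes across parent "bucket"
  // znodes by splitting off the last splitDigits characters of the app id,
  // so no single parent accumulates an unbounded number of children.
  static String appZNodePath(String rootPath, ApplicationId appId, int splitDigits) {
    String idStr = appId.toString();          // e.g. application_1409135750325_109118
    int cut = idStr.length() - splitDigits;
    String bucket = idStr.substring(0, cut);  // parent znode, e.g. application_1409135750325_1091
    String leaf = idStr.substring(cut);       // child znode, e.g. 18
    return rootPath + "/" + bucket + "/" + leaf;
  }
}
{code}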
[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339811#comment-14339811 ] Chengbing Liu commented on YARN-3204: -
{code}
-    this.reservedAppSchedulable = (FSAppAttempt) application;
+    if (application instanceof FSAppAttempt) {
+      this.reservedAppSchedulable = (FSAppAttempt) application;
+    }
{code}
Would it be better to throw an exception if the condition is not met?
{code}
    Set<String> planQueues = new HashSet<String>();
    for (FSQueue fsQueue : queueMgr.getQueues()) {
      String queueName = fsQueue.getName();
-     if (allocConf.isReservable(queueName)) {
+     boolean isReservable = false;
+     synchronized (this) {
+       isReservable = allocConf.isReservable(queueName);
+     }
+     if (isReservable) {
        planQueues.add(queueName);
      }
    }
{code}
I think we should synchronize the whole function, since {{allocConf}} may be reloaded during this loop. A dedicated lock looks better to me than {{FairScheduler.this}}. Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) -- Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
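A rough sketch of the dedicated-lock suggestion above, assuming {{allocConf}} is swapped concurrently by the allocation file loader; the class and field names are illustrative and this is not the actual FairScheduler fix.
{code}
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue;

public class AllocConfGuard {
  // Dedicated monitor guarding every read/reload of allocConf, so a concurrent
  // reload cannot be observed half-way through the loop over the queues.
  private final Object allocConfLock = new Object();
  private AllocationConfiguration allocConf; // swapped by the allocation file loader

  Set<String> getPlanQueueNames(Iterable<FSQueue> queues) {
    Set<String> planQueues = new HashSet<String>();
    synchronized (allocConfLock) {            // hold the lock across the whole loop
      for (FSQueue fsQueue : queues) {
        if (allocConf.isReservable(fsQueue.getName())) {
          planQueues.add(fsQueue.getName());
        }
      }
    }
    return planQueues;
  }
}
{code}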
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339774#comment-14339774 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701267/YARN-2820.006.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6775//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6775//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6775//console This message is automatically generated. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, YARN-2820.005.patch, YARN-2820.006.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at
[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3168: - Attachment: YARN-3168.20150227.3.patch Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3122: --- Attachment: YARN-3122.005.patch The updated patch looks mostly good to me. I like that we are mimicking top; users will find it easier to reason about this. I had a few nitpicks that I have put into the v5 patch - renaming CpuTimeTracker#getCpuUsagePercent and changes to comments. [~adhoot] - can you please review and verify the changes? One last concern - we use 0 when we cannot calculate the percentage. Shouldn't we use UNAVAILABLE instead? Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339830#comment-14339830 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701280/YARN-3122.005.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6776//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6776//console This message is automatically generated. Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339799#comment-14339799 ] Rohith commented on YARN-3222: -- bq. NODE_USABLE event is sent regardless of whether the reconnected node is healthy or not healthy, which is incorrect, right? Yes, I think the assumption was that if a new node is reconnecting then the NM is healthy. It is better to retain the old state, i.e. UNHEALTHY, and in the next heartbeat the NodeStatus can be moved from Unhealthy to Running. I see another potential issue: if the old node is retained, then the RMNode's {{totalCapability}} has to be updated with the new node's resource. But in this flow, {{totalCapability}} is not updated. As a result, the scheduler has the updated resource value but the RMNode has a stale one. Any client getting the node capability from the RMNode would end up with a wrong node resource value.
{code}
if (noRunningApps) {
  // some code
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));
  if (rmNode.getHttpPort() == newNode.getHttpPort()) {
    if (rmNode.getState() != NodeState.UNHEALTHY) {
      // Only add new node if old state is not UNHEALTHY
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodeAddedSchedulerEvent(newNode));
      // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
    }
  } else {
    // Reconnected node differs, so replace old node and start new node
    rmNode.context.getDispatcher().getEventHandler().handle(
        new RMNodeStartedEvent(newNode.getNodeID(), null, null));
    // No need to update totalCapability since old node is replaced with new node.
  }
}
{code}
RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order --- Key: YARN-3222 URL: https://issues.apache.org/jira/browse/YARN-3222 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-3222.patch When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler with the events node_added, node_removed or node_resource_update. These events should be sent in sequential order, i.e. the node_added event and then the node_resource_update event. But if the node is reconnected with a different http port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The Node_Resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.004.patch Modified CPU usage to be percent per core, and the corresponding metric also to be percent per core; thus 2 cores fully used should report as 200%. Added doc comments. Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
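To make the percent-per-core convention concrete, here is a small illustrative calculation (an assumed formula mirroring top, not the patch code): a container that keeps two cores fully busy reports 200%.
{code}
public final class CpuPercentPerCore {
  // CPU time consumed divided by wall-clock time, expressed in percent:
  // 100 means one core fully used, 200 means two cores fully used, etc.
  static float cpuUsagePercentPerCore(long cpuMillisUsed, long wallClockMillis) {
    if (wallClockMillis <= 0) {
      return 0f; // or an UNAVAILABLE sentinel, as discussed in the review above
    }
    return 100f * cpuMillisUsed / wallClockMillis;
  }

  public static void main(String[] args) {
    // 10 seconds of CPU time over a 5 second interval => 200.0
    System.out.println(cpuUsagePercentPerCore(10000L, 5000L));
  }
}
{code}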
[jira] [Commented] (YARN-3189) Yarn application usage command should not give -appstate and -apptype
[ https://issues.apache.org/jira/browse/YARN-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338380#comment-14338380 ] Hadoop QA commented on YARN-3189: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701054/YARN-3189.patch against trunk revision 0d4296f. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6756//console This message is automatically generated. Yarn application usage command should not give -appstate and -apptype - Key: YARN-3189 URL: https://issues.apache.org/jira/browse/YARN-3189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anushri Assignee: Anushri Priority: Minor Attachments: YARN-3189.patch Yarn application usage command should not give -appstate and -apptype since these two are applicable to --list command.. *Can somebody please assign this issue to me* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338686#comment-14338686 ] Junping Du commented on YARN-3087: -- Thanks [~gtCarrera9] for updating the patch! Just quickly went through the patch; many of the changes replace Map with HashMap in declarations, like the following:
{code}
- private Map<String, Set<String>> isRelatedToEntities = new HashMap<>();
- private Map<String, Set<String>> relatesToEntities = new HashMap<>();
+ private HashMap<String, Set<String>> isRelatedToEntities = new HashMap<>();
+ private HashMap<String, Set<String>> relatesToEntities = new HashMap<>();
{code}
Any specific reason for doing this? Typically, we declare things (objects, interfaces) with the more generic type (according to the Liskov Substitution principle). [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception:
{noformat}
org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found
    at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
    at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    ...
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
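The review point about declaring against the interface can be shown in two lines; this is a generic illustration rather than the YARN-3087 patch itself.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class EntityRelations {
  // Declare the field with the interface type; keep the concrete class only on
  // the right-hand side, so callers and subclasses are not tied to HashMap.
  private Map<String, Set<String>> isRelatedToEntities = new HashMap<String, Set<String>>();
  private Map<String, Set<String>> relatesToEntities = new HashMap<String, Set<String>>();
}
{code}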
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338665#comment-14338665 ] Chang Li commented on YARN-3131: [~jlowe] [~jianhe] [~leftnoteasy] Could any of you help commit this if it looks good to you now? Thanks. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch, yarn_3131_v7.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raises no exception. Though the job indeed gets submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation the way YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
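For context, a hedged sketch of the behaviour this JIRA asks for (poll the application report after submission and fail fast on FAILED/KILLED, e.g. when the target queue does not exist); it is not the attached patch, and the helper method name is made up.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitAndCheck {
  // Hypothetical helper: submit, then poll the report and surface an
  // immediate FAILED/KILLED outcome to the caller instead of staying silent.
  static ApplicationId submitAndVerify(YarnClient client, ApplicationSubmissionContext ctx)
      throws Exception {
    ApplicationId appId = client.submitApplication(ctx);
    while (true) {
      YarnApplicationState state =
          client.getApplicationReport(appId).getYarnApplicationState();
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new YarnException("Application " + appId + " ended in " + state
            + " immediately after submission");
      }
      if (state != YarnApplicationState.NEW && state != YarnApplicationState.NEW_SAVING
          && state != YarnApplicationState.SUBMITTED) {
        return appId; // ACCEPTED or later: submission went through
      }
      Thread.sleep(200);
    }
  }
}
{code}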
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338687#comment-14338687 ] Naganarasimha G R commented on YARN-3039: - Hi [~djp] bq. Another idea (from Vinod in offline discussion) is to add a blocking call in AMRMClient to get aggregator address directly from RM +1 for this approach. Also, if the NM uses this new blocking call in AMRMClient to get the aggregator address, then there might not be any race conditions when the NM posts the AM container's life cycle events immediately after creation of the appAggregator through the Aux service. bq. In addition, if adding a new API in AMRMClient can be accepted, NM will use TimelineClient too so can handle service discovery automatically. Are we just adding a method to get the aggregator address, or what other APIs are planned? bq. NM will notify RM that this new appAggregator is ready for use in next heartbeat to RM (missing in this patch). bq. RM verify the out of service for this app aggregator first and kick off rebind appAggregator to another NM's perNodeAggregatorService in next heartbeat comes. I believe the idea of using the AUX service was to decouple the NM and the Timeline service. If the NM will notify the RM about new appAggregator creation (based on the AUX service), then basically the NM should be aware that PerNodeAggregatorServer is configured as an AUX service, and if it supports rebinding the appAggregator on failure then it should be able to communicate with this aux service too; would this be a clean approach? I also feel we need to support starting a per-app aggregator only if the app requests it (Zhijie also mentioned this). If not, we can make use of one default aggregator for all such apps launched on the NM, which is just used to post container entities from different NMs for these apps. Have any discussions happened regarding the RM having its own aggregator? I feel it would be better for the RM to have one, as it need not depend on any NMs to post entities. [Aggregator wireup] Implement ATS app-aggregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338654#comment-14338654 ] Junping Du commented on YARN-3031: -- bq. The AggregateUpTo enum has the tracks to aggregate along, the TimelineEntityType enum has the types of entities that can exist. There may not be aggregations along all entity types. I see. Thanks [~vrushalic] for the explanation here. However, CLUSTER seems to be missing here, as it is an aggregation of FLOWs. Isn't it? bq. The reasoning behind having two more apis for writing metrics and events in addition to the entity write is that, it would be good (efficient) to have the option to write a single metric or a single event. For example, say a job has many custom metrics and one particular metric is updated extremely frequently but not the others. We may want to write out only that particular metric without having to look through/write all other metrics and other information in that entity. Similarly for events. Perhaps we could do it differently than what is proposed in the patch, but the functionality of writing them individually would help in performance I believe. Agreed that we should have separate interfaces to write a single data entry quickly and to aggregate data entries. Also, some aggregators (like the RM) won't even call the aggregation interface here (according to YARN-3167). IMO, it sounds like two interfaces are good enough, so we can merge addEvent() and updateMetrics() into a single data entry writer which can accept a more generic type? That would make the interface more concise and hide more details that could change in the future. Thoughts? [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338526#comment-14338526 ] Tsuyoshi Ozawa commented on YARN-3217: -- +1 Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3217: - Target Version/s: 2.7.0 Affects Version/s: 2.6.0 Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338556#comment-14338556 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2066 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2066/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3255: - Hadoop Flags: Reviewed RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options --- Key: YARN-3255 URL: https://issues.apache.org/jira/browse/YARN-3255 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: YARN-3255-01.patch, YARN-3255-02.patch Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic options, like {{-conf}} and {{-fs}}. It would be good to have the ability to pass generic options in order to specify configuration files or the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3255: - Summary: RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options (was: RM and NM main() should support generic options) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options --- Key: YARN-3255 URL: https://issues.apache.org/jira/browse/YARN-3255 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: YARN-3255-01.patch, YARN-3255-02.patch Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic options, like {{-conf}} and {{-fs}}. It would be good to have the ability to pass generic options in order to specify configuration files or the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338517#comment-14338517 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #116 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/116/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/CHANGES.txt TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338565#comment-14338565 ] Hudson commented on YARN-3217: -- FAILURE: Integrated in Hadoop-trunk-Commit #7208 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7208/]) YARN-3217. Remove httpclient dependency from hadoop-yarn-server-web-proxy. Contributed by Brahma Reddy Battula. (ozawa: rev 773b6515ac51af3484824bd6f57685a9726a1e70) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/pom.xml Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338548#comment-14338548 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2066 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2066/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java * hadoop-yarn-project/CHANGES.txt WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
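An illustration of the kind of URI handling the fix needs (not the committed patch): when rebuilding the redirect target, the original query string and fragment have to be carried over rather than dropped. The helper below is a hypothetical sketch using java.net.URI.
{code}
import java.net.URI;
import java.net.URISyntaxException;

public class TrackingUriUtil {
  // Illustrative only: append the remaining request path to the tracking URL
  // while preserving its query string and fragment instead of discarding them.
  static URI appendToTrackingUri(URI trackingUri, String rest) throws URISyntaxException {
    String path = trackingUri.getPath();
    if (path == null || path.isEmpty()) {
      path = "/";
    }
    if (rest != null && !rest.isEmpty()) {
      path = path.endsWith("/") ? path + rest : path + "/" + rest;
    }
    return new URI(trackingUri.getScheme(), trackingUri.getAuthority(), path,
        trackingUri.getQuery(), trackingUri.getFragment());
  }
}
{code}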
[jira] [Commented] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338584#comment-14338584 ] Junping Du commented on YARN-3264: -- +1. This is also useful for test. [Storage implementation] Create a POC only file based storage implementation for ATS writes --- Key: YARN-3264 URL: https://issues.apache.org/jira/browse/YARN-3264 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338591#comment-14338591 ] Brahma Reddy Battula commented on YARN-3217: Thanks a lot [~ozawa] !!! Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338589#comment-14338589 ] Tsuyoshi Ozawa commented on YARN-2820: -- [~zxu] thanks for your updating! The implementation of FSAction looks good to me. I found following points to be fixed: 1. In startInternal, fs.mkdirs can be replaced with mkdirsWithRetries: {code} fs.mkdirs(rmDTSecretManagerRoot); fs.mkdirs(rmAppRoot); fs.mkdirs(amrmTokenSecretManagerRoot); {code} 2. All readFile() should be replaced with readFileWithRetries like writeFileWithRetries. 3. fs.listStatus() should be replaced with listStatusWithRetries. 4. We can use try-with-resources in storeRMDTMasterKeyState to close fsOut. I know it's not related to this patch, but it's better to be fixed here. {code} DataOutputStream fsOut = new DataOutputStream(os); {code} Do you mind updating a patch again? Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, YARN-2820.005.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at
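A minimal sketch of the FSAction-style retry wrapper being reviewed here, assuming a fixed retry count and interval; the constants and method names are illustrative, not the exact YARN-2820 code.
{code}
import java.io.IOException;

abstract class FSAction<T> {
  private static final int MAX_RETRIES = 5;
  private static final long RETRY_INTERVAL_MS = 1000L;

  // The single file-system operation to retry, e.g. a mkdirs or writeFile call.
  abstract T run() throws IOException;

  T runWithRetries() throws Exception {
    int retry = 0;
    while (true) {
      try {
        return run();
      } catch (IOException e) {
        if (++retry > MAX_RETRIES) {
          throw e;                       // give up and let the caller handle it
        }
        Thread.sleep(RETRY_INTERVAL_MS); // back off before the next attempt
      }
    }
  }
}
{code}
A call site such as fs.mkdirs(rmAppRoot) would then be wrapped in an FSAction and invoked through runWithRetries(), which is the shape of the mkdirsWithRetries/writeFileWithRetries helpers mentioned in the review.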
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338509#comment-14338509 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #116 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/116/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3189) Yarn application usage command should not give -appstate and -apptype
[ https://issues.apache.org/jira/browse/YARN-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anushri updated YARN-3189: -- Attachment: YARN-3189.patch Yarn application usage command should not give -appstate and -apptype - Key: YARN-3189 URL: https://issues.apache.org/jira/browse/YARN-3189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anushri Assignee: Anushri Priority: Minor Attachments: YARN-3189.patch Yarn application usage command should not give -appstate and -apptype since these two are applicable to --list command.. *Can somebody please assign this issue to me* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338445#comment-14338445 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #107 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/107/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338436#comment-14338436 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2048 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2048/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/CHANGES.txt TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338428#comment-14338428 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2048 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2048/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338453#comment-14338453 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #107 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/107/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338813#comment-14338813 ] Naganarasimha G R commented on YARN-3260: - Hi [~jlowe], I had a look at the code and some approaches which I can think of are: * ApplicationMasterService.registerAppAttempt(ApplicationAttemptId) to be called in RMAppAttemptImpl.AMLaunchedTransition instead of RMAppAttemptImpl.AttemptStartedTransition, and ensuring that the ClientToAMToken creation and the registration with ApplicationMasterService happen in the same block. By doing this we can throw InvalidApplicationMasterRequestException if the AM tries to register with the AMS before RMAppAttemptImpl processes the RMAppAttempt LAUNCHED event. * Was thinking of having a MultiThreadedDispatcher for processing App and AppAttempt events, similar to the one in SystemMetricsPublisher.MultiThreadedDispatcher, with the additional modification that instead of {{(event.hashCode() & Integer.MAX_VALUE) % dispatchers.size()}} we could key it on the applicationId. This can speed up the processing of App events ... I was not able to see any other cleaner direct fix for this issue, so I was wondering whether we need to start looking at the reason the cluster was running behind on processing AsyncDispatcher events. Were these events getting delayed for any particular reason? NPE if AM attempts to register before RM processes launch event --- Key: YARN-3260 URL: https://issues.apache.org/jira/browse/YARN-3260 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R The RM on one of our clusters was running behind on processing AsyncDispatcher events, and this caused AMs to fail to register due to an NPE. The AM was launched and attempting to register before the RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token had not been generated yet. The NPE occurred because the ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
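A rough sketch of the second idea above (a pool of dispatchers keyed on the application id instead of event.hashCode()); purely illustrative, with made-up class and method names.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.event.AsyncDispatcher;

public class AppShardedDispatcher {
  private final List<AsyncDispatcher> dispatchers = new ArrayList<AsyncDispatcher>();

  public AppShardedDispatcher(int poolSize) {
    for (int i = 0; i < poolSize; i++) {
      dispatchers.add(new AsyncDispatcher());
    }
  }

  // All events of one application land on the same dispatcher, preserving
  // per-app ordering while letting different applications proceed in parallel.
  AsyncDispatcher dispatcherFor(ApplicationId appId) {
    int index = (appId.hashCode() & Integer.MAX_VALUE) % dispatchers.size();
    return dispatchers.get(index);
  }
}
{code}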
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338848#comment-14338848 ] Craig Welch commented on YARN-3251: --- Sorry if that wasn't clear, to reduce risk removed the minor changes in CSQueueUtils CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338760#comment-14338760 ] Hadoop QA commented on YARN-3255: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700931/YARN-3255-02.patch against trunk revision 773b651. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6757//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6757//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6757//console This message is automatically generated. RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options --- Key: YARN-3255 URL: https://issues.apache.org/jira/browse/YARN-3255 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: YARN-3255-01.patch, YARN-3255-02.patch Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic options, like {{-conf}} and {{-fs}}. It would be good to have the ability to pass generic options in order to specify configuration files or the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
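For context on what supporting generic options typically means for a Hadoop daemon's main(), here is a minimal sketch that runs the arguments through GenericOptionsParser before starting the service; MyDaemon is a hypothetical stand-in, not the actual RM/NM/JobHistoryServer code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

// Hypothetical daemon entry point: GenericOptionsParser consumes options such as
// -conf <file> and -fs <uri>, applies them to the Configuration, and returns
// whatever arguments are left over for the daemon itself.
public class MyDaemon {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    String[] remainingArgs = parser.getRemainingArgs();

    // ... initialize and start the service with the parsed conf ...
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS")
        + ", remaining args = " + remainingArgs.length);
  }
}
{code}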
[jira] [Created] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
Prakash Ramachandran created YARN-3267: -- Summary: Timelineserver applies the ACL rules after applying the limit on the number of records Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran While fetching the entities from the timelineserver, the limit is applied to the entities fetched from leveldb, and the ACL filters are applied after this (TimelineDataManager.java::getEntities). This could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
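A small, self-contained illustration of the ordering problem described above (the names are illustrative only, not TimelineDataManager code): truncating to the limit before the ACL check can return an empty result even though readable entities exist further down the store.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class LimitBeforeAclDemo {

  // Problematic order: cut the scan off at 'limit' first, then apply the ACL filter.
  static List<String> limitThenAcl(List<String> store, int limit, Predicate<String> aclCheck) {
    List<String> page = new ArrayList<>(store.subList(0, Math.min(limit, store.size())));
    page.removeIf(e -> !aclCheck.test(e)); // may discard the whole page
    return page;
  }

  // Safer order: keep scanning until 'limit' readable entities have been collected.
  static List<String> aclThenLimit(List<String> store, int limit, Predicate<String> aclCheck) {
    List<String> page = new ArrayList<>();
    for (String e : store) {
      if (aclCheck.test(e)) {
        page.add(e);
        if (page.size() == limit) {
          break;
        }
      }
    }
    return page;
  }

  public static void main(String[] args) {
    List<String> store = List.of("other-1", "other-2", "mine-1", "mine-2");
    Predicate<String> readable = e -> e.startsWith("mine-");
    System.out.println(limitThenAcl(store, 2, readable)); // []
    System.out.println(aclThenLimit(store, 2, readable)); // [mine-1, mine-2]
  }
}
{code}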
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338736#comment-14338736 ] Ted Yu commented on YARN-3025: -- Ping [~zjshen] Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
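A minimal sketch of how an AM might use the proposed accessor after a failover, next to the existing updateBlacklist call. getBlacklistedNodes() is the API being proposed in this JIRA, and the interface below is a hypothetical stand-in for the client, not the real AMRMClient:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the relevant slice of the AM-RM client API.
interface BlacklistAwareClient {
  void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals);

  // Proposed in YARN-3025: the nodes currently blacklisted for this attempt.
  List<String> getBlacklistedNodes();
}

public class BlacklistRecoveryExample {
  // After failover, reconcile what the new AM attempt remembers with what the
  // client reports, so scheduling decisions stay consistent.
  public static void restoreBlacklist(BlacklistAwareClient client, List<String> rememberedByAm) {
    List<String> toAdd = new ArrayList<>(rememberedByAm);
    toAdd.removeAll(client.getBlacklistedNodes());
    client.updateBlacklist(toAdd, Collections.emptyList());
  }
}
{code}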
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Attaching a minimal version of the Application Priority manager where only configuration support is present. In the longer run, YARN-3250 will handle admin CLI and REST support. Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch The focus of this JIRA is to have a centralized service to handle priority labels, supporting operations such as: * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label To keep the interface simple, the Priority Manager will support only a configuration file, in contrast with an admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338830#comment-14338830 ] Karthik Kambatla commented on YARN-3231: Thanks for reporting and working on this, [~l201514]. The approach looks generally good. A few comments (some nits): # Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? And, add a javadoc for when it should be called and what it does. # Add javadoc for the newly added private method and the significance of the new integer param. # Call the above method from AllocationReloadListener#onReload after all the other queue configs are updated. # The comment here no longer applies. Remove it? {code} // No more than one app per list will be able to be made runnable, so // we can stop looking after we've found that many if (noLongerPendingApps.size() >= maxRunnableApps) { break; } {code} # Indentation: {code} updateAppsRunnability(appsNowMaybeRunnable, appsNowMaybeRunnable.size()); {code} # Newly added tests: ## If it is not too much trouble, can we move them to a new test class (TestAppRunnability?), mostly because TestFairScheduler has so many tests already. ## Is it possible to reuse the code between these tests? ## Should we add tests for when the maxRunnableApps for a user or queue is decreased? If you think this might need additional work in the logic as well, I am open to filing a follow-up JIRA and addressing it there. FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
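As background for the review points above, a minimal, self-contained sketch of the kind of re-evaluation being discussed: when the per-queue limit is raised on reload, pending apps need to be re-checked and promoted. All names are hypothetical; this is not the FairScheduler or MaxRunningAppsEnforcer code.
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical queue model: apps wait in 'pending' until the running count is
// below the (possibly just-reloaded) maxRunningApps limit.
public class ReloadRunnabilityDemo {
  private final Deque<String> pending = new ArrayDeque<>();
  private final List<String> running = new ArrayList<>();
  private int maxRunningApps;

  public ReloadRunnabilityDemo(int maxRunningApps) {
    this.maxRunningApps = maxRunningApps;
  }

  public void submit(String app) {
    pending.add(app);
    updateRunnability();
  }

  // Called after the allocation file is reloaded with a new limit. Without this
  // re-check, apps queued under the old limit stay pending forever, which is the
  // symptom reported in this JIRA.
  public void onReload(int newMaxRunningApps) {
    this.maxRunningApps = newMaxRunningApps;
    updateRunnability();
  }

  private void updateRunnability() {
    while (!pending.isEmpty() && running.size() < maxRunningApps) {
      running.add(pending.poll());
    }
  }
}
{code}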
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338746#comment-14338746 ] Ted Yu commented on YARN-2777: -- @Varun: {code} out.println("End of LogType:"); out.println(fileType); {code} Can you put the above two onto the same line? Thanks Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Attachments: YARN-2777.001.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
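A minimal sketch of the change being requested above: when each log is written into the aggregated file, emit a single End of LogType: marker line (file name on the same line) after the log body. The writer below is a self-contained illustration, not the AggregatedLogFormat code:
{code}
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative aggregated-log writer: one header block per log, the contents,
// then a one-line end marker so readers can tell where each daemon's log stops.
public class AggregatedLogMarkerDemo {
  static void writeLog(PrintStream out, String fileType, String contents) {
    out.println("LogType:" + fileType);
    out.println("LogLength:" + contents.length());
    out.println("Log Contents:");
    out.println(contents);
    // The marker requested in this JIRA, kept on a single line per the review
    // comment above.
    out.println("End of LogType:" + fileType);
  }

  public static void main(String[] args) {
    Map<String, String> logs = new LinkedHashMap<>();
    logs.put("hbase-master.log", "Starting master ...");
    logs.put("hbase-master.out", "stdout ...");
    logs.forEach((name, body) -> writeLog(System.out, name, body));
  }
}
{code}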
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338782#comment-14338782 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700977/YARN-3251.2-6-0.2.patch against trunk revision dce8b9c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6759//console This message is automatically generated. CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
Prakash Ramachandran created YARN-3268: -- Summary: timelineserver rest api returns html page for 404 when a bad endpoint is used. Key: YARN-3268 URL: https://issues.apache.org/jira/browse/YARN-3268 Project: Hadoop YARN Issue Type: Bug Reporter: Prakash Ramachandran the timelineserver returns a 404 page instead of giving a REST response. this interferes with the end user pages which try to retrieve data using REST api. this could be due to lack of a 404 handler ex. http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.3.patch Removing the csqueueutils CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338860#comment-14338860 ] Wangda Tan commented on YARN-3251: -- "if opposite opinions" -> "if no opposite opinions" CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338858#comment-14338858 ] Wangda Tan commented on YARN-3251: -- LGTM +1, I will commit the patch to branch-2.6 this afternoon if opposite opinions. Thanks! CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.4.patch Minor, switch to Internal, seems to be more common in the codebase CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338945#comment-14338945 ] Vinod Kumar Vavilapalli commented on YARN-3087: --- bq. The current solution is a workaround for JAXB resolver, which cannot return an interface (Map) type. This work around is consistent with the v1 version of our ATS object model (in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/, such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's needed, maybe we'd like to keep the declarations to be Map, and do the cast in the jaxb getter? Casting it everytime will be expensive. Let's keep it as the patch currently does - we are not exposing the fact that it is a HashMap to external world, only to Jersey. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338968#comment-14338968 ] Varun Saxena commented on YARN-2777: [~tedyu], made the change. Kindly review Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Attachments: YARN-2777.001.patch, YARN-2777.002.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
Xuan Gong created YARN-3269: --- Summary: Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path Key: YARN-3269 URL: https://issues.apache.org/jira/browse/YARN-3269 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Log aggregation currently is always relative to the default file system, not an arbitrary file system identified by URI. So we can't put an arbitrary fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
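A minimal sketch of the distinction the description draws: resolving the configured directory against its own file system (so a fully qualified URI on another cluster works) instead of implicitly against the default file system. This shows the general Hadoop FileSystem/Path pattern, not the eventual YARN-3269 patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteLogDirResolution {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The configured value may be a fully qualified URI such as
    // hdfs://logcluster:8020/app-logs (hypothetical example).
    Path remoteRootLogDir =
        new Path(conf.get("yarn.nodemanager.remote-app-log-dir", "/tmp/logs"));

    // Relative-to-default behavior described in the JIRA: always the default FS,
    // regardless of any scheme/authority in the configured path.
    FileSystem defaultFs = FileSystem.get(conf);

    // URI-aware behavior: let the configured path select its own file system.
    FileSystem logFs = remoteRootLogDir.getFileSystem(conf);

    System.out.println("default FS: " + defaultFs.getUri());
    System.out.println("log dir FS: " + logFs.getUri());
  }
}
{code}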
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339015#comment-14339015 ] Junping Du commented on YARN-3087: -- Agree with Vinod that if this is required from the JAXB API then we don't have to cast it. Thanks [~gtCarrera9] for the explanation on this! Patch looks good to me overall. One comment is: we have a lot of similar logic to cast a Map to a HashMap, like below: {code} -this.relatedEntities = relatedEntities; +if (relatedEntities != null && !(relatedEntities instanceof HashMap)) { + this.relatedEntities = new HashMap<String, Set<String>>(relatedEntities); +} else { + this.relatedEntities = (HashMap<String, Set<String>>) relatedEntities; +} {code} Maybe we can use generics to consolidate them. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
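A minimal sketch of the consolidation suggested above: a single generic helper that keeps the map if it is already a HashMap and otherwise makes a defensive copy, so every setter reduces to one call. The helper name is a hypothetical illustration, not code from the patch:
{code}
import java.util.HashMap;
import java.util.Map;

public final class JaxbMaps {
  private JaxbMaps() {
  }

  // Return the argument unchanged if it is already a HashMap (the concrete type
  // JAXB can serialize), otherwise copy it into one; null stays null.
  @SuppressWarnings("unchecked")
  static <K, V> HashMap<K, V> asHashMap(Map<K, V> map) {
    if (map == null || map instanceof HashMap) {
      return (HashMap<K, V>) map;
    }
    return new HashMap<K, V>(map);
  }
}
{code}
A setter in the entity classes would then shrink to something like this.relatedEntities = JaxbMaps.asHashMap(relatedEntities);.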
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339012#comment-14339012 ] Vinod Kumar Vavilapalli commented on YARN-3025: --- Coming in very late, apologies. Some comments: - Echoing Bikas's first comment: Today the AMs are expected to maintain their own scheduling state. With this you are changing that - part of the scheduling state will be remembered but the remaining isn't. We should clearly draw a line somewhere; what is it? - [~zjshen] did a very good job of dividing the persistence concerns, but what is the guarantee that is given to the app writers? "I'll return the list of blacklisted nodes whenever I can, but shoot, I died, so I can't help you much" is not going to cut it. If we want reliable notifications, we should build a protocol between AM and RM about the persistence of the blacklisted node list - too much complexity if you ask me. Why not leave it to the apps? - The blacklist information is per application-attempt, and the scheduler will forget previous application-attempts today. So as I understand it, the patch doesn't work. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339060#comment-14339060 ] Zhijie Shen commented on YARN-3087: --- Thanks for the patch, Li! Some detailed comments about the patch: 1. HierarchicalTimelineEntity is abstract, maybe not necessary. {code} // required by JAXB HierarchicalTimelineEntity() { super(); } {code} 2. Can we mark JAXB methods \@Private? 3. I think rootUnwrapping should be true to be consistent with YarnJacksonJaxbJsonProvider. It seems JAXBContextResolver is never used (I think the reason is that we are using YarnJacksonJaxbJsonProvider), so maybe we want to remove the class. {code} this.context = new JSONJAXBContext(JSONConfiguration.natural().rootUnwrapping(false) .build(), cTypes) {code} 4. Does this mean that if we want to add a filter, we need to hard-code it here? So hadoop.http.filter.initializers no longer works? Is it possible to provide some similar mechanism to replace what hadoop.http.filter.initializers does, if it doesn't work? {code} // TODO: replace this by an authentification filter in future. HashMap<String, String> options = new HashMap<String, String>(); String username = conf.get(HADOOP_HTTP_STATIC_USER, DEFAULT_HADOOP_HTTP_STATIC_USER); options.put(HADOOP_HTTP_STATIC_USER, username); HttpServer2.defineFilter(timelineRestServer.getWebAppContext(), "static_user_filter_timeline", StaticUserWebFilter.StaticUserFilter.class.getName(), options, new String[] {"/*"}); {code} [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
[ https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-3268: -- Assignee: Chang Li timelineserver rest api returns html page for 404 when a bad endpoint is used. -- Key: YARN-3268 URL: https://issues.apache.org/jira/browse/YARN-3268 Project: Hadoop YARN Issue Type: Bug Reporter: Prakash Ramachandran Assignee: Chang Li the timelineserver returns a 404 page instead of giving a REST response. this interferes with the end user pages which try to retrieve data using REST api. this could be due to lack of a 404 handler ex. http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-3267: -- Assignee: Chang Li Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3231: -- Attachment: YARN-3231.v3.patch FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, YARN-3231.v3.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
[ https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3268: --- Assignee: (was: Chang Li) timelineserver rest api returns html page for 404 when a bad endpoint is used. -- Key: YARN-3268 URL: https://issues.apache.org/jira/browse/YARN-3268 Project: Hadoop YARN Issue Type: Bug Reporter: Prakash Ramachandran the timelineserver returns a 404 page instead of giving a REST response. this interferes with the end user pages which try to retrieve data using REST api. this could be due to lack of a 404 handler ex. http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339084#comment-14339084 ] Hadoop QA commented on YARN-3269: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701154/YARN-3269.1.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6762//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6762//console This message is automatically generated. Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path --- Key: YARN-3269 URL: https://issues.apache.org/jira/browse/YARN-3269 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3269.1.patch Log aggregation currently is always relative to the default file system, not an arbitrary file system identified by URI. So we can't put an arbitrary fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338950#comment-14338950 ] Vinod Kumar Vavilapalli commented on YARN-3248: --- YARN-3025 is related to my first comment above. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screenshot.jpg, apache-yarn-3248.0.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2777: --- Attachment: YARN-2777.002.patch Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Attachments: YARN-2777.001.patch, YARN-2777.002.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338975#comment-14338975 ] Ted Yu commented on YARN-2777: -- lgtm Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Attachments: YARN-2777.001.patch, YARN-2777.002.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2.patch Attaching an analogue of the most recent patch against trunk. I do not believe that we will be committing this at this point as [~leftnoteasy] is working on a more significant change which will remove the need for it, but I wanted to make it available just in case. For clarity, patch against trunk is YARN-3251.2.patch and the patch to commit against 2.6 is YARN-3251.2-6-0.4.patch. CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338967#comment-14338967 ] Varun Saxena commented on YARN-2777: [~tedyu], made the change. Kindly review Mark the end of individual log in aggregated log Key: YARN-2777 URL: https://issues.apache.org/jira/browse/YARN-2777 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Varun Saxena Labels: log-aggregation Attachments: YARN-2777.001.patch, YARN-2777.002.patch Below is snippet of aggregated log showing hbase master log: {code} LogType: hbase-hbase-master-ip-172-31-34-167.log LogUploadTime: 29-Oct-2014 22:31:55 LogLength: 24103045 Log Contents: Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 ... at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) at org.apache.hadoop.hbase.Chore.run(Chore.java:80) at java.lang.Thread.run(Thread.java:745) LogType: hbase-hbase-master-ip-172-31-34-167.out {code} Since logs from various daemons are aggregated in one log file, it would be desirable to mark the end of one log before starting with the next. e.g. with such a line: {code} End of LogType: hbase-hbase-master-ip-172-31-34-167.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338867#comment-14338867 ] Li Lu commented on YARN-3087: - Hi [~djp], thanks for the feedback! I totally understand your concern here. The current solution is a workaround for JAXB resolver, which cannot return an interface (Map) type. This work around is consistent with the v1 version of our ATS object model (in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/, such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's needed, maybe we'd like to keep the declarations to be Map, and do the cast in the jaxb getter? [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338914#comment-14338914 ] Hadoop QA commented on YARN-2693: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701122/0006-YARN-2693.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6758//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6758//console This message is automatically generated. Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label TO have simplified interface, Priority Manager will support only configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3269: Attachment: YARN-3269.1.patch Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path --- Key: YARN-3269 URL: https://issues.apache.org/jira/browse/YARN-3269 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3269.1.patch Log aggregation currently is always relative to the default file system, not an arbitrary file system identified by URI. So we can't put an arbitrary fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Agarwal updated YARN-3270: Attachment: YARN-3270.patch Attached the patch. node label expression not getting set in ApplicationSubmissionContext - Key: YARN-3270 URL: https://issues.apache.org/jira/browse/YARN-3270 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rohit Agarwal Priority: Minor Attachments: YARN-3270.patch One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339140#comment-14339140 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701150/YARN-3251.2.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6760//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6760//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6760//console This message is automatically generated. CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
Rohit Agarwal created YARN-3270: --- Summary: node label expression not getting set in ApplicationSubmissionContext Key: YARN-3270 URL: https://issues.apache.org/jira/browse/YARN-3270 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rohit Agarwal Priority: Minor One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
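A minimal sketch of the class of bug described here: a newInstance overload that accepts a value but never copies it onto the record it builds. The record class below is a hypothetical illustration, not the actual ApplicationSubmissionContext code:
{code}
// Hypothetical record used only to illustrate the bug pattern and its fix.
public class SubmissionContextDemo {
  private String queue;
  private String appLabelExpression;

  // Buggy factory: the label expression parameter is accepted but silently dropped.
  public static SubmissionContextDemo newInstanceBuggy(String queue, String appLabelExpression) {
    SubmissionContextDemo ctx = new SubmissionContextDemo();
    ctx.queue = queue;
    // Missing: ctx.appLabelExpression = appLabelExpression;
    return ctx;
  }

  // Fixed factory: every parameter is propagated onto the new instance.
  public static SubmissionContextDemo newInstance(String queue, String appLabelExpression) {
    SubmissionContextDemo ctx = new SubmissionContextDemo();
    ctx.queue = queue;
    ctx.appLabelExpression = appLabelExpression;
    return ctx;
  }

  public String getAppLabelExpression() {
    return appLabelExpression;
  }
}
{code}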
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.003.patch Addressed feedback Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339170#comment-14339170 ] Vinod Kumar Vavilapalli commented on YARN-3269: --- Can you modify one of the tests to use a fully qualified path, in order to 'prove' that this patch works? Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path --- Key: YARN-3269 URL: https://issues.apache.org/jira/browse/YARN-3269 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3269.1.patch Log aggregation currently is always relative to the default file system, not an arbitrary file system identified by URI. So we can't put an arbitrary fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339196#comment-14339196 ] Hadoop QA commented on YARN-3270: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701163/YARN-3270.patch against trunk revision 2214dab. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6764//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6764//console This message is automatically generated. node label expression not getting set in ApplicationSubmissionContext - Key: YARN-3270 URL: https://issues.apache.org/jira/browse/YARN-3270 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rohit Agarwal Priority: Minor Attachments: YARN-3270.patch One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339201#comment-14339201 ] Hadoop QA commented on YARN-3231: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701160/YARN-3231.v3.patch against trunk revision f0c980a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6763//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6763//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6763//console This message is automatically generated. FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, YARN-3231.v3.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339214#comment-14339214 ] Siqi Li commented on YARN-3231: --- Hi [~ka...@cloudera.com], thanks for your feedback. I have uploaded a new patch which addresses all your comments except 6.1 and 6.3. For 6.1, it seems that there are other test cases that might also qualify for moving to TestAppRunnability; it would be better to do a larger refactoring of TestFairScheduler into TestAppRunnability. For 6.3, I don't think there is a problem when maxRunnableApps for a user or queue is decreased. FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, YARN-3231.v3.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit, we want to increase this property on the fly to make some of the pending jobs active. However, once we increase the limit, the pending jobs are not assigned any resources and are stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3087: Attachment: YARN-3087-022615.patch Updated my patch according to [~zjshen]'s comments. Addressed points 1-3. Point 4 is caused by a limitation of HttpServer2 for now. We may want to decide if we want to fix that on our side, or add support to this use case on the HttpServer2 side. For now, I think we can temporarily use our current way to make the prototype work. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, YARN-3087-022615.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339233#comment-14339233 ] Li Lu commented on YARN-3087: - Hi [~djp], thanks for the comments! I agree that we may want to use generic types to solve the problem. Similar code also appear in v1 timeline object model, so maybe we'd like to fix both together? If that's the case we may open a separate JIRA to trace this. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, YARN-3087-022615.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339237#comment-14339237 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701167/YARN-3122.003.patch against trunk revision 2214dab. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6765//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6765//console This message is automatically generated. Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
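For readers following the metric itself: per-container CPU usage is typically derived from two successive samples of the cumulative CPU time of the container's process tree. A simplified sketch of that calculation (names are illustrative assumptions, not the patch's actual classes):

{code}
// Hypothetical sketch: CPU usage (%) from two cumulative CPU-time samples.
long cpuMillisNow = procTree.getCumulativeCpuTime();       // e.g. from procfs or cgroups
long deltaCpu = cpuMillisNow - lastCpuMillis;               // CPU time spent since last sample
long deltaWall = System.currentTimeMillis() - lastSampleTimeMillis;
float cpuUsagePercentOfOneCore = (deltaWall > 0)
    ? (100f * deltaCpu / deltaWall)                         // percentage of a single core
    : 0f;
lastCpuMillis = cpuMillisNow;
lastSampleTimeMillis = System.currentTimeMillis();
{code}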
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339248#comment-14339248 ] Hadoop QA commented on YARN-3087: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701178/YARN-3087-022615.patch against trunk revision c6d5b37. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6766//console This message is automatically generated. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Fix For: YARN-2928 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, YARN-3087-022615.patch This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339246#comment-14339246 ] Zhijie Shen commented on YARN-3125: --- Thanks for the patch, Junping! It looks good to me. Per offline discussion, we should add an integration test in TestDistributedShell. [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Junping Du Attachments: YARN-3125.patch, YARN-3125v2.patch, YARN-3125v3.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-3266: Attachment: YARN-3266.01.patch RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Rohith Attachments: YARN-3266.01.patch Under the default NM port configuration, which is 0, we have observed in the current version that the lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
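The direction the report argues for can be summarized in one line of the expire-handling path; this is a simplified illustration (types and call sites abbreviated), not the attached patch:

{code}
// Keyed by NodeId (host + port), two NM instances on the same host stay distinct.
ConcurrentMap<NodeId, RMNode> inactiveNodes = new ConcurrentHashMap<>();

// On node expiry:
inactiveNodes.put(rmNode.getNodeID(), rmNode);   // was: put(rmNode.nodeId.getHost(), rmNode)
ClusterMetrics.getMetrics().incrNumLostNMs();    // lost-node count and map size now agree
{code}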
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338153#comment-14338153 ] Varun Saxena commented on YARN-3197: I guess you mean no need for printing both. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
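For context, the noisy log comes from the null-container early return in {{completedContainer}}; one way to make it both quieter and more informative is sketched below (illustrative only, not necessarily the committed change):

{code}
if (rmContainer == null) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Null container completed for containerId="
        + containerStatus.getContainerId() + ", nothing to do");
  }
  return;
}
{code}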
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338161#comment-14338161 ] Chengbing Liu commented on YARN-3266: - Uploaded a patch; taking over. RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch Under the default NM port configuration, which is 0, we have observed in the current version that the lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338170#comment-14338170 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700999/YARN-2820.005.patch against trunk revision 71385f9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6753//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6753//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6753//console This message is automatically generated. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, YARN-2820.005.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at
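The improvement being proposed amounts to wrapping the store/update file operations in a bounded retry loop so a transient HDFS hiccup does not escalate into a STATE_STORE_OP_FAILED fatal event. A minimal sketch, with the retry count and backoff chosen arbitrarily for illustration (the real patch may expose these differently):

{code}
// Illustrative retry wrapper; values and method names are placeholders.
private void writeFileWithRetries(Path outputPath, byte[] data) throws Exception {
  final int maxRetries = 5;
  for (int attempt = 1; ; attempt++) {
    try {
      writeFile(outputPath, data);   // the existing, occasionally failing write
      return;
    } catch (IOException e) {
      if (attempt >= maxRetries) {
        throw e;                     // out of retries: surface the original failure
      }
      LOG.info("Retrying state-store write after IOException (attempt "
          + attempt + " of " + maxRetries + ")", e);
      Thread.sleep(1000L * attempt); // simple linear backoff
    }
  }
}
{code}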
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338154#comment-14338154 ] Varun Saxena commented on YARN-3197: Will change it back then. AppId was added to aid in quicker debugging Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu reassigned YARN-3266: --- Assignee: Chengbing Liu (was: Rohith) RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch Under the default NM port configuration, which is 0, we have observed in the current version, lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMapNodeId, RMNode}}. If this will break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMToken#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338174#comment-14338174 ] Devaraj K commented on YARN-3256: - +1, lgtm, will commit it shortly. TestClientToAMToken#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf, causing it to run twice against the same default Scheduler. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of creating a new one inside the test and shadowing the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
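The shape of the fix is easy to miss in the patch: the test must use the configuration that the parameterized base class prepared, rather than shadowing it with a fresh one. A hedged sketch (the accessor name is an assumption, not verified against the base class):

{code}
@Test
public void testClientTokenRace() throws Exception {
  // Before: a fresh conf ignores the parameterized scheduler entirely.
  //   YarnConfiguration conf = new YarnConfiguration();
  // After: reuse the base class conf so the test runs once per configured scheduler.
  Configuration conf = getConf();   // assumed accessor on ParameterizedSchedulerTestBase
  conf.setBoolean(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, true);
  // ... rest of the test body unchanged ...
}
{code}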
[jira] [Updated] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3256: Summary: TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase (was: TestClientToAMToken#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338195#comment-14338195 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-trunk-Commit #7207 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7207/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/CHANGES.txt TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338232#comment-14338232 ] Hadoop QA commented on YARN-3266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701009/YARN-3266.01.patch against trunk revision 166eecf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6754//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6754//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6754//console This message is automatically generated. RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch, YARN-3266.02.patch Under the default NM port configuration, which is 0, we have observed in the current version, lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMapNodeId, RMNode}}. If this will break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338243#comment-14338243 ] Ryu Kobayashi commented on YARN-3249: - [~vinodkv] I see. Okay, I'll try to fix the code. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-3266: Attachment: YARN-3266.02.patch Added a test in {{TestRMNodeTransitions}} to prevent regression. RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch, YARN-3266.02.patch Under the default NM port configuration, which is 0, we have observed in the current version that the lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338256#comment-14338256 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #116 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/116/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/CHANGES.txt WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
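The failing cases above all drop the query string and fragment when the proxy rebuilds the final tracking URL. A hedged sketch of URI handling that preserves them (variable names are illustrative, not the committed code):

{code}
URI tracking = new URI(trackingUrl);
URI redirect = new URI(
    tracking.getScheme(),
    tracking.getAuthority(),
    tracking.getPath(),
    tracking.getQuery(),      // previously dropped: ?viewPath=...
    tracking.getFragment());  // previously dropped: #/main/views/...
response.sendRedirect(redirect.toString());
{code}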
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338264#comment-14338264 ] Hudson commented on YARN-3256: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #116 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/116/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3266: Assignee: Rohith RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Rohith Under the default NM port configuration, which is 0, we have observed in the current version, lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMapNodeId, RMNode}}. If this will break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338106#comment-14338106 ] Rohith commented on YARN-3266: -- bq. the key string should include the NM's port as well This makes sense to me instead of changing the API. Taking over now; feel free to assign it to yourself if you have already started working on this. RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Under the default NM port configuration, which is 0, we have observed in the current version that the lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Attachment: YARN-2820.005.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, YARN-2820.005.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} As discussed at YARN-1778, TestFSRMStateStore failure is also due to IOException in storeApplicationStateInternal. Stack trace from TestFSRMStateStore failure: {code} 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971) at
[jira] [Created] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
Chengbing Liu created YARN-3266: --- Summary: RMContext inactiveNodes should have NodeId as map key Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Under the default NM port configuration, which is 0, we have observed in the current version that the lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338282#comment-14338282 ] Hudson commented on YARN-3239: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #850 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/850/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338290#comment-14338290 ] Hudson commented on YARN-3256: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #850 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/850/]) YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against (devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase - Key: YARN-3256 URL: https://issues.apache.org/jira/browse/YARN-3256 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3256.001.patch The test testClientTokenRace was not using the base class conf causing it to run twice on the same Scheduler configured in the default. All tests deriving from ParameterizedSchedulerTestBase should use the conf created in the base class instead of newing up inside the test and hiding the member. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338318#comment-14338318 ] Hadoop QA commented on YARN-3266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701022/YARN-3266.02.patch against trunk revision 0d4296f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6755//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6755//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6755//console This message is automatically generated. RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch, YARN-3266.02.patch Under the default NM port configuration, which is 0, we have observed in the current version, lost nodes count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice: * NM started at port 10001 * NM restarted at port 10002 * NM restarted at port 10003 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMapNodeId, RMNode}}. If this will break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)