[jira] [Commented] (YARN-3548) Missing doc for security configuration for timeline service feature
[ https://issues.apache.org/jira/browse/YARN-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539313#comment-14539313 ] Bibin A Chundatt commented on YARN-3548: [~ste...@apache.org] In YARN-3539 I couldn't find the above 2 configurations in TimelineServer.html. Could you please add these to the same document? As I remember, I did face some issues during startup of the server in secure mode without them. Please do correct me if I am wrong. Missing doc for security configuration for timeline service feature --- Key: YARN-3548 URL: https://issues.apache.org/jira/browse/YARN-3548 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Bibin A Chundatt Assignee: Gururaj Shetty Priority: Minor Documentation for the properties below needs to be added to the Timeline Server documentation: yarn.timeline-service.http-authentication.kerberos.principal yarn.timeline-service.http-authentication.kerberos.keytab url: /hadoop-yarn/hadoop-yarn-site/TimelineServer.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3489: - Attachment: YARN-3489-branch-2.7.patch [~varun_saxena], I noticed there are some test failures with the patch I rebased. Could you take a look at the patch I attached and run the tests of yarn-server-resourcemanager? It seems like there are some test environment setup issues. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
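To make the proposed change concrete, here is a minimal sketch of fetching the queue info once and passing it down into per-request validation. The method shape and the scheduler call are simplified illustrations, not the exact Hadoop signatures or the attached patch.

{code}
// Illustrative sketch only -- names loosely follow RMServerUtils/SchedulerUtils,
// but the signatures are simplified and are not the actual Hadoop APIs.
public static void validateResourceRequests(List<ResourceRequest> asks,
    Resource maximumResource, String queueName, YarnScheduler scheduler)
    throws InvalidResourceRequestException {
  QueueInfo queueInfo = null;
  try {
    // Fetch the queue info once, outside the loop...
    queueInfo = scheduler.getQueueInfo(queueName, false, false);
  } catch (IOException e) {
    // leave queueInfo as null; validation can fall back to defaults
  }
  for (ResourceRequest ask : asks) {
    // ...and pass it down instead of rebuilding it for every request.
    SchedulerUtils.validateResourceRequest(ask, maximumResource, queueInfo);
  }
}
{code}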
[jira] [Updated] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
[ https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Weiwei updated YARN-3526: -- Attachment: YARN-3526.002.patch Fixed trailing whitespace. ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster - Key: YARN-3526 URL: https://issues.apache.org/jira/browse/YARN-3526 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Red Hat Enterprise Linux Server 6.4 Reporter: Yang Weiwei Assignee: Yang Weiwei Labels: BB2015-05-TBR Attachments: YARN-3526.001.patch, YARN-3526.002.patch On a QJM HA cluster, when viewing the RM web UI to track job status, it shows "This is standby RM. Redirecting to the current active RM: http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce". It refreshes every 3 seconds but never reaches the correct tracking page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539332#comment-14539332 ] Hadoop QA commented on YARN-3489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732137/YARN-3489-branch-2.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | branch-2 / 12584ac | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7877/console | This message was automatically generated. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3629) NodeID is always printed as null in node manager initialization log.
nijel created YARN-3629: --- Summary: NodeID is always printed as null in node manager initialization log. Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539341#comment-14539341 ] Yang Weiwei commented on YARN-2605: --- Hello [~xgong], I noticed that you set an Ignore tag for testRMWebAppRedirect in your patch. Can you please let me know why this test case is ignored, and do you have any idea how to fix it? I opened another JIRA, YARN-3601; please let me know if that one is valid. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Fix For: 2.7.1 Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers, but the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or returning a well-defined error message (json or xml) stating the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
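For illustration, a minimal sketch of what the suggested behaviour could look like on the standby RM: REST (/ws/) requests get a real HTTP 303 redirect, while browser pages keep the meta-refresh. The method name and the way the active-RM address is obtained are hypothetical, not the actual RM web filter code; it assumes the standard javax.servlet.http classes.

{code}
// Hypothetical handler on the standby RM (sketch only, not the actual filter code).
void redirectToActiveRM(HttpServletRequest req, HttpServletResponse resp,
    String activeRMWebAddress) throws IOException {
  String target = activeRMWebAddress + req.getRequestURI();
  if (req.getRequestURI().startsWith("/ws/")) {
    // Programmatic REST clients: answer with a real redirect (303 See Other).
    resp.setHeader("Location", target);
    resp.setStatus(HttpServletResponse.SC_SEE_OTHER);
  } else {
    // Browser pages: keep the existing meta-refresh style behaviour.
    resp.setHeader("Refresh", "3; url=" + target);
    resp.getWriter().println(
        "This is standby RM. Redirecting to the current active RM: " + target);
  }
}
{code}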
[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3362: Attachment: YARN-3362.20150512-1.patch 2015.05.12_3362_Queue_Hierarchy.png Hi [~wangda], I have updated the patch and the image, appending each label with its available resource. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't show node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what is happening on nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539467#comment-14539467 ] Yang Weiwei commented on YARN-1042: --- There has been no update for a long time; anything new here? This looks like a nice feature that could help us a lot. Is it the right direction for us to put more effort into getting this done on the RM side? I noticed there are alternatives in the Slider project, e.g. SLIDER-82. Please advise. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch Container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up in the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small Giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539495#comment-14539495 ] nijel commented on YARN-3629: - Moving the log message is a bit tricky since it logs some parameters which are not available in serviceStart. So I am keeping this log as it is and adding a new log message to print the nodeId for information purposes. Any different thoughts? NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
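A minimal sketch of what the extra log line described in this comment might look like; the exact wording and placement are illustrative, not the attached YARN-3629-1.patch.

{code}
// Sketch only: keep the existing serviceInit() message and add one INFO line in
// NodeStatusUpdaterImpl.serviceStart(), where the NodeId has already been assigned
// and is no longer null.
@Override
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
  LOG.info("Node ID assigned is : " + this.nodeId);
  super.serviceStart();
}
{code}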
[jira] [Updated] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3629: Attachment: YARN-3629-1.patch Please review. NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411.poc.6.txt Uploading a patch with the review suggestions. No major changes, some coding updates. Also, rebased to pull in latest commits (Phoenix related changes). But now I am having some trouble getting the timelineservice module to build since it includes both the phoenix and hbase dependencies in the pom. I get Some Enforcer rules have failed errors. I am working on resolving those. Any more eyes on this build error would help! {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (depcheck) on project hadoop-yarn-server-timelineservice: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. - [Help 1] [ERROR] {code} {code} [INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ hadoop-yarn-server-timelineservice --- [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-yarn-server-timelineservice --- [WARNING] Dependency convergence error for org.apache.hbase:hbase-common:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-client:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 [WARNING] Dependency convergence error for org.apache.hbase:hbase-protocol:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-client:1.0.1 +-org.apache.hbase:hbase-protocol:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-protocol:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 and 
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 [WARNING] Dependency convergence error for org.apache.hbase:hbase-hadoop-compat:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop2-compat:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539425#comment-14539425 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732170/YARN-3411.poc.6.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 987abc9 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7879/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539421#comment-14539421 ] zhihai xu commented on YARN-3619: - Thanks [~kasha] for assigning this JIRA to me. The root cause is exactly what [~jlowe] said; I just added a few more details based on [~jlowe]'s succinct comment. {{sampleMetrics}} is called periodically in MetricsSystemImpl, and it iterates {{sources}} in the following code: {code} for (Entry<String, MetricsSourceAdapter> entry : sources.entrySet()) { if (sourceFilter == null || sourceFilter.accepts(entry.getKey())) { snapshotMetrics(entry.getValue(), bufferBuilder); } } {code} {{snapshotMetrics}} is called to process every entry from {{sources}}. The calling sequence which leads to a ConcurrentModificationException is snapshotMetrics -> MetricsSourceAdapter#getMetrics -> ContainerMetrics#getMetrics -> MetricsSystemImpl#unregisterSource -> sources.remove(name), so an entry in {{sources}} is removed while {{sources}} is being iterated. Therefore unregisterSource can't be called from getMetrics. I will prepare a patch for review. ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
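To make the failure mode concrete, here is a small self-contained illustration of the pattern in plain Java (not the YARN patch): mutating a map while its entrySet() is being iterated throws, and the usual remedy is to defer the removal until the iteration finishes.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the hazard described above, not the actual fix for
// ContainerMetrics/MetricsSystemImpl.
public class ConcurrentRemovalDemo {
  public static void main(String[] args) {
    Map<String, String> sources = new HashMap<>();
    sources.put("source1", "metrics");
    sources.put("source2", "metrics");

    // Unsafe (commented out): removing from the map inside the iteration, which is
    // effectively what unregisterSource() does while sampleMetrics() iterates sources.
    // for (Map.Entry<String, String> e : sources.entrySet()) {
    //   sources.remove(e.getKey());   // throws ConcurrentModificationException
    // }

    // Safer pattern: remember what is finished during the pass and remove afterwards,
    // i.e. defer the "unregister" until the iteration is over.
    List<String> finished = new ArrayList<>();
    for (Map.Entry<String, String> e : sources.entrySet()) {
      finished.add(e.getKey());
    }
    sources.keySet().removeAll(finished);
    System.out.println("Remaining sources: " + sources);
  }
}
{code}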
[jira] [Created] (YARN-3630) YARN should suggest a heartbeat interval for applications
Zoltán Zvara created YARN-3630: -- Summary: YARN should suggest a heartbeat interval for applications Key: YARN-3630 URL: https://issues.apache.org/jira/browse/YARN-3630 Project: Hadoop YARN Issue Type: Improvement Components: client, resourcemanager, scheduler Reporter: Zoltán Zvara It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539483#comment-14539483 ] Hadoop QA commented on YARN-3362: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 51m 59s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 6s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732150/YARN-3362.20150512-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 987abc9 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7878/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7878/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7878/console | This message was automatically generated. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't have node label usage in RM CapacityScheduler web UI now, without this, user will be hard to understand what happened to nodes have labels assign to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539481#comment-14539481 ] Weiwei Yang commented on YARN-1042: --- Sorry, it is not an alternative; SLIDER-82 depends on this JIRA. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch Container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up in the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small Giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Hadoop Flags: Incompatible change Marking this issue as an incompatible change since this fix includes an API change. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-TBR Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Labels: BB2015-05-RFC (was: BB2015-05-TBR) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3513: Hadoop Flags: Reviewed Thanks [~Naganarasimha] for the updated patch. +1, the latest patch looks good to me; I will commit it shortly. Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: BB2015-05-TBR, newbie Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150511-1.patch Some local variables in MonitoringThread.run(), {{vmemStillInUsage and pmemStillInUsage}}, are only updated and never used. Instead we need to add a debug log for the overall resource usage by all containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
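For context, a rough sketch of what an aggregate debug log along these lines could look like; the ContainerUsage type, the field names, and the helper itself are hypothetical stand-ins, not the actual ContainersMonitorImpl change.

{code}
// Hypothetical helper (sketch only): sum usage across the tracked containers inside
// the monitoring loop and emit a single DEBUG line for the whole node.
static void logTotalContainerUsage(org.apache.commons.logging.Log log,
    java.util.Collection<ContainerUsage> containers) {
  long totalVmemBytes = 0;
  long totalPmemBytes = 0;
  for (ContainerUsage usage : containers) {   // ContainerUsage is a stand-in type
    totalVmemBytes += usage.vmemBytes;
    totalPmemBytes += usage.pmemBytes;
  }
  if (log.isDebugEnabled()) {
    log.debug("Total resource usage by all containers: vmem=" + totalVmemBytes
        + " bytes, pmem=" + totalPmemBytes + " bytes");
  }
}
{code}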
[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
[ https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539468#comment-14539468 ] Hadoop QA commented on YARN-3526: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 52s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 52m 11s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732139/YARN-3526.002.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 3d28611 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7876/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7876/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7876/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7876/console | This message was automatically generated. ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster - Key: YARN-3526 URL: https://issues.apache.org/jira/browse/YARN-3526 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Red Hat Enterprise Linux Server 6.4 Reporter: Weiwei Yang Assignee: Weiwei Yang Labels: BB2015-05-TBR Attachments: YARN-3526.001.patch, YARN-3526.002.patch On a QJM HA cluster, view RM web UI to track job status, it shows This is standby RM. Redirecting to the current active RM: http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce it refreshes every 3 sec but never going to the correct tracking page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336.005.patch v5 patch * rebased for the latest trunk * updated the document Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-TBR Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Zvara updated YARN-3630: --- Description: It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to applications. (was: It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to application.) YARN should suggest a heartbeat interval for applications - Key: YARN-3630 URL: https://issues.apache.org/jira/browse/YARN-3630 Project: Hadoop YARN Issue Type: Improvement Components: client, resourcemanager, scheduler Reporter: Zoltán Zvara It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540229#comment-14540229 ] Hadoop QA commented on YARN-160: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 43s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 27s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 30s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 14m 37s | Tests passed in hadoop-gridmix. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 2s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 65m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732266/YARN-160.005.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / f4e2b3c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/whitespace.txt | | hadoop-gridmix test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-gridmix.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7890/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7890/console | This message was automatically generated. 
nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
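As a concrete illustration of reading these values from the OS, here is a minimal, self-contained sketch that parses MemTotal from /proc/meminfo on Linux and applies a reserved offset. The class name and the 2 GB offset are purely hypothetical, and this is not the plugin interface introduced by the attached patches.

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical sketch only: derive the node's physical memory from /proc/meminfo
// and subtract an offset reserved for the OS and other daemons.
public class ProcMemInfoReader {
  /** Returns the MemTotal value from /proc/meminfo in kB, or -1 if it cannot be read. */
  public static long readMemTotalKb() {
    try (BufferedReader reader = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          // Line format: "MemTotal:       16330856 kB"
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]);
        }
      }
    } catch (IOException | NumberFormatException e) {
      // fall through and report failure below
    }
    return -1;
  }

  public static void main(String[] args) {
    long reservedForOsKb = 2L * 1024 * 1024;   // hypothetical 2 GB reserved offset
    long memTotalKb = readMemTotalKb();
    long yarnMemKb = memTotalKb > 0 ? memTotalKb - reservedForOsKb : -1;
    System.out.println("MemTotal=" + memTotalKb + " kB, available to YARN=" + yarnMemKb + " kB");
  }
}
{code}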
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540324#comment-14540324 ] Karthik Kambatla commented on YARN-3613: +1. Checking this in. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Attachments: YARN-3613-1.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540383#comment-14540383 ] Vrushali C commented on YARN-3411: -- Sounds good, thanks [~djp]! But I would like to request that you wait before reviewing again; I am planning to upload a new patch later today. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540278#comment-14540278 ] Zhijie Shen commented on YARN-3529: --- I think the reason why you still need version in timelineservice/pom.xml is because the non-test scope dependency is not added in hadoop-project/pom.xml {code} <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-core</artifactId> <version>${phoenix.version}</version> <exclusions> <!-- Exclude jline from here --> <exclusion> <artifactId>jline</artifactId> <groupId>jline</groupId> </exclusion> </exclusions> {code} Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540297#comment-14540297 ] Vrushali C commented on YARN-3411: -- Ah, I haven't added that patch, thanks [~gtCarrera9] let me give that a try. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540148#comment-14540148 ] Hadoop QA commented on YARN-3625: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 28s | The applied patch generated 1 new checkstyle issues (total was 6, now 5). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 8s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 38m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732257/YARN-3625.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7889/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7889/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7889/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7889/console | This message was automatically generated. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-160: --- Attachment: YARN-160.006.patch Uploaded 006.patch to fix whitespace issues. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, YARN-160.006.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3625: -- Target Version/s: 2.8.0 RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-3613. Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Thanks for working on this, nijel. Just committed this to trunk and branch-2. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3579: -- Attachment: 0002-YARN-3579.patch Thank you [~leftnoteasy] for the comments. I have updated the patch accordingly. I used a generic method to remove the code duplication; kindly check it. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns the label name as a String. It does not pass information such as exclusivity back to the REST interface APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
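Purely as an illustration of the "generic method" idea mentioned in the comment (not the attached patch), here is a sketch of a helper that builds the labels-to-nodes map with the key type chosen by the caller, so a String-keyed variant and a NodeLabel-keyed variant can share one implementation; all names here are hypothetical.

{code}
// Hypothetical sketch; assumes java.util imports and
// org.apache.hadoop.yarn.api.records.NodeId. The key type K can be the plain label
// name (String) or a richer label object, supplied via the keyMapper function.
static <K> Map<K, Set<NodeId>> getLabelsToNodesAs(
    Map<String, Set<NodeId>> labelNameToNodes,
    java.util.function.Function<String, K> keyMapper) {
  Map<K, Set<NodeId>> result = new HashMap<>();
  for (Map.Entry<String, Set<NodeId>> entry : labelNameToNodes.entrySet()) {
    result.put(keyMapper.apply(entry.getKey()), new HashSet<>(entry.getValue()));
  }
  return result;
}
{code}

A String-keyed view would then pass an identity function, while a NodeLabel-keyed view would pass a function that wraps each name into a NodeLabel object.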
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540211#comment-14540211 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sorry to hear about the trouble. Have you tried applying the latest patch in YARN-3529? In that JIRA I'm trying to make the Phoenix writer work with the snapshot version of Phoenix, which lives happily with HBase 1.0.1. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: YARN-3069.008.patch Update for property added in YARN-1912. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
Sangjin Lee created YARN-3634: - Summary: TestMRTimelineEventHandling is broken due to timing issues Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126)
[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3613: --- Attachment: yarn-3613-2.patch At commit time, modified the comment that precedes call to {{testContainerTokenWithEpoch}}. Attaching the updated diff here. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3529: Attachment: YARN-3529-YARN-2928.003.patch Thanks [~zjshen]! I forgot to move this part. Nice catch! Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540225#comment-14540225 ] Hudson commented on YARN-3629: -- FAILURE: Integrated in Hadoop-trunk-Commit #7807 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7807/]) YARN-3629. NodeID is always printed as null in node manager (devaraj: rev 5c2f05cd9bad9bf9beb0f4ca18f4ae1bc3e84499) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
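For illustration only, a minimal standalone sketch of the init-vs-start behavior described above (the class and values are invented; this is not the actual NodeStatusUpdaterImpl patch):
{code}
// Toy example: logging the node id from init() prints null because the id is
// only assigned when the service starts, mirroring the JIRA description.
public class InitVsStartLoggingExample {
  private String nodeId;                 // set only when the service starts

  void serviceInit() {
    // Logging here reproduces "Initialized nodemanager for null : ..."
    System.out.println("Initialized nodemanager for " + nodeId);
  }

  void serviceStart() {
    nodeId = "host-1:45454";             // in YARN this comes from context.getNodeId()
    // Moving the message here prints the real node id.
    System.out.println("Initialized nodemanager for " + nodeId);
  }

  public static void main(String[] args) {
    InitVsStartLoggingExample s = new InitVsStartLoggingExample();
    s.serviceInit();    // Initialized nodemanager for null
    s.serviceStart();   // Initialized nodemanager for host-1:45454
  }
}
{code}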
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540263#comment-14540263 ] Jonathan Eagles commented on YARN-3625: --- [~zjshen], this is a small bug that was found. The end result is that related entities are sometimes missing. Since a domain is now required, it is safe to treat a missing domain as an entity that is permitted to be related to. The checkstyle issue is pre-existing for this method. Jon RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. However, this causes an error when relating to an entity in the same put. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
Rohit Agarwal created YARN-3633: --- Summary: With Fair Scheduler, cluster can logjam when there are too many queues Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
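For illustration only, a tiny standalone sketch of the arithmetic in this scenario (numbers taken from the description above; this is not Fair Scheduler code):
{code}
// Each queue's fair share is 20 GB / 4 = 5 GB, and maxAMShare of 0.5 leaves
// only 2.5 GB of AM headroom per queue, so a 3 GB AM can never start anywhere.
public class FairSchedulerAmLogjamExample {
  public static void main(String[] args) {
    double clusterMemGb = 20.0;
    int queues = 4;
    double maxAMShare = 0.5;
    double amSizeGb = 3.0;

    double fairShareGb = clusterMemGb / queues;        // 5 GB per queue
    double amHeadroomGb = maxAMShare * fairShareGb;    // 2.5 GB per queue for AMs

    // Nothing is scheduled even though the whole 20 GB cluster is idle.
    System.out.println("AM fits? " + (amSizeGb <= amHeadroomGb));  // false
  }
}
{code}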
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540303#comment-14540303 ] Vrushali C commented on YARN-3411: -- bq. Also comments on latest (v5) patch Thanks [~djp] for the review, but the latest patch is v6, so some of those code lines are no longer there. I will still make the other changes: the class rename, making the table name consistent with the Phoenix writer table names, the code changes for creating the connection, etc. I am working on updating the patch with your review suggestions and with the patch in YARN-3529, which will help me build. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540373#comment-14540373 ] Junping Du commented on YARN-3411: -- Sure. I started this round of review several days ago but could not finish it until today. Sorry for missing the new version of the patch; I will check it soon. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540113#comment-14540113 ] Hadoop QA commented on YARN-3539: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 59s | Site still builds. | | {color:green}+1{color} | checkstyle | 3m 31s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 1m 38s | The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 28s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | | | 76m 58s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732244/YARN-3539-010.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / 6d5da94 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7886/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7886/console | This message was automatically generated. 
Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, TimelineServer.html, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch, YARN-3539-007.patch, YARN-3539-008.patch, YARN-3539-009.patch, YARN-3539-010.patch, timeline_get_api_examples.txt The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540184#comment-14540184 ] Hadoop QA commented on YARN-1297: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 14s | The applied patch generated 2 new checkstyle issues (total was 180, now 179). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 20s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732250/YARN-1297.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6d5da94 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7887/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7887/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7887/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7887/console | This message was automatically generated. Miscellaneous Fair Scheduler speedups - Key: YARN-1297 URL: https://issues.apache.org/jira/browse/YARN-1297 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Sandy Ryza Assignee: Arun Suresh Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.3.patch, YARN-1297.4.patch, YARN-1297.4.patch, YARN-1297.patch, YARN-1297.patch I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were * Most of the time in comparisons was being spent in Math.signum. 
I switched this to direct ifs and elses and it halved the percent of time spent in comparisons. * I removed some unnecessary instantiations of Resource objects * I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
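As an aside, a minimal standalone illustration of the signum-to-branches change described above (shape only; this is not the actual YARN-1297 patch):
{code}
// Both methods return -1, 0, or 1; the second avoids the Math.signum call
// that dominated comparison time in the profile described above.
public class SignumVsIfCompare {
  // Original style: derive the comparison result through Math.signum.
  static int compareWithSignum(double a, double b) {
    return (int) Math.signum(a - b);
  }

  // Replacement style: direct ifs and elses, no signum call.
  static int compareWithIfs(double a, double b) {
    if (a < b) {
      return -1;
    } else if (a > b) {
      return 1;
    } else {
      return 0;
    }
  }

  public static void main(String[] args) {
    System.out.println(compareWithSignum(1.5, 2.0) + " " + compareWithIfs(1.5, 2.0)); // -1 -1
    System.out.println(compareWithSignum(2.0, 2.0) + " " + compareWithIfs(2.0, 2.0)); //  0  0
  }
}
{code}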
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540220#comment-14540220 ] Junping Du commented on YARN-3411: -- Also comments on latest (v5) patch:
{code}
+public class CreateSchema {
{code}
Can we rename it to a more concrete name, something like TimelineSchemaCreator?
{code}
+  private static int createTimelineEntityTable() {
+    try {
+      Configuration config = HBaseConfiguration.create();
+      // add the hbase configuration details from classpath
+      config.addResource("hbase-site.xml");
+      Connection conn = ConnectionFactory.createConnection(config);
+      Admin admin = conn.getAdmin();
...
{code}
All of this code should be reusable for creating the other tables. Maybe we should move it out of createTimelineEntityTable() and make it a static part of the class?
{code}
+    if (admin.tableExists(table)) {
+      // do not disable / delete existing table
+      // similar to the approach taken by map-reduce jobs when
+      // output directory exists
+      LOG.error("Table " + table.getNameAsString() + " already exists.");
+      return 1;
+    }
{code}
Should we throw an exception here so the user is notified of the failure reason immediately?
{code}
+    // TTL is 30 days, need to make it configurable perhaps
+    cf3.setTimeToLive(2592000);
{code}
We shouldn't have a hard-coded value here. At least add a TODO in a comment to fix it later. In HBaseTimelineWriterImpl.java,
{code}
+    // TODO right now using a default table name
+    // change later to use a config driven table name
+    entityTableName = TableName
+        .valueOf(EntityTableDetails.DEFAULT_ENTITY_TABLE_NAME);
{code}
Shall we be consistent with the table name of the Phoenix writer if we haven't made it configurable? Or do we intend to differ for some reason?
{code}
+    if (entityPuts.size() > 0) {
+      LOG.info("Storing " + entityPuts.size() + " to "
+          + this.entityTableName.getNameAsString());
+      entityTable.put(entityPuts);
+    } else {
+      LOG.warn("empty entity object?");
+    }
{code}
The first log should be at DEBUG level and wrapped in an if block using LOG.isDebugEnabled(), which helps performance. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
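As an aside, a minimal sketch of the guarded DEBUG logging suggested in the last review point (illustrative only; the class and method names here are not from the actual patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedDebugLogExample {
  private static final Log LOG = LogFactory.getLog(GuardedDebugLogExample.class);

  static void storePuts(int putCount, String tableName) {
    // Demote the per-write message to DEBUG and guard it so the string
    // concatenation is skipped entirely when DEBUG logging is disabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Storing " + putCount + " puts to " + tableName);
    }
  }
}
{code}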
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540371#comment-14540371 ] Wangda Tan commented on YARN-3635: -- +[~vinodkv], [~kasha], [~jlowe], [~jianhe] Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which allows the scheduler to change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
Wangda Tan created YARN-3635: Summary: Get-queue-mapping should be a common interface of YarnScheduler Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which allows the scheduler to change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
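A purely hypothetical sketch of what such a scheduler-agnostic queue-mapping hook could look like (all names here are invented for illustration and do not come from any attached patch):
{code}
public interface QueuePlacementProvider {
  /**
   * Resolve the queue an application would actually land in after the
   * scheduler's queue-mapping rules are applied, so that a caller such as
   * RMAppManager can validate resource requests against the mapped queue
   * (maximum allocation, default node label expression, ...) rather than
   * the queue named at submission time.
   */
  String getMappedQueueForApp(String user, String requestedQueue);
}
{code}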
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.0.patch Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Bug Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when a container is allocated to or recovered from the application. Some ordering policies may also need to reorder when demand changes, if demand is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
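A self-contained illustration of why such a reorder hook is needed when the comparison key (demand) changes, using plain Java collections (this is not YARN's OrderingPolicy code):
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class ReorderOnDemandChangeExample {
  static class App {
    final String id;
    int demand;
    App(String id, int demand) { this.id = id; this.demand = demand; }
  }

  public static void main(String[] args) {
    Comparator<App> byDemandDesc = new Comparator<App>() {
      @Override
      public int compare(App a, App b) {
        if (a.demand != b.demand) {
          return b.demand - a.demand;   // higher demand first (demo values only)
        }
        return a.id.compareTo(b.id);
      }
    };
    TreeSet<App> order = new TreeSet<App>(byDemandDesc);

    App a1 = new App("app1", 10);
    App a2 = new App("app2", 5);
    order.add(a1);
    order.add(a2);

    // The demand of app2 changes, but a sorted set never resorts existing
    // entries on its own; the policy has to be told to remove and re-add it,
    // which is the "reorder when demand changes" described above.
    order.remove(a2);
    a2.demand = 20;
    order.add(a2);

    System.out.println(order.first().id);   // app2
  }
}
{code}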
[jira] [Commented] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540111#comment-14540111 ] Karthik Kambatla commented on YARN-3627: I see this as a duplicate of or closely related to YARN-3405. [~bibinchundatt] - are you able to try out the patch there and see if it solves the issue here? I'll try and get to YARN-3405 later this week. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below fair scheduler configuration:
Root (10GB cluster resource)
-- Q1 (maxResources 4GB)
   Q1.1 (maxResources 4GB)
   Q1.2 (maxResources 4GB)
-- Q2 (maxResources 6GB)
No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4GB gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption never gets triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = 0.8 and the cluster utilization is below 0.8. *FairScheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure maxResources as 0mb and 0 cores in a running cluster so that all queues can always take the full cluster resources if available? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
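For illustration, a standalone reproduction of the quoted check with the numbers from this scenario (not YARN code; the method below only mirrors the logic quoted above):
{code}
public class PreemptionThresholdExample {
  static boolean shouldAttemptPreemption(boolean preemptionEnabled,
      float threshold, long allocatedMb, long clusterMb,
      int allocatedVcores, int clusterVcores) {
    if (preemptionEnabled) {
      return threshold < Math.max(
          (float) allocatedMb / clusterMb,
          (float) allocatedVcores / clusterVcores);
    }
    return false;
  }

  public static void main(String[] args) {
    // Q1.1 holds 4GB of a 10GB cluster; utilization 0.4 stays below the 0.8
    // default threshold, so preemption is never attempted even though Q1.2
    // is starving. The vcore numbers here are made up for the example.
    System.out.println(
        shouldAttemptPreemption(true, 0.8f, 4096, 10240, 4, 10));  // false
  }
}
{code}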
[jira] [Updated] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3629: Target Version/s: 2.8.0 Hadoop Flags: Reviewed Thanks [~nijel] for your contribution. +1, patch looks good to me. NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540210#comment-14540210 ] Hadoop QA commented on YARN-3579: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 55s | The applied patch generated 1 new checkstyle issues (total was 33, now 33). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 28s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 38m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | | | Comparison of String objects using == or != in org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.createNodeLabelFromLabelNames(Set) At CommonNodeLabelsManager.java:== or != in org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.createNodeLabelFromLabelNames(Set) At CommonNodeLabelsManager.java:[line 1014] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732271/0002-YARN-3579.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7892/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7892/console | This message was automatically generated. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns label name as string. 
It does not pass information such as exclusivity back to the REST interface APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
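A purely hypothetical signature sketch of the direction described above; NodeLabelInfo below is only a stand-in for org.apache.hadoop.yarn.api.records.NodeLabel, and none of this comes from the attached patches:
{code}
import java.util.Map;
import java.util.Set;

public interface LabelsToNodesView {
  /** Existing style: keyed by the label name only, so exclusivity is lost. */
  Map<String, Set<String>> getLabelsToNodesByName();

  /** Proposed style: keyed by an object that also carries exclusivity. */
  Map<NodeLabelInfo, Set<String>> getLabelsToNodes();

  /** Minimal stand-in for a NodeLabel-like record. */
  class NodeLabelInfo {
    public final String name;
    public final boolean exclusive;
    public NodeLabelInfo(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }
}
{code}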
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540288#comment-14540288 ] Robert Kanter commented on YARN-2942: - Thanks [~jlowe] for your feedback. It's good to get more views on this. {quote} If I understand them correctly they both propose that the NMs upload the original per-node aggregated log to HDFS and then something (either the NMs or the RM) later comes along and creates the aggregate-of-aggregates log{quote} Yes. That's correct. {quote}However I didn't see details on solving the race condition where a log reader comes along, sees from the index file that the desired log isn't in the aggregate-of-aggregates, then opens the log and reads from it just as the log is deleted by the entity appending to the aggregate-of-aggregates.{quote} That's a good point. I hadn't thought of that issue. Thinking about it now, I think there are a few options here:
- We could simply have the reader try again if it runs into a problem
- We could have the last NM delete the aggregated log files, so that it's less likely that this situation can occur
- Each NM could wait some amount of time (e.g. a few mins) after appending its log file before deleting the original file, so that it's less likely that this situation can occur
{quote}We have an internal solution where we create per-application har files of the logs{quote} Can you give some more details on this? Is it something you can share? If you've already solved this issue, then perhaps we can just use that. Though doesn't creating har files require running an MR job? {quote}Another issue from log aggregation we've seen in practice is that the proposals don't address the significant write load the per-node aggregate files place on the namenode.{quote} That's a good point. Shortly after a job finishes, all of the involved NMs would upload their log files around the same time, which puts stress on the NN. Having the NM report its current aggregation progress to the RM was recently added by YARN-1376 and related JIRAs. Having the RM coordinate the aggregation is similar to my design with ZK, but instead of a ZK lock, the RM orchestrates things. I like the idea of getting rid of the original aggregation and having the NMs all write to HDFS once, in the combined file directly. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today. I'll try to flesh this design out a bit more and see where it goes. Unless we should use har files; though that adds an MR dependency. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. 
This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540337#comment-14540337 ] Hudson commented on YARN-3613: -- FAILURE: Integrated in Hadoop-trunk-Commit #7808 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7808/]) YARN-3613. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods. (nijel via kasha) (kasha: rev fe0df596271340788095cb43a1944e19ac4c2cf7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540377#comment-14540377 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 41s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | mapreduce tests | 109m 52s | Tests failed in hadoop-mapreduce-client-jobclient. | | | | 126m 40s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.mapred.TestMRIntermediateDataEncryption | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732269/YARN-2556.5.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7891/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7891/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7891/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7891/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540411#comment-14540411 ] Hadoop QA commented on YARN-3069: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 38m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732298/YARN-3069.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fe0df59 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7894/artifact/patchprocess/whitespace.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7894/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7894/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7894/console | This message was automatically generated. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. 
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540491#comment-14540491 ] Rohit Agarwal commented on YARN-3633: - So, in essence the problem is that when there are too many queues, the fair share of each queue gets low and thus the maxAMShare, which is calculated from the fairShare of each queue, gets too low to run any container. I propose the following solution: Instead of setting {code} maxAMShare = 0.5*fairShare {code} we set it to {code} maxAMShare = max(0.5*fairShare, SomeMinimumSizeEnoughToRunOneContainer) {code} And then add a cluster-wide maxAMShare to be {{0.5*totalClusterCapacity}} All these ratios/values can be configurable. So, in the scenario described in the JIRA, we would still run AMs in some queues but we won't overrun the cluster with AMs because it will hit the cluster-wide limit. If this proposal sounds reasonable, I can start working on this. However, I am not sure how this would interact with preemption. With Fair Scheduler, cluster can logjam when there are too many queues -- Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
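For illustration, a rough standalone sketch of the arithmetic behind the proposal above (values are illustrative, and the minimum-size stand-in is a placeholder for "SomeMinimumSizeEnoughToRunOneContainer"; this is not scheduler code):
{code}
public class MaxAmShareProposalSketch {
  public static void main(String[] args) {
    double clusterMemGb = 20.0;
    double fairShareGb = 5.0;        // per-queue fair share from the example
    double minForOneAmGb = 3.0;      // placeholder: enough to run one AM container

    double currentLimit = 0.5 * fairShareGb;                           // 2.5 GB: no 3 GB AM fits
    double proposedLimit = Math.max(0.5 * fairShareGb, minForOneAmGb); // 3 GB: one AM can start
    double clusterWideCap = 0.5 * clusterMemGb;                        // 10 GB cap across all queues

    System.out.println("current per-queue AM limit:  " + currentLimit);
    System.out.println("proposed per-queue AM limit: " + proposedLimit);
    System.out.println("cluster-wide AM cap:         " + clusterWideCap);
  }
}
{code}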
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540497#comment-14540497 ] Hadoop QA commented on YARN-3529: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 25s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 11s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 56s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 47s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 44s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 3s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 39m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732304/YARN-3529-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7895/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7895/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7895/console | This message was automatically generated. Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.002.patch Patch v.2 posted. - fixed the findbugs issue - fixed the TestApplication tests (existing failure) TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540407#comment-14540407 ] Jason Lowe commented on YARN-2942: -- bq. Can you give some more details on this? Is it something you can share? It's a hack to help mitigate the log aggregation namespace scaling issues on our large clusters. Essentially it's a periodic process that runs an Oozie workflow that does the following: # determines which applications are good candidates for log archiving (i.e.: lots of files and total size is not that big) # runs a streaming job with a shell script that uses the list of applications to aggregate as input # for each application it runs a local-mode archive job to archive the log contents # when the archive has been created it swaps out the application directory with a symlink into the har archive The symlink makes the archive transparent to the readers. Both the JHS and the yarn logs command use FileContext and just work with the symlink into the har without modifications. So yes, we are running a MapReduce job to archive the logs, which itself will create more logs. However, it processes many application logs for each archiving job. If there is sufficient interest we can pursue how to share it, but the script is specific to how we configure our nodes and clusters and relies on unsupported symlinks. I'm hoping the outcome of this JIRA allows us to move away from the need for it. bq. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today. That feature is indeed there today. Links to the app logs on the NM will try to serve the local app logs first, then redirect to the log server if the local logs are unavailable. See NMController and ContainerLogsPage. It only becomes an issue when things link to the aggregated log server directly before the NM has finished aggregating them. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
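As a rough illustration of the symlink swap described above (this is not the actual script; the paths and har name are hypothetical), a FileContext-based sketch might look like:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

public class LogArchiveSwapSketch {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());

    // Hypothetical locations: the per-app aggregated log dir and the entry in
    // the har that now contains its contents.
    Path appLogDir = new Path("/tmp/logs/user/logs/application_1431412130291_0001");
    Path harEntry  = new Path("har:///tmp/logs/user/logs-archive/logs.har/application_1431412130291_0001");
    Path backup    = new Path(appLogDir + ".orig");

    // Move the original directory aside, then drop a symlink in its place so
    // readers (JHS, 'yarn logs') transparently follow it into the archive.
    fc.rename(appLogDir, backup, Options.Rename.NONE);
    fc.createSymlink(harEntry, appLogDir, false);
  }
}
{code}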
[jira] [Updated] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-3624: Target Version/s: 2.7.1 (was: 2.6.1) ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch AppliactionHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540511#comment-14540511 ] Naganarasimha G R commented on YARN-3044: - bq. Sorry to put my comments at last minute. No problem, better late than never :) bq. 1. I incline to only having ContainerEntity, but RM and NM may put different info/event about it based on their knowledge. +1 for different events; this would be sufficient to capture the difference in the times displayed when published from the RM and the NM (earlier we had a separate container entity for this reason). [~djp], please let me know if anything else can differ when published from the RM and NM that needs to be captured separately. bq. 2. Should v1 and v2 publisher only differentiate at publishEvent, however, it seems that we duplicate code more than that. And perhaps defining and implementing SystemMetricsEvent.toTimelineEvent can further cleanup the code. Maybe I did not get this clearly, but AFAIK the packages and classes for the timeline events/entities are different and the way we publish entities is also different, so though the code looks duplicated I think nothing further can be reduced/cleaned up here. bq. 3. I saw v2 is going to send config, but where the config is coming from. Did we conclude who and how to send the config? IAC, sending config seems to be half done. Well, I had raised config-related queries earlier; as they did not get concluded, I was planning to get this done as part of a new jira. AFAIK the intention in ATS is to collect the app-side configs rather than the server-side ones, and the RM will not be aware of app configs, so my initial idea was to support an additional interface in the client to publish application-specific configs. Correct me if I am wrong, and also let me know whether it's OK to handle configs in another jira. {quote} And we can use entity.addConfigs(event.getConfig());. No need to iterate over config collection and put each config one-by-one. 4. yarn.system-metrics-publisher.rm.publish.container-metrics - yarn.rm.system-metrics-publisher.emit-container-events? 5. Methods/innner classes in SystemMetricsPublisher don't need to be changed to public. Default is enough to access them? {quote} Will get these corrected. bq. Moreover, I also think we should not have yarn.system-metrics-publisher.enabled too, and reuse the existing config. And it's not limited to RM metrics publisher, but all existing ATS service. IMHO, the better practice is to reuse the existing config. And we can have a global config (or env var) timeline-service.version to determine the service is enabled with v1 or v2 implementation. Anyway, it's a separate problem, I'll file a separate jira for it. {{yarn.system-metrics-publisher.rm.publish.container-metrics}} was added specifically to ensure that container lifecycle metrics are not always emitted from the RM, and are published only when required.
Initially in YARN-3034 [discussions|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14359174page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14359174] we wanted to proceed with a single config like {{yarn.system-metrics-publisher.enabled}} (as the existing one, {{yarn.resourcemanager.system-metrics-publisher.enabled}}, was specific to the RM) and to have {{yarn.timeline-service.version}}, but you had [commented|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14376575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14376575] to have the single config {{yarn.system-metrics-publisher.enabled}}, and hence I had modified it accordingly. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
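As a side note on the {{addConfigs}} usage mentioned in the comment above, here is a minimal sketch assuming the ATS v2 {{TimelineEntity}} class on the YARN-2928 branch; the id, type, and config values are made up for illustration:
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class AddConfigsSketch {
  public static void main(String[] args) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_APPLICATION");              // illustrative type
    entity.setId("application_1431412130291_0001");  // illustrative id

    // Hypothetical app-side configs gathered elsewhere (e.g. from the event).
    Map<String, String> configs = new HashMap<>();
    configs.put("mapreduce.map.memory.mb", "2048");
    configs.put("mapreduce.reduce.memory.mb", "4096");

    // Bulk-add instead of iterating and putting each config one by one.
    entity.addConfigs(configs);
  }
}
{code}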
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Description: TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:226) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:49) at
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.7.patch Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3635: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1317 Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2421: --- Attachment: YARN-2421.6.patch CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540586#comment-14540586 ] Vinod Kumar Vavilapalli commented on YARN-3635: --- +10 for this. In addition to the interfaces themselves, I'd like us to consolidate the concrete mapping-rules that we have in each scheduler. Ideally, we only need one set of rules acceptable by all schedulers. If not that, I'd live with ~80% common rules. BTW, thematically this fits into YARN-1317, making it a sub-task. Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
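To make the proposal concrete, here is a purely hypothetical sketch of what a scheduler-agnostic mapping hook might look like; none of these names exist in the codebase today, they only illustrate the shape of the interface being discussed:
{code}
// Hypothetical interface; not part of YarnScheduler today.
public interface QueueMappingResolver {
  /**
   * Returns the queue the application should actually be placed in after
   * applying the scheduler's user/group mapping rules, or the submitted
   * queue unchanged if no rule matches.
   */
  String resolveQueue(String user, String submittedQueue);
}

// RMAppManager could then validate against the mapped queue before building
// resource requests, e.g. (illustrative only):
//   String queue = scheduler.getQueueMappingResolver()
//                           .resolveQueue(user, submissionContext.getQueue());
//   validateAndCreateResourceRequest(..., queue, ...);
{code}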
[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated YARN-2369: -- Attachment: YARN-2369-3.patch [~vinodkv] here's v3 of the patch. I've got a new unit test in this one and I'm using MRJobConfig now for the property (now with a new and improved name). I think I've trimmed down the lines, but if something looks misplaced please let me know. Thanks! Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
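For readers unfamiliar with the issue, a small self-contained sketch (not the actual YARN code path) contrasting the append and replace behaviours for an environment map:
{code}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class EnvVarHandlingSketch {
  // Path-like variables: appending with the path separator is desirable.
  static void appendToEnv(Map<String, String> env, String name, String value) {
    String existing = env.get(name);
    env.put(name, existing == null ? value : existing + File.pathSeparator + value);
  }

  // Scalar variables: the container-specified value should simply win.
  static void replaceInEnv(Map<String, String> env, String name, String value) {
    env.put(name, value);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("CLASSPATH", "/existing/jars/*");
    env.put("JAVA_OPTS", "-Xmx1g");

    appendToEnv(env, "CLASSPATH", "/app/jars/*"); // fine: path-like semantics
    replaceInEnv(env, "JAVA_OPTS", "-Xmx2g");     // appending here would be harmful

    System.out.println(env);
  }
}
{code}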
[jira] [Commented] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
[ https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540588#comment-14540588 ] Vinod Kumar Vavilapalli commented on YARN-1317: --- Also added YARN-3635 as a sub-task - the goal is to make mapping-rules a first class citizen. Make Queue, QueueACLs and QueueMetrics first class citizens in YARN --- Key: YARN-1317 URL: https://issues.apache.org/jira/browse/YARN-1317 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Today, we are duplicating the exact same code in all the schedulers. Queue is a top class concept - clientService, web-services etc already recognize queue as a top level concept. We need to move Queue, QueueMetrics and QueueACLs to be top level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540416#comment-14540416 ] Hadoop QA commented on YARN-160: \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 27s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 31s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 14m 47s | Tests passed in hadoop-gridmix. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 3s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 65m 21s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732288/YARN-160.006.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 5c2f05c | | hadoop-gridmix test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-gridmix.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7893/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7893/console | This message was automatically generated. 
nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, YARN-160.006.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS-dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
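As a purely illustrative sketch of the kind of interface described (the names below are hypothetical and not the classes introduced by the patch; a Linux implementation would read /proc/meminfo and /proc/cpuinfo):
{code}
// Illustrative only; the real patch defines its own plugin classes.
public interface NodeResourceProbe {
  /** Total physical memory in MB as reported by the OS. */
  long getPhysicalMemoryMB();

  /** Number of processors as reported by the OS. */
  int getNumProcessors();
}

/** Wraps a probe and subtracts a reserved offset for the OS and other daemons. */
class OffsetNodeResource {
  private final NodeResourceProbe probe;
  private final long memoryOffsetMB;
  private final int cpuOffset;

  OffsetNodeResource(NodeResourceProbe probe, long memoryOffsetMB, int cpuOffset) {
    this.probe = probe;
    this.memoryOffsetMB = memoryOffsetMB;
    this.cpuOffset = cpuOffset;
  }

  long availableMemoryMB() {
    return Math.max(0, probe.getPhysicalMemoryMB() - memoryOffsetMB);
  }

  int availableVcores() {
    return Math.max(0, probe.getNumProcessors() - cpuOffset);
  }
}
{code}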
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.6.patch Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540581#comment-14540581 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 4m 43s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732342/YARN-2556.6.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f24452d | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7897/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Summary: TestMRTimelineEventHandling and TestApplication are broken (was: TestMRTimelineEventHandling is broken due to timing issues) TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540526#comment-14540526 ] Hadoop QA commented on YARN-3634: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 0s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 39s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 5m 53s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 43m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | | | Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(NMTokenIdentifier, ContainerTokenIdentifier, StartContainerRequest) At ContainerManagerImpl.java:org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(NMTokenIdentifier, ContainerTokenIdentifier, StartContainerRequest) At ContainerManagerImpl.java:[line 881] | | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.application.TestApplication | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732309/YARN-3634-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7896/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7896/console | This message was automatically generated. 
TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540730#comment-14540730 ] Zhijie Shen commented on YARN-3625: --- Hi, Jonathan! Would you mind giving me an example to help me understand why the entity exists but the entity's domain is missing? BTW, there's special logic in LeveldbTimelineStore, where domains were implemented after the first version of the store was done. So we need to be compatible with the existing db data, which doesn't have domains. For RollingLevelDBTimelineStore, this shouldn't be a problem, right? We don't need the special treatment, nor the test case for it. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3505: Attachment: YARN-3505.5.patch new patch addressed all the latest comments Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, YARN-3505.5.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540756#comment-14540756 ] Wangda Tan commented on YARN-3362: -- [~Naganarasimha], thanks for updating; the latest result looks great! About the queue-hierarchy discussion, I think one alternative could be to keep the hierarchical queue names but align the usage bars, like the following:
{code}
root        [-- 100% used]
  - a       [- 60% used]
    - a1    [- 40% used]
    - a2    [---]
  - b       [-]
    - b1    [-]
      - b11 [--]
{code}
This could also help with comparing queues' resources without needing an extra button to hide/show the queue hierarchy. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540794#comment-14540794 ] Wangda Tan commented on YARN-1197: -- [~mding], thanks for your interest in this ticket; some comments: bq. For JVM based containers (e.g., container running HBase), it is not possible right now to change the heap size of JVM without restarting the Java process. Even if we can implement a wrapper in the container to relaunch a Java process when resource is changed for a container, we still need to implement an interface between node manager and container to trigger the relaunch action. Good point; this is one thing we noted as well. I don't think there's any easy solution for shrinking a JVM. Relaunching the container could be one method, but it would be hard to make a generic container wrapper since killing and relaunching loses in-memory data. But since shrinking memory is a proactive action, when a process wants to shrink its resources it can use its own container wrapper to relaunch the process if it has some data-recovery mechanism. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540808#comment-14540808 ] Jonathan Eagles commented on YARN-3625: --- [~zjshen], one difference between Leveldb and RollingLevelDB is in the way that batch writes are done. In LeveldbTimelineStore, each entity is processed and written to the db before the next. In RollingLevelDBTimelineStore, all entities in the put are processed one after the other, but they are all written together in one batch. This creates a temporary inconsistency for RollingLevelDB where related entities in the same put have their start time in the db, but nothing else, until the last entity in the put is processed. To handle this scenario, I relax the domain check: if a domain is non-existent, we treat it as the temporary state in which the domain for the related entity has been staged to be written but has not yet been written to the database. Jon RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
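A rough sketch of the relaxed check being described, with hypothetical method names (the real store code is considerably more involved):
{code}
public class DomainCheckSketch {
  /**
   * Illustrative version of the relaxed check: if the related entity's domain
   * has not been written yet (it may be staged in the same batched put), do
   * not reject the relation; otherwise the domains must match.
   */
  static boolean domainsCompatible(String relatedEntityDomain, String newEntityDomain) {
    if (relatedEntityDomain == null) {
      return true; // pending in the same batch, nothing to compare against yet
    }
    return relatedEntityDomain.equals(newEntityDomain);
  }

  public static void main(String[] args) {
    System.out.println(domainsCompatible(null, "DEFAULT"));      // true: staged, not yet written
    System.out.println(domainsCompatible("DEFAULT", "DEFAULT")); // true: same domain
    System.out.println(domainsCompatible("A", "B"));             // false: conflicting domains
  }
}
{code}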
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540781#comment-14540781 ] Hadoop QA commented on YARN-2369: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 18s | The applied patch generated 7 new checkstyle issues (total was 176, now 182). | | {color:red}-1{color} | checkstyle | 3m 56s | The applied patch generated 15 new checkstyle issues (total was 509, now 524). | | {color:red}-1{color} | checkstyle | 4m 32s | The applied patch generated 2 new checkstyle issues (total was 211, now 213). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 14 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 58s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 8m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 24m 45s | Tests passed in hadoop-common. | | {color:green}+1{color} | mapreduce tests | 0m 46s | Tests passed in hadoop-mapreduce-client-common. | | {color:green}+1{color} | mapreduce tests | 1m 36s | Tests passed in hadoop-mapreduce-client-core. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. 
| | | | 83m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732347/YARN-2369-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7898/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7898/console | This message was automatically generated. Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics.
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540783#comment-14540783 ] Zhijie Shen commented on YARN-3044: --- bq. Well i had raised config related queries earlier as it dint get concluded was planning to get it done as part of a new jira, AFAIK intention in ATS is to collect the App side configs than server side ones. And RM will not be aware of App configs, so my initial idea was to support additional interface in the client to publish Application specific configs. Correct me if i am wrong and also inform whether its ok to handle configs in another jira. So can we undo the code change related to config for this jira? bq. May be i did not get this clearly, but AFAIK the packages and classes for the Timeline events entities are different and the way we publish entities is also different, so though the code looks duplicated i think nutting further to be reduced/cleaned up here. In general, I'm suggesting the code style in MAPREDUCE-6335, which seems clearer. However, I'm okay with keeping the current code if it's complex to refactor. bq. but you had commented to have single config yarn.system-metrics-publisher.enabled and hence i had remodified. Right, but it turns out that the RM and NM don't seem to be able to unify the configs (such as the new config added here) between them, and more than one feature requires some kind of version flag to differentiate the behavior. Never mind, I'll take care of it separately. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540800#comment-14540800 ] Wangda Tan commented on YARN-3557: -- [~dian.fu], thanks for sharing your idea about this. It is definitely an interesting idea, but I don't have an immediate opinion on whether we should do it or not. We can continue discussing it along with the design of YARN-3409. Support Intel Trusted Execution Technology(TXT) in YARN scheduler - Key: YARN-3557 URL: https://issues.apache.org/jira/browse/YARN-3557 Project: Hadoop YARN Issue Type: New Feature Reporter: Dian Fu Attachments: Support TXT in YARN high level design doc.pdf Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT-aware YARN scheduler can schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 provides the capability to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT-aware YARN scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540693#comment-14540693 ] Hadoop QA commented on YARN-3634: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 27s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 37s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 2s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 44m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732348/YARN-3634-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7899/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7899/console | This message was automatically generated. TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540715#comment-14540715 ] zhihai xu commented on YARN-3591: - Hi [~lavkesh], thanks for working on this issue. It looks like a good catch. The parent directory is generated by {{uniqueNumberGenerator}} for each LocalizedResource, so most likely fileList.length will be one. Some comments about your patch: {{getParentFile}} may return null; should we check whether it is null to avoid an NPE? Can we add comments in the code about the change? Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch It happens when a resource is localised on a disk and, after localising, that disk goes bad. The NM keeps paths for localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) is called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true, but at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. The proposal is to call file.list() on the parent path of the resource, which calls open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
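For reference, a minimal sketch of the proposed check, assuming a hypothetical helper in the NM localization code (field and method names below are illustrative, not the actual patch):
{code}
// Illustrative sketch only: validate a localized resource by listing its
// parent directory. list() forces a native open() of the directory, so it
// fails on a bad disk, unlike File.exists(), which can succeed from cached
// inode information.
private boolean isResourcePresent(LocalizedResource rsrc) {
  File file = new File(rsrc.getLocalPath().toUri().getRawPath());
  File parent = file.getParentFile();
  if (parent == null) {
    // Guard against a null parent to avoid the NPE mentioned in the review.
    return false;
  }
  String[] children = parent.list();
  // The parent directory is generated per resource by uniqueNumberGenerator,
  // so a healthy disk should report at least one entry (the resource itself).
  return children != null && children.length >= 1;
}
{code}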
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.003.patch Patch v3. posted - fixed whitespace TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch, YARN-3634-YARN-2928.003.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540633#comment-14540633 ] Zhijie Shen commented on YARN-3529: --- +1 for the last patch. Will commit it. Noticed there are performance-related properties used for the test case. We should evaluate whether they could help POC performance too. We can deal with that later.
{code}
props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
props.put(IndexWriterUtils.HTABLE_THREAD_KEY, Integer.toString(100));
// Make a small batch size to test multiple calls to reserve sequences
props.put(QueryServices.SEQUENCE_CACHE_SIZE_ATTRIB, Long.toString(BATCH_SIZE));
{code}
Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch gets merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3636) Abstraction for LocalDirAllocator
[ https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe moved HADOOP-11905 to YARN-3636: - Component/s: (was: fs) Fix Version/s: (was: 2.7.1) Assignee: (was: Kannan Rajah) Affects Version/s: (was: 2.5.2) 2.5.2 Issue Type: New Feature (was: Bug) Key: YARN-3636 (was: HADOOP-11905) Project: Hadoop YARN (was: Hadoop Common) Abstraction for LocalDirAllocator - Key: YARN-3636 URL: https://issues.apache.org/jira/browse/YARN-3636 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.5.2 Reporter: Kannan Rajah Labels: BB2015-05-TBR Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch There are 2 abstractions used to write data to local disk. LocalDirAllocator: Allocate paths from a set of configured local directories. LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*. In the current implementation, local disk is managed by the guest OS and not HDFS. The proposal is to provide a new abstraction that encapsulates the above 2 abstractions and hides who manages the local disks. This enables us to provide an alternate implementation where a DFS can manage the local disks and they can be accessed using HDFS APIs. This means the DFS maintains a namespace for node-local directories and can create paths that are guaranteed to be present on a specific node. Here is an example use case for Shuffle: When a mapper writes intermediate data using this new implementation, it will continue to write to local disk. When a reducer needs to access data from a remote node, it can use HDFS APIs with a path that points to that node’s local namespace instead of having to use an HTTP server to transfer the data across nodes. New Abstractions 1. LocalDiskPathAllocator Interface to get file/directory paths from the local disk namespace. This contains all the APIs that are currently supported by LocalDirAllocator. So we just need to change LocalDirAllocator to implement this new interface. 2. LocalDiskUtil Helper class to get a handle to LocalDiskPathAllocator and the FileSystem that is used to manage those paths. By default, it will return LocalDirAllocator and LocalFileSystem. A supporting DFS can return DFSLocalDirAllocator and an instance of DFS. 3. DFSLocalDirAllocator This is a generic implementation. An allocator is created for a specific node. It uses the Configuration object to get the user-configured base directory and appends the node hostname to it. Hence the returned paths are within the node-local namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
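To make the proposal concrete, here is a hedged sketch of how the new abstractions could fit together. Only LocalDirAllocator's existing methods are real; the interface and helper follow the names in the description above, and their bodies are illustrative assumptions:
{code}
// Hypothetical interface carrying the path-allocation APIs that
// LocalDirAllocator already exposes today.
public interface LocalDiskPathAllocator {
  Path getLocalPathForWrite(String pathStr, long size, Configuration conf)
      throws IOException;
  Path getLocalPathToRead(String pathStr, Configuration conf)
      throws IOException;
}

// Hypothetical helper that hides who manages the local disks.
public final class LocalDiskUtil {
  public static LocalDiskPathAllocator getPathAllocator(
      String contextCfgItemName) {
    // Default: disks managed by the guest OS. Assumes LocalDirAllocator is
    // changed to implement LocalDiskPathAllocator, as the description proposes.
    // A supporting DFS could instead return a DFSLocalDirAllocator bound to
    // this node's hostname, so returned paths live in the node-local namespace.
    return new LocalDirAllocator(contextCfgItemName);
  }
}
{code}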
[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540829#comment-14540829 ] Hadoop QA commented on YARN-2421: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 2 new checkstyle issues (total was 30, now 31). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 52m 11s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732346/YARN-2421.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7900/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7900/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7900/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7900/console | This message was automatically generated. CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3637) Handle localization sym-linking correctly at the YARN level
Chris Trezzo created YARN-3637: -- Summary: Handle localization sym-linking correctly at the YARN level Key: YARN-3637 URL: https://issues.apache.org/jira/browse/YARN-3637 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo The shared cache needs to handle resource sym-linking at the YARN layer. Currently, we let the application layer (i.e. mapreduce) handle this, but it is probably better for all applications if it is handled transparently. Here is the scenario: Imagine two separate jars (with unique checksums) that have the same name job.jar. They are stored in the shared cache as two separate resources: checksum1/job.jar checksum2/job.jar A new application tries to use both of these resources, but internally refers to them by different names: foo.jar maps to checksum1 bar.jar maps to checksum2 When the shared cache returns the paths to the resources, both resources have the same name (i.e. job.jar). Because of this, when the resources are localized, one of them clobbers the other. This is because both symlinks in the container_id directory have the same name (i.e. job.jar) even though they point to two separate resource directories. Originally we tackled this in the MapReduce client by using the fragment portion of the resource URL. This, however, seems like something that should be solved at the YARN layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
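For illustration, a hedged sketch of the workaround available to applications today: pick distinct symlink names for the two cached copies when building the container's local resources (the paths, sizes and timestamps below are made up; the proposal is for YARN to handle this transparently instead):
{code}
// Illustrative only: the map key is the symlink name created in the container
// directory, so giving each cached job.jar a distinct key avoids the clobber.
// The MapReduce client currently derives these names from the fragment portion
// of the resource URL.
long size1 = 0, ts1 = 0, size2 = 0, ts2 = 0;  // placeholder sizes/timestamps
Map<String, LocalResource> localResources = new HashMap<>();
localResources.put("foo.jar", LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromURI(
        new URI("hdfs:///sharedcache/checksum1/job.jar")),
    LocalResourceType.FILE, LocalResourceVisibility.PUBLIC, size1, ts1));
localResources.put("bar.jar", LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromURI(
        new URI("hdfs:///sharedcache/checksum2/job.jar")),
    LocalResourceType.FILE, LocalResourceVisibility.PUBLIC, size2, ts2));
{code}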
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540844#comment-14540844 ] Li Lu commented on YARN-3529: - Thanks [~zjshen] for the review and commit! [~vrushalic] if you observe any problems with the new pom settings, please feel free to reopen it. Thanks! Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Fix For: YARN-2928 Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2421: --- Attachment: YARN-2421.7.patch CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, YARN-2421.7.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540328#comment-14540328 ] Sangjin Lee commented on YARN-3634: --- It is not so much caused by YARN-3562 as uncovered by it. First, the NMCollectorService does not update the config upon binding to the port. Second, the NodeTimelineCollectorManager reads the NMCollectorService too early (serviceInit) so that it does not get the updated address. The fix is to update the config when NMCollectorService binds to a port and initialize the address as late as possible in the NodeTimelineCollectorManager. TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
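Regarding the fix described in the comment above, a hedged sketch of the general pattern (illustrative, not the committed patch; the config key string is an assumption):
{code}
// Illustrative sketch: once the collector service's RPC server has bound
// (possibly to an ephemeral port configured as 0), publish the real address
// back into the configuration so NodeTimelineCollectorManager can look it up
// as late as possible instead of reading a stale value during serviceInit.
@Override
protected void serviceStart() throws Exception {
  server.start();  // assume 'server' was created earlier with port 0
  InetSocketAddress bound = NetUtils.getConnectAddress(server);
  // Placeholder key for the NM collector service address.
  getConfig().setSocketAddr("yarn.nodemanager.collector-service.address", bound);
  super.serviceStart();
}
{code}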
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.001.patch Patch v.1 posted. TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540393#comment-14540393 ] Karthik Kambatla commented on YARN-3635: I am in favor of doing this, but would like to be extra careful. Can we make sure either [~sandyr] or I get to review this before it is committed? Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which means the scheduler can change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
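To sketch the idea (hedged; the method name and signature are illustrative, not an agreed design), the common interface could expose something like the following, which RMAppManager would call before {{validateAndCreateResourceRequest}} so that maximum-allocation and default-node-label-expression checks run against the mapped queue:
{code}
// Illustrative only: a possible addition to the scheduler interface.
/**
 * @param requestedQueue the queue name given at submission time
 * @param user           the submitting user
 * @return the queue the application will actually be placed in after any
 *         configured queue-mapping rules are applied, or the requested
 *         queue if no mapping applies
 */
String getQueueAfterMapping(String requestedQueue, String user);
{code}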
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541154#comment-14541154 ] Karthik Kambatla commented on YARN-1197: bq. We thought about launching the JVM based container with -Xmx set to the physical memory of the node, and use cgroup memory control to enforce the resource limit, but we don't think LCE supports memory isolation right now. We cannot use YARN's default memory enforcement as we don't want long running services to be killed. A JVM with a larger value for Xmx will *likely* be less aggressive with GC. Any resultant increase in heap size might or might not be a good thing. If you think this is something viable that people care about, we could consider adding a memory-enforcement option to LCE. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541201#comment-14541201 ] Bibin A Chundatt commented on YARN-3627: [~kasha] seems related to YARN-3405. Will try the patch soon. Would be great if YARN-3405 gets resolved. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below scenario of fair scheduler configuration: Root (10Gb cluster resource) --Q1 (maxResources 4gb) Q1.1 (maxResources 4gb) Q1.2 (maxResources 4gb) --Q2 (maxResources 6GB) No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4Gb gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption will never get triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = .8 and the cluster utilization is below .8. *Fairscheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure, in a running cluster, maxResources of 0mb and 0 cores so that all queues can always take the full cluster resources if available? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3127: --- Priority: Critical (was: Major) Apphistory url crashes when RM switches with ATS enabled Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications have finished successfully, start the timeline server 3.Now fail over HA from active to standby 4.Access timeline server URL IP:PORT/applicationhistory Result: The application history URL fails with the below info {quote} 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) ... Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_01 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 51 more 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) {quote} Behaviour of AHS with a file-based history store: - Apphistory url is working - No attempt entries are shown for each application. Based on initial analysis, when RM switches, application attempts from the state store are not replayed but only applications are. So when the /applicationhistory url is accessed it tries all attempt ids and fails -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3628) ContainerMetrics should support always-flush mode.
[ https://issues.apache.org/jira/browse/YARN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541212#comment-14541212 ] Hadoop QA commented on YARN-3628: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 37s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 1s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 44m 49s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732440/YARN-3628.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7907/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7907/console | This message was automatically generated. ContainerMetrics should support always-flush mode. -- Key: YARN-3628 URL: https://issues.apache.org/jira/browse/YARN-3628 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3628.000.patch ContainerMetrics should support always-flush mode. It will be good to set ContainerMetrics as always-flush mode if yarn.nodemanager.container-metrics.period-ms is configured as 0. Currently both 0 and -1 mean flush on completion. Also the current default value for yarn.nodemanager.container-metrics.period-ms is -1 and the default value for yarn.nodemanager.container-metrics.enable is true. So the empty content is shown for the active container metrics until it is finished. The default value for yarn.nodemanager.container-metrics.period-ms should not be -1. flushOnPeriod is always false if flushPeriodMs is -1, the content will only be shown when the container is finished. 
{code}
if (finished || flushOnPeriod) {
  registry.snapshot(collector.addRecord(registry.info()), all);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
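A hedged sketch of what the always-flush behaviour could look like, assuming a flag derived from the configured period (names are illustrative, not the actual patch):
{code}
// Illustrative only: treat a configured period of 0 as "flush every record",
// while keeping -1 as "flush on completion".
boolean flushAlways = (flushPeriodMs == 0);
if (finished || flushOnPeriod || flushAlways) {
  registry.snapshot(collector.addRecord(registry.info()), all);
}
{code}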
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541225#comment-14541225 ] nijel commented on YARN-3629: - Thanks [~devaraj.k] for reviewing and committing the patch NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
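A hedged sketch of the suggested move (illustrative only; the memory and core fields are placeholders for whatever NodeStatusUpdaterImpl actually holds):
{code}
// Illustrative only: log from serviceStart(), where the NodeId has already
// been assigned, instead of from serviceInit().
@Override
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
  LOG.info("Initialized nodemanager for " + nodeId
      + " : physical-memory=" + memoryMb
      + " virtual-memory=" + virtualMemoryMb
      + " virtual-cores=" + virtualCores);
  // ... rest of serviceStart() unchanged ...
}
{code}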
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541233#comment-14541233 ] MENG DING commented on YARN-1197: - Thanks guys for the comments. Yes, I believe a memory-enforcement option for LCE is definitely a desirable feature and the proper way to handle memory enforcement for long-running services. Looks like YARN-2793 is related, and YARN-3 already had a patch for this? Then we also need the capability to dynamically update the cgroup that a process runs under, which I believe is not supported today either, right? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
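For context, a hedged sketch of what dynamically updating a running container's cgroup memory limit could look like on a cgroups-v1 mount (the mount point, hierarchy name and helper method are assumptions; this is not existing LCE functionality):
{code}
// Illustrative only: rewrite memory.limit_in_bytes of the cgroup the container
// was placed in, without restarting the container.
void updateContainerMemoryLimit(String containerId, long newLimitBytes)
    throws IOException {
  // Assumes the NM created /sys/fs/cgroup/memory/hadoop-yarn/<containerId>.
  java.nio.file.Path limitFile = java.nio.file.Paths.get(
      "/sys/fs/cgroup/memory/hadoop-yarn", containerId, "memory.limit_in_bytes");
  java.nio.file.Files.write(limitFile,
      Long.toString(newLimitBytes)
          .getBytes(java.nio.charset.StandardCharsets.UTF_8));
}
{code}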