[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654854#comment-14654854 ] Hadoop QA commented on YARN-221: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:red}-1{color} | javac | 7m 43s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 19s | The applied patch generated 2 new checkstyle issues (total was 212, now 213). | | {color:red}-1{color} | whitespace | 1m 46s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 21s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 54s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 7m 23s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 52m 27s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 139m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748767/YARN-221-7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d540374 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8770/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8770/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8770/console | This message was automatically generated. > NM should provide a way for AM to tell it not to aggregate logs. > > > Key: YARN-221 > URL: https://issues.apache.org/jira/browse/YARN-221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Reporter: Robert Joseph Evans >Assignee: Ming Ma > Attachments: YARN-221-6.patch, YARN-221-7.patch, > YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, > YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch > > > The NodeManager should provide a way for an AM to tell it that either the > logs should not be aggregated, that they should be aggregated with a high > priority, or that they should be aggregated but with a lower priority. The > AM should be able to do this in the ContainerLaunch context to provide a > default value, but should also be able to update the value when the container > is released. > This would allow for the NM to not aggregate logs in some cases, and avoid > connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4015) Is there any way to dynamically change container size after allocation.
[ https://issues.apache.org/jira/browse/YARN-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S resolved YARN-4015. - Resolution: Invalid Hi [~dhruv007], if you have any queries, please post them to the Hadoop user mailing list u...@hadoop.apache.org. JIRA is for tracking development issues. > Is there any way to dynamically change container size after allocation. > --- > > Key: YARN-4015 > URL: https://issues.apache.org/jira/browse/YARN-4015 > Project: Hadoop YARN > Issue Type: Wish >Reporter: dhruv >Priority: Minor > > Hadoop YARN assumes that the container size won't be changed after allocation. > It is possible that a job does not fully use the resources it was allocated, or > requires more resources for a container. So is there any way for the container > size to change at runtime after the container has been allocated, i.e. elasticity > for both memory and CPU? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
[ https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654824#comment-14654824 ] Hadoop QA commented on YARN-4019: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 9m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 11s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 49s | The applied patch generated 1 new checkstyle issues (total was 50, now 50). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 48s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 43s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 8s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 53m 52s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 110m 42s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748757/YARN-4019.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d540374 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8768/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8768/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8768/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8768/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8768/console | This message was automatically generated. 
> Add JvmPauseMonitor to ResourceManager and NodeManager > -- > > Key: YARN-4019 > URL: https://issues.apache.org/jira/browse/YARN-4019 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4019.001.patch, YARN-4019.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager > and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654813#comment-14654813 ] tangjunjie commented on YARN-2902: -- I encountered this issue. Environment: hadoop 2.3.0-cdh5.0.2 {noformat} 2015-08-05 12:11:18,275 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Attempt to remove resource: { { hdfs:///user/xxx/.staging/job_1433219182593_445774/libjars/zookeeper-3.4.5-cdh5.0.2.jar, 1436271746726, FILE, null },pending,[],1624036313203834,DOWNLOADING} with non-zero refcount /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAG {noformat} > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654769#comment-14654769 ] Rohith Sharma K S commented on YARN-3992: - The patch looks good overall. Nit: can you add a new API with an additional parameter for the host instead of changing the arguments of the existing {{allocateAndWaitForContainers}} API? > TestApplicationPriority.testApplicationPriorityAllocation fails intermittently > -- > > Key: YARN-3992 > URL: https://issues.apache.org/jira/browse/YARN-3992 > Project: Hadoop YARN > Issue Type: Test >Reporter: Zhijie Shen >Assignee: Sunil G > Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch > > > {code} > java.lang.AssertionError: expected:<7> but was:<5> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654744#comment-14654744 ] Hadoop QA commented on YARN-3045: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748765/YARN-3045-YARN-2928.007.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-2928 / bf65663 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8769/console | This message was automatically generated. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654737#comment-14654737 ] Hadoop QA commented on YARN-3049: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 34s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 19s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 9s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 43m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748758/YARN-3049-YARN-2928.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / bf65663 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8767/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8767/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8767/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8767/console | This message was automatically generated. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, > YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654724#comment-14654724 ] Naganarasimha G R commented on YARN-3045: - Hi [~djp], thanks for the feedback. bq. Ok. Let's get basic things in first then discuss/work on other details later if that move work quickly. I have uploaded a WIP patch; can you take a look at it? bq. What's the concern to have NodeManagerEntity? Without this, how could we store something like NM's configuration? Well, here both Sangjin's and my thought is that if we are going to query NM-level application events as part of the application, then they should be under ApplicationEntity; if we require them for other scenarios, then we can have a NodeManagerEntity, I think. bq. If collector has no add-on knowledge against client, it could be simpler to pass down sync/async() from client to sync/async in writer. By {{sync/async in writer}}, did you mean {{putEntities & flush}} / {{putEntities}} respectively on the writer, or the 2x2 matrix [~sjlee0] mentioned? bq. YARN-3367 already track this. It is painful for the client to wrap putEntity() with thread-pool or dispatcher to achieve non-blocking way. I was under the impression that YARN-3367 is only about invoking REST calls in a non-blocking way and thus avoiding threads in the clients. Is it also related to flushing when only {{putEntities}} is called and not {{putEntitiesAsync}}? I see that the "async" parameter in the REST request is currently ignored, so I thought we might need to flush the writer based on this parameter, or are your thoughts along the lines of supporting the 2x2 matrix as Sangjin was describing? > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-221: - Attachment: YARN-221-7.patch Thanks [~xgong]! Here is the updated patch with your suggestions. {{ContainerLogAggregationPolicy}} is changed to use {{ContainerTokenIdentifier}} so that the policy can get the {{ContainerType}} of the container. > NM should provide a way for AM to tell it not to aggregate logs. > > > Key: YARN-221 > URL: https://issues.apache.org/jira/browse/YARN-221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Reporter: Robert Joseph Evans >Assignee: Ming Ma > Attachments: YARN-221-6.patch, YARN-221-7.patch, > YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, > YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch > > > The NodeManager should provide a way for an AM to tell it that either the > logs should not be aggregated, that they should be aggregated with a high > priority, or that they should be aggregated but with a lower priority. The > AM should be able to do this in the ContainerLaunch context to provide a > default value, but should also be able to update the value when the container > is released. > This would allow for the NM to not aggregate logs in some cases, and avoid > connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
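For readers following the log-aggregation policy discussion above, the sketch below shows, in a self-contained way, the shape such a per-container policy could take once the {{ContainerType}} is available from the container token. The enum, interface, and class names here are illustrative assumptions, not the actual YARN-221 API.

{code:java}
// Self-contained sketch; both the enum and the policy interface are
// illustrative stand-ins, not the actual YARN-221 classes.
public class LogAggregationPolicySketch {

  enum ContainerType { APPLICATION_MASTER, TASK }

  /** Minimal shape of a per-container log-aggregation policy. */
  interface ContainerLogAggregationPolicy {
    boolean shouldDoLogAggregation(ContainerType type);
  }

  /** Example policy: aggregate logs only for the AM container. */
  static class AMOnlyPolicy implements ContainerLogAggregationPolicy {
    @Override
    public boolean shouldDoLogAggregation(ContainerType type) {
      return type == ContainerType.APPLICATION_MASTER;
    }
  }

  public static void main(String[] args) {
    ContainerLogAggregationPolicy policy = new AMOnlyPolicy();
    System.out.println(policy.shouldDoLogAggregation(ContainerType.TASK)); // prints false
  }
}
{code}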
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: YARN-3045-YARN-2928.007.patch > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: (was: YARN-3045-YARN-2928.007.patch) > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654644#comment-14654644 ] Naganarasimha G R commented on YARN-3045: - Based on the feedback, maybe we can remove some classes (I have kept them because some of the events, such as container metrics, cannot be captured through state machine transitions). I was also wondering whether for localization we should capture ResourceLocalizationService events or the {{LocalizedResource}} state machine's events, which exist per resource being localized. Currently the patch wraps ResourceLocalizationService and captures its events, but if we need to capture LocalizedResource events instead, it will need to be modified. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: YARN-3045-YARN-2928.007.patch Uploading a WIP patch based on the approach discussed earlier (Approach 2); test cases are passing for container event publishing. Once application and localization event handling is finalized, I can make further progress on it. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-YARN-2928.6.patch Uploaded a new patch which makes the HBase backend make the decision locally. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, > YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
[ https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654612#comment-14654612 ] Robert Kanter commented on YARN-4019: - Test failure also appears to be unrelated. > Add JvmPauseMonitor to ResourceManager and NodeManager > -- > > Key: YARN-4019 > URL: https://issues.apache.org/jira/browse/YARN-4019 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4019.001.patch, YARN-4019.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager > and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
[ https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4019: Attachment: YARN-4019.002.patch The 002 patch fixes the NM checkstyle warning and the trailing whitespace. The RM checkstyle warning is due to a method that is "too long", but it's the {{serviceInit}} method, which is setting things up, so I think we can ignore that. I verified in a cluster that the metrics and log messages that come from the {{JvmPauseMonitor}} show up. > Add JvmPauseMonitor to ResourceManager and NodeManager > -- > > Key: YARN-4019 > URL: https://issues.apache.org/jira/browse/YARN-4019 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4019.001.patch, YARN-4019.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager > and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
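For context on what the {{JvmPauseMonitor}} does: it detects GC or host-level pauses by sleeping for a fixed interval and measuring how much longer the sleep actually took. The self-contained sketch below illustrates that idea only; it is not Hadoop's {{JvmPauseMonitor}} implementation, and the interval and threshold values are arbitrary assumptions.

{code:java}
// Illustration of the pause-detection idea behind a JVM pause monitor:
// sleep for a fixed interval and warn when the observed delay is much longer
// than requested (usually a GC or host-level pause). Not Hadoop's class.
public class PauseMonitorSketch implements Runnable {
  private static final long SLEEP_MS = 500;            // nominal sleep interval
  private static final long WARN_THRESHOLD_MS = 1000;  // extra delay that triggers a warning

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.nanoTime();
      try {
        Thread.sleep(SLEEP_MS);
      } catch (InterruptedException e) {
        return;
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      long pauseMs = elapsedMs - SLEEP_MS;
      if (pauseMs > WARN_THRESHOLD_MS) {
        System.err.println("Detected pause of approximately " + pauseMs + " ms");
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread monitor = new Thread(new PauseMonitorSketch(), "pause-monitor");
    monitor.setDaemon(true);
    monitor.start();
    Thread.sleep(10_000); // keep the demo alive for ten seconds
  }
}
{code}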
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: yARN-3920.002.patch Updated the patch to use the right resourceCalculator, and updated the unit test to verify CPU as the dominant resource and the threshold. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: yARN-3920.001.patch, yARN-3920.002.patch > > > Reserving a node for a container was designed to prevent large containers > from being starved by small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
[ https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654590#comment-14654590 ] Hadoop QA commented on YARN-4019: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 27s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 6s | The applied patch generated 1 new checkstyle issues (total was 49, now 50). | | {color:red}-1{color} | checkstyle | 1m 35s | The applied patch generated 1 new checkstyle issues (total was 50, now 50). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 13s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 53m 12s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 102m 45s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748746/YARN-4019.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c95993c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8766/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt https://builds.apache.org/job/PreCommit-YARN-Build/8766/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8766/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8766/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8766/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8766/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8766/console | This message was automatically generated. 
> Add JvmPauseMonitor to ResourceManager and NodeManager > -- > > Key: YARN-4019 > URL: https://issues.apache.org/jira/browse/YARN-4019 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4019.001.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager > and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654559#comment-14654559 ] Hadoop QA commented on YARN-3736: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 14s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 1m 44s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 54s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 91m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748742/YARN-3736.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c95993c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8765/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8765/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8765/console | This message was automatically generated. > Add RMStateStore apis to store and load accepted reservations for failover > -- > > Key: YARN-3736 > URL: https://issues.apache.org/jira/browse/YARN-3736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Anubhav Dhoot > Attachments: YARN-3736.001.patch, YARN-3736.001.patch, > YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, > YARN-3736.005.patch > > > We need to persist the current state of the plan, i.e. the accepted > ReservationAllocations & corresponding RLESpareseResourceAllocations to the > RMStateStore so that we can recover them on RM failover. This involves making > all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654511#comment-14654511 ] Anubhav Dhoot commented on YARN-3920: - I am trying to have a config that works out of the box via the default configuration and needs tweaking only if required. Having an absolute value would require it to be resized for every cluster setup based on its specific size. The other option is a multiple of the increment allocation that's used by FS; we could pick a default ratio of 2. The problem is that if you set it too high (such that it exceeds the maximum resource allocation), one can accidentally disable reservation. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: yARN-3920.001.patch > > > Reserving a node for a container was designed to prevent large containers > from being starved by small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
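To make the option concrete, here is a small sketch of the "multiple of the increment allocation" idea from the comment above, including a cap at the maximum allocation so a large ratio cannot accidentally disable reservation. The property name, default ratio, and capping behaviour are assumptions for illustration, not the actual YARN-3920 configuration.

{code:java}
// Sketch of a reservation threshold derived from the FS increment allocation.
// The config key, default ratio, and cap are illustrative assumptions.
public class ReservationThresholdSketch {

  static final String RATIO_KEY =
      "yarn.scheduler.fair.reservation-threshold.increment-multiple"; // hypothetical key
  static final double DEFAULT_RATIO = 2.0;

  /**
   * Minimum request size (in MB) allowed to reserve a node: a multiple of the
   * increment allocation, capped at the maximum allocation so reservation can
   * never be disabled outright by a large ratio.
   */
  static long reservationThresholdMb(long incrementMb, long maxAllocationMb, double ratio) {
    return Math.min((long) (incrementMb * ratio), maxAllocationMb);
  }

  public static void main(String[] args) {
    // e.g. increment 1024 MB, max 8192 MB, ratio 2 -> only requests >= 2048 MB reserve a node
    System.out.println(reservationThresholdMb(1024, 8192, DEFAULT_RATIO));
  }
}
{code}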
[jira] [Updated] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
[ https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4019: Attachment: YARN-4019.001.patch Most of the code was already there; it was just not hooked up. The patch adds the {{JvmPauseMonitor}} to the existing {{JvmMetrics}}. > Add JvmPauseMonitor to ResourceManager and NodeManager > -- > > Key: YARN-4019 > URL: https://issues.apache.org/jira/browse/YARN-4019 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4019.001.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager > and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager
Robert Kanter created YARN-4019: --- Summary: Add JvmPauseMonitor to ResourceManager and NodeManager Key: YARN-4019 URL: https://issues.apache.org/jira/browse/YARN-4019 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager and NodeManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3736: Attachment: YARN-3736.005.patch Addressed findbugs warnings > Add RMStateStore apis to store and load accepted reservations for failover > -- > > Key: YARN-3736 > URL: https://issues.apache.org/jira/browse/YARN-3736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Anubhav Dhoot > Attachments: YARN-3736.001.patch, YARN-3736.001.patch, > YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, > YARN-3736.005.patch > > > We need to persist the current state of the plan, i.e. the accepted > ReservationAllocations & corresponding RLESpareseResourceAllocations to the > RMStateStore so that we can recover them on RM failover. This involves making > all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4016) docker container is still running when app is killed
[ https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654345#comment-14654345 ] Abin Shahab commented on YARN-4016: --- [~zhiguohong] Thanks for sending it. Please use the trunk/branch-2 version and run Docker containers using LCE. this bug is fixed there. > docker container is still running when app is killed > > > Key: YARN-4016 > URL: https://issues.apache.org/jira/browse/YARN-4016 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > > The docker_container_executor_session.sh is generated like below: > {code} > ### get the pid of docker container by "docker inspect" > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1438681002528_0001_01_02` > > .../container_1438681002528_0001_01_02.pid.tmp > ### rename *.pid.tmp to *.pid > /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp > .../container_1438681002528_0001_01_02.pid > ### launch the docker container > /usr/bin/docker run --rm --net=host --name > container_1438681002528_0001_01_02 -v ... library/mysql > /container_1438681002528_0001_01_02/launch_container.sh" > {code} > This is obviously wrong because you can not get the pid of a docker container > before starting it. When NodeManager try to kill the container, pid zero is > always read from the pid file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654341#comment-14654341 ] Sangjin Lee commented on YARN-3049: --- {quote} I'm trying to understand the discussion here. Yes, and what we worked quite hard to avoid is to identify the types of the incoming entities, in a writer, so that we can apply different write code paths. If this is the case, maybe we can refactor the write method so that it contains an expandable context object? We can easily encapsulate flags in a BitSet-like object, and we may add more if needed. The only problem I'm wondering about is, is it possible for the caller to easily generate a context with all required information (such as isNewApp or appFinish)? BTW, I believe we need to refactor the interface of the read and write methods to use some sorts of contexts anyways. Our current argument lists are not expandable. So if this helps, maybe we can move forward by refactor the write interfaces? {quote} Another place where {{HBaseTimelineWriterImpl}} would check for the entity type (being the application) is splitting the application table (YARN-3906). The current patch checks the type of the entity to be able to send writes to different tables. So that would need to be included in the discussion as well. I completely understand the desire that we want to make writers as agnostic about entity types and data as possible. However, since a lot of things in the schema need to be based on the applications (flow context, the application table, flow run aggregation, etc.), the need to support that strongly is real. We can either go the route of having the writer recognize applications and some of their events strongly (at the expense of making the separation between entities and writers a little weaker), or try to create a context for this decision (as [~gtCarrera9] suggested) and have the writer act on it. As for the latter option, while it still shields the writer from knowing details about entities, it would still need to know similar attributes (e.g. "application created", "whether the entity is an application", etc.), only in a more passive manner. Thoughts? > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, > YARN-3049-YARN-2928.5.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
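A minimal sketch of the "expandable context" option mentioned above: the collector derives a few flags it already knows about and passes them to the writer, which then acts on the flags without inspecting entity types itself. The class, flag, and method names are assumptions for illustration, not the actual TimelineWriter API.

{code:java}
import java.util.EnumSet;

// Sketch of an expandable write context: flags set by the collector, acted on
// by the writer. Names are illustrative assumptions, not the ATS v2 API.
public class TimelineWriteContextSketch {

  enum WriteFlag { APPLICATION_ENTITY, NEW_APPLICATION, APPLICATION_FINISHED }

  static final class WriteContext {
    private final EnumSet<WriteFlag> flags;
    WriteContext(EnumSet<WriteFlag> flags) { this.flags = flags; }
    boolean has(WriteFlag f) { return flags.contains(f); }
  }

  /** Writer-side decision that stays agnostic of concrete entity classes. */
  static void write(Object entity, WriteContext ctx) {
    if (ctx.has(WriteFlag.APPLICATION_ENTITY)) {
      System.out.println("route to the application table instead of the entity table");
    }
    if (ctx.has(WriteFlag.APPLICATION_FINISHED)) {
      System.out.println("trigger flow-run aggregation / flush for the finished application");
    }
  }

  public static void main(String[] args) {
    write(new Object(),
        new WriteContext(EnumSet.of(WriteFlag.APPLICATION_ENTITY, WriteFlag.APPLICATION_FINISHED)));
  }
}
{code}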
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654319#comment-14654319 ] Varun Vasudev commented on YARN-3926: - Thanks for comments [~jlowe] and [~asuresh]. My apologies for not responding earlier. {quote} As for RM and NM config mismatches, there can always be a problem where the RM is configured to understand resources A, B, and C while the nodemanager is configured to provide A, B and D. Handshaking during NM registration seems the appropriate way to mitigate this possibility, although I'm not sure it's necessary to shutdown the NM if it is providing a superset of what the RM schedules. Reading later in the doc it appears this is actually intended to be supported by adding it to NMs then later the RM for rolling upgrades, but earlier it states that any mismatch, even additional resources, is fatal to NM registration. That needs to be cleaned up. {quote} I think most people feel that shutting down the NM is not a good idea. I'm going to go with just printing out warning messages in the RM and NM. Does that seem ok? {quote} A little confused why the sample xml config has mappings of pf1,pf2, etc. to profile names rather than using the profile names in the config properties directly like is done with the concise format examples later. {quote} Good point. Arun had similar feedback. I'll change this. {quote} Overall seems like a reasonable approach to make handling of resource types data driven. I have some performance concerns on the memory footprint impact of adding a Map to every resource and needing to hash/compare strings every time we try to do any computations on it. The scheduler loop is already too slow, and this looks like it could add significant overhead to it. Hopefully we can mitigate that if it does become a concern, e.g.: translating Resource records coming across the wire into an efficient internal representation optimized for the resource types configured. {quote} I'll make sure to do some performance tests as part of the development. {quote} Instead of having to explicitly mark a resource as “countable”, can’t we just assume thats the default and instead require “uncountable” types to be explicitly specified (once we start supporting it) {quote} Fair point. I'll use this approach. > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
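As a rough illustration of the data-driven model being proposed in YARN-3926: a resource becomes a map from resource-type name to value, and a profile is just a named resource map. The class shape, type names, and example profiles below are assumptions for illustration, not the final API.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of map-based resources and named profiles. Type names and profile
// contents are illustrative assumptions only.
public class ResourceProfileSketch {

  /** Build a resource map from alternating name/value pairs. */
  static Map<String, Long> resource(Object... nameValuePairs) {
    Map<String, Long> r = new HashMap<>();
    for (int i = 0; i < nameValuePairs.length; i += 2) {
      r.put((String) nameValuePairs[i], (Long) nameValuePairs[i + 1]);
    }
    return r;
  }

  public static void main(String[] args) {
    // profile name -> resource map; a new resource type only needs a new map key
    Map<String, Map<String, Long>> profiles = new HashMap<>();
    profiles.put("small", resource("memory-mb", 1024L, "vcores", 1L));
    profiles.put("large", resource("memory-mb", 8192L, "vcores", 4L, "disk-gb", 100L));
    System.out.println(profiles.get("large"));
  }
}
{code}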
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654311#comment-14654311 ] Subru Krishnan commented on YARN-2884: -- Thanks [~kishorch] for addressing all my comments. The latest patch LGTM. [~jianhe], can you please take a look. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654298#comment-14654298 ] Junping Du commented on YARN-3816: -- Sorry for coming late on this. Thanks [~sjlee0], [~vrushalic] and [~gtCarrera9] for review and comments! bq. I propose to introduce the second dimension to the metrics explicitly. This second dimension nearly maps to "toAggregate" (and/or the REP/SUM distinction) in your patch. But I think it's probably better to introduce the metric types explicitly as another enum or by subclassing TimelineMetric. Let me know what you think. Do you suggest using gauge and counter types to replace "toAggregate"? But no matter whether a metric is a counter or a gauge, we may need to do aggregation, e.g. CPU usage as a gauge, or map task counts (launched, failed, etc.) as counters (assuming the values are reported as increments rather than accumulated totals). The idea of putting "toAggregate" in the metric is for the client to indicate whether this metric value should be added/aggregated with other values or is a final value. If a client puts a metric value that is already aggregated (like HDFS bytes written/read), the collector won't apply any aggregation logic to it. bq. I'm still very confused by the usage of the word "aggregate". In this patch, "aggregate" really means accumulating values of a metric along the time dimension, which is completely different than the notion of aggregation we have used all along. The aggregation has always been about rolling up values from children to parents. Can we choose a different word to describe this aspect of accumulating values along the time dimension, and avoid using "aggregation" for this? "Accumulate"? "Cumulative"? Any suggestion? Actually, the v2 patch has both. In TimelineCollector, AggregatedMetrics means rolling up values from children to parents, while AggregatedArea means accumulating the aggregated values of a metric along the time dimension. It may not be necessary to separate the AggregatedArea calculation out into its own method, though. The naming was a bit rushed for the POC, but we can pick better names later. bq. For example, consider HDFS bytes written. The time accumulation is already built into it (see (1)). If you further accumulate this along the time dimension, it becomes quadratic (doubly integrated) in time. I don't see how that can be useful. You are right. For some cases like the one you mentioned here, time accumulation is not very useful. So besides "toAggregate", we may also need another flag (like "toAccumulate") to indicate whether the metric value needs to be accumulated along the time dimension. As we don't assume that all counters are already accumulated over time, the client has the flexibility to put either accumulated or incremental values for a counter. Thoughts? bq. I think it would be OK to do this and not the average/max of the previous discussion. I'd like to hear what others think about this. Either way should work; as we have the area value at all timestamps, we can recalculate the average/max later if necessary. I would like to hear others' comments too. bq. Can we introduce a configuration that disables this time accumulation feature? As we discussed, some may not want to have this feature enabled and are perfectly happy with simple aggregation (from children to parents). It would be good to isolate this part and be able to enable/disable it. We surely can disable accumulation at the system level. We can also disable accumulation at the metric level (as proposed above) even if accumulation is enabled at the system level. bq. 
For timeseries, we need to decide what aggregation means. One option is that we could normalize the values to a minute level granularity. For example, add up values per min across each time. So anything that occurred within a minute will be assigned to the top of that minute: eg if something happening at 2 min 10 seconds is considered to have occurred at 2 min. That way we can sum up across flows/users/runs etc. The other option is that we only record/store accumulated values at different timestamps and do the delta calculation later if necessary. This can handle more time granularities, since a query could be applied at a different granularity. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all container
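The sketch below illustrates the two per-metric flags being discussed in the YARN-3816 comment above: one for rolling values up from children to parents and one for accumulating values along the time dimension. The class and field names are illustrative assumptions, not the ATS v2 {{TimelineMetric}} API.

{code:java}
import java.util.TreeMap;

// Sketch of per-metric flags: "toAggregate" (roll up child values into the
// parent) and "toAccumulate" (integrate values over time). Names are
// illustrative assumptions only.
public class MetricFlagsSketch {

  static class SketchTimelineMetric {
    final String id;
    final boolean toAggregate;   // roll up child values into the parent entity
    final boolean toAccumulate;  // accumulate values along the time dimension
    final TreeMap<Long, Number> values = new TreeMap<>(); // timestamp -> value

    SketchTimelineMetric(String id, boolean toAggregate, boolean toAccumulate) {
      this.id = id;
      this.toAggregate = toAggregate;
      this.toAccumulate = toAccumulate;
    }
  }

  public static void main(String[] args) {
    // CPU usage (a gauge): roll up across containers and accumulate over time.
    SketchTimelineMetric cpu = new SketchTimelineMetric("CPU_USAGE", true, true);
    // HDFS bytes written: already a running total, so roll it up but do not
    // accumulate it again along the time dimension.
    SketchTimelineMetric hdfs = new SketchTimelineMetric("HDFS_BYTES_WRITTEN", true, false);
    System.out.println(cpu.id + " accumulate=" + cpu.toAccumulate
        + ", " + hdfs.id + " accumulate=" + hdfs.toAccumulate);
  }
}
{code}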
[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654252#comment-14654252 ] Arun Suresh commented on YARN-3736: --- The FindBugs link seems to be incorrect. The correct link : https://builds.apache.org/job/PreCommit-YARN-Build/8755/artifact/patchprocess/patchFindbugsWarningshadoop-yarn-server-resourcemanager.html +1 pending addressing it.. > Add RMStateStore apis to store and load accepted reservations for failover > -- > > Key: YARN-3736 > URL: https://issues.apache.org/jira/browse/YARN-3736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Anubhav Dhoot > Attachments: YARN-3736.001.patch, YARN-3736.001.patch, > YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch > > > We need to persist the current state of the plan, i.e. the accepted > ReservationAllocations & corresponding RLESpareseResourceAllocations to the > RMStateStore so that we can recover them on RM failover. This involves making > all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654130#comment-14654130 ] Arun Suresh commented on YARN-3176: --- Thanks for the patch, [~l201514]. The patch itself looks fine. But currently I see that quite a lot of queue properties are not inherited from the parent, e.g. Min(Max)Resources, preemption timeouts, etc. Should we broaden the scope of this JIRA to include these as well? Also, I was thinking: is it right to simply inherit maxApps? The queue in question could hog all the apps and not leave any for its siblings. Should we use the queue's share to determine the max apps? Thoughts? > In Fair Scheduler, child queue should inherit maxApp from its parent > > > Key: YARN-3176 > URL: https://issues.apache.org/jira/browse/YARN-3176 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch > > > If the child queue does not have a maxRunningApp limit, it will use the > queueMaxAppsDefault. This behavior is not quite right, since > queueMaxAppsDefault is normally a small number, whereas some parent queues do > have maxRunningApp set to be more than the default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654132#comment-14654132 ] Sangjin Lee commented on YARN-3045: --- Thanks for your input [~djp]! Just wanted to clarify a few things. {quote} Sorry. I wasn't at that meeting. What's the concern to have NodeManagerEntity? Without this, how could we store something like NM's configuration? {quote} Naga is referring to the Wednesday status call. What I said is that we do not need a separate entity to handle *application*-related events coming out of node managers. If these events are attributes of applications, then they should be on the application entities. If I want to find out all events for some application, then I should be able to query only the application entity and get all events. The need to have NodeManagerEntity is something different IMO. Note that today there are challenges in emitting data without any application context (e.g. node manager's configuration) as we discussed a few times. If we need to support that, that needs a different discussion. {quote} This is true today. However, it may not be precisely for all cases/scenarios. Some implementation of TimelineWriter, like: FS, may only have sync semantics for write(), and flush() could do nothing. {quote} That's correct. What I meant was in general the *contract* of write() may not provide a guarantee that the data will be written completely synchronously. For FS, yes, it will sync. Thus the operative word "may". :) {quote} Do we need to differentiate synchronous with critical in put operation from TimelineClient prospective? Sync most likely mean the client logic rely on the return result of the put call and async put just mean we call put in a non-blocking way. Critical and non-critical for messages(entities) is a relative concept and could be various under different system configurations. Thus, I won't be surprised if we put some critical entities in async way as very rare case we do need sync put in client. Actually, I was convinced in YARN-3949 (https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640910&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640910) that collector level knows better than writer to decide if it should flush. I would also like to claim that collector could also know better than client on the boundary between critical and non-critical due to the knowledge on system configuration, e.g. less types of entities should be counted as critical for a large scale cluster but client has no knowledge about it. If collector has no add-on knowledge against client, it could be simpler to pass down sync/async() from client to sync/async in writer. Isn't it? {quote} Hmm, my assumption was that the sync/async distinction from the client perspective mapped to whether the writer may be flushed or not. If not, then we need to support a 2x2 matrix of possibilities: sync put w/ flush, sync put w/o flush, async put w/ flush, and async put w/o flush. I thought it would be a simplifying assumption to align those dimensions. My main point in YARN-3949 is that it is sufficient for the writer to provide write() and flush(). The timeline collector can then support all possible semantics, even including the 2x2 matrix behavior if needed. 
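[Editorial illustration] A minimal sketch of the simplifying assumption described above, i.e. a sync/critical put is just an async put followed by a writer flush. The interfaces below are simplified stand-ins, not the real TimelineCollector/TimelineWriter classes; the point is only that write() plus flush() at the writer level is enough for the collector to offer both semantics.
{code:java}
// Sketch of the "sync put == async put + writer flush" idea; names are stand-ins.
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
  interface Writer {
    void write(String entity);  // may only buffer the data internally
    void flush();               // pushes buffered data to the backing storage
  }

  static class Collector {
    private final Writer writer;
    Collector(Writer writer) { this.writer = writer; }

    /** Async / non-critical put: hand the entity to the writer and return. */
    void putEntitiesAsync(String entity) { writer.write(entity); }

    /** Sync / critical put: same write, but force the data out before returning. */
    void putEntities(String entity) {
      writer.write(entity);
      writer.flush();
    }
  }

  public static void main(String[] args) {
    List<String> buffer = new ArrayList<>();
    Writer bufferingWriter = new Writer() {
      public void write(String entity) { buffer.add(entity); }
      public void flush() { System.out.println("flushed " + buffer); buffer.clear(); }
    };
    Collector collector = new Collector(bufferingWriter);
    collector.putEntitiesAsync("container_event");  // buffered only
    collector.putEntities("app_finished_event");    // buffered, then flushed
  }
}
{code}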
> [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654143#comment-14654143 ] Hadoop QA commented on YARN-3948: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 25s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 42s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 32s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 53m 22s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 109m 9s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748690/0004-YARN-3948.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9a08999 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8764/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8764/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8764/console | This message was automatically generated. > Display Application Priority in RM Web UI > - > > Key: YARN-3948 > URL: https://issues.apache.org/jira/browse/YARN-3948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, > 0003-YARN-3948.patch, 0004-YARN-3948.patch, ApplicationPage.png, > ClusterPage.png > > > Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4001) normalizeHostName takes too much of execution time
[ https://issues.apache.org/jira/browse/YARN-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654144#comment-14654144 ] Lei Guo commented on YARN-4001: --- [~zhiguohong], can you share some numbers on the execution time from your profiling? > normalizeHostName takes too much of execution time > -- > > Key: YARN-4001 > URL: https://issues.apache.org/jira/browse/YARN-4001 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > > For each NodeHeartbeatRequest, NetUtils.normalizeHostName is called under a > lock. I did profiling for a very large cluster and found out that > NetUtils.normalizeHostName takes most of the execution time of > ResourceTrackerService.nodeHeartbeat(...). > We'd better have an option to use raw IP (plus port) as the Node identity to > scale for large clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653994#comment-14653994 ] dhruv commented on YARN-1645: - Does this include resizing for both CPU and memory? > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Fix For: YARN-1197 > > Attachments: YARN-1645-YARN-1197.3.patch, > YARN-1645-YARN-1197.4.patch, YARN-1645-YARN-1197.5.patch, YARN-1645.1.patch, > YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653984#comment-14653984 ] Kishore Chaliparambil commented on YARN-2884: - The test failure (TestLogAggregationService.testLogAggregationServiceWithInterval) is unrelated to the patch. It looks like a transient failure; the same tests pass on my local dev machine. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start, the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttle mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653971#comment-14653971 ] dhruv commented on YARN-1197: - Hi, can I apply these patches to the current stable version 2.7.1? > Support changing resources of an allocated container > > > Key: YARN-1197 > URL: https://issues.apache.org/jira/browse/YARN-1197 > Project: Hadoop YARN > Issue Type: Task > Components: api, nodemanager, resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Wangda Tan > Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, > YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, > YARN-1197_Design.pdf > > > The current YARN resource management logic assumes the resource allocated to a > container is fixed during its lifetime. When users want to change the resources > of an allocated container, the only way is to release it and allocate a new > container with the expected size. > Allowing run-time changes to the resources of an allocated container will give us > better control of resource usage on the application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent
[ https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653964#comment-14653964 ] Wangda Tan commented on YARN-4003: -- [~curino], It's a very good point. I would prefer to let the AM start if there is any idle resource: I think we can use the parent's available capacity to compute the AM resource limit, and preempt if other queues ask for resources. Currently ProportionalCPP will preempt an AM container only if there is no other container to preempt, so I feel it may not work properly if we compute the AM resource limit from the parent's available capacity. Consider a case where lots of AMs are launched in a queue with a small guaranteed capacity; the queue's resources could end up entirely exhausted by AM containers. Thoughts? > ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is > not consistent > > > Key: YARN-4003 > URL: https://issues.apache.org/jira/browse/YARN-4003 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Carlo Curino > Attachments: YARN-4003.patch > > > The inherited behavior from LeafQueue (limit AM % based on capacity) is not a > good fit for ReservationQueue (which has highly dynamic capacity). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3948: -- Attachment: 0004-YARN-3948.patch Rebased patch against latest trunk. > Display Application Priority in RM Web UI > - > > Key: YARN-3948 > URL: https://issues.apache.org/jira/browse/YARN-3948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, > 0003-YARN-3948.patch, 0004-YARN-3948.patch, ApplicationPage.png, > ClusterPage.png > > > Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653928#comment-14653928 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #274 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/274/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653929#comment-14653929 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #274 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/274/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653931#comment-14653931 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #274 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/274/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We have startup timestamp for RM already, but don't for NM. > Sometimes cluster operator modified configuration of all nodes and kicked off > command to restart all NMs. He found out it's hard for him to check whether > all NMs are restarted. Actually there's always some NMs didn't restart as he > expected, which leads to some error later due to inconsistent configuration. > If we have startup timestamp for NM, the operator could easily fetch it via > NM webservice and find out which NM didn't restart, and take mannaul action > for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653927#comment-14653927 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #274 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/274/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653833#comment-14653833 ] Hadoop QA commented on YARN-4018: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 22s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 18s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 56m 27s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748678/YARN-4018.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9a08999 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8763/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8763/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8763/console | This message was automatically generated. > correct docker image name is rejected by DockerContainerExecutor > > > Key: YARN-4018 > URL: https://issues.apache.org/jira/browse/YARN-4018 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > Attachments: YARN-4018.patch > > > For example: > "www.dockerbase.net/library/mongo" > "www.dockerbase.net:5000/library/mongo:latest" > leads to error: > Image: www.dockerbase.net/library/mongo is not a proper docker image > Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker > image -- This message was sent by Atlassian JIRA (v6.3.4#6332)
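[Editorial illustration] For context on the YARN-4018 report above, a relaxed image-name check of the kind the JIRA calls for: it accepts a registry host with an optional port, multi-segment repository names, and an optional tag. The pattern and class below are a hypothetical sketch, not the validation used by DockerContainerExecutor or the attached patch.
{code:java}
import java.util.regex.Pattern;

// Illustrative only: accepts names like "registry[:port]/repo/name[:tag]".
public class DockerImageNameCheck {
  private static final Pattern IMAGE_NAME = Pattern.compile(
      "^(?:[a-zA-Z0-9.-]+(?::[0-9]+)?/)?"   // optional registry[:port]/
      + "[a-z0-9._-]+(?:/[a-z0-9._-]+)*"    // repository path segments
      + "(?::[A-Za-z0-9._-]+)?$");          // optional :tag

  static boolean isProperImageName(String name) {
    return IMAGE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isProperImageName("www.dockerbase.net/library/mongo"));             // true
    System.out.println(isProperImageName("www.dockerbase.net:5000/library/mongo:latest")); // true
    System.out.println(isProperImageName("mongo"));                                        // true
  }
}
{code}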
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653754#comment-14653754 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/266/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653755#comment-14653755 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/266/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653757#comment-14653757 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/266/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/CHANGES.txt > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We have startup timestamp for RM already, but don't for NM. > Sometimes cluster operator modified configuration of all nodes and kicked off > command to restart all NMs. He found out it's hard for him to check whether > all NMs are restarted. Actually there's always some NMs didn't restart as he > expected, which leads to some error later due to inconsistent configuration. > If we have startup timestamp for NM, the operator could easily fetch it via > NM webservice and find out which NM didn't restart, and take mannaul action > for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653753#comment-14653753 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/266/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Attachment: YARN-4018.patch > correct docker image name is rejected by DockerContainerExecutor > > > Key: YARN-4018 > URL: https://issues.apache.org/jira/browse/YARN-4018 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > Attachments: YARN-4018.patch > > > For example: > "www.dockerbase.net/library/mongo" > "www.dockerbase.net:5000/library/mongo:latest" > leads to error: > Image: www.dockerbase.net/library/mongo is not a proper docker image > Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker > image -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Description: For example: "www.dockerbase.net/library/mongo" "www.dockerbase.net:5000/library/mongo:latest" leads to error: Image: www.dockerbase.net/library/mongo is not a proper docker image Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker image was: For example: "www.dockerbase.net/library/mongo" "www.dockerbase.net:5000/library/mongo:latest" > correct docker image name is rejected by DockerContainerExecutor > > > Key: YARN-4018 > URL: https://issues.apache.org/jira/browse/YARN-4018 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > > For example: > "www.dockerbase.net/library/mongo" > "www.dockerbase.net:5000/library/mongo:latest" > leads to error: > Image: www.dockerbase.net/library/mongo is not a proper docker image > Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker > image -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
Hong Zhiguo created YARN-4018: - Summary: correct docker image name is rejected by DockerContainerExecutor Key: YARN-4018 URL: https://issues.apache.org/jira/browse/YARN-4018 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo For example: "www.dockerbase.net/library/mongo" "www.dockerbase.net:5000/library/mongo:latest" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653707#comment-14653707 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2223 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2223/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653706#comment-14653706 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2223 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2223/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/CHANGES.txt > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653710#comment-14653710 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2223 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2223/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We have startup timestamp for RM already, but don't for NM. > Sometimes cluster operator modified configuration of all nodes and kicked off > command to restart all NMs. He found out it's hard for him to check whether > all NMs are restarted. Actually there's always some NMs didn't restart as he > expected, which leads to some error later due to inconsistent configuration. > If we have startup timestamp for NM, the operator could easily fetch it via > NM webservice and find out which NM didn't restart, and take mannaul action > for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653708#comment-14653708 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2223 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2223/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653651#comment-14653651 ] Junping Du commented on YARN-3367: -- We need to implement a separate API, putEntitiesAsync(), in TimelineClient to put entities asynchronously. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > > Since YARN-3039, we added a loop in TimelineClient to wait for the > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we start a new thread for each call to get > rid of a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread gets out of the waiting loop in random order. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the queued entities to the collector via REST > calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
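[Editorial illustration] A minimal sketch of the event-loop idea described in this JIRA: the async put only enqueues entities, and a single dispatcher thread delivers them in order, so callers never block on the REST call and event ordering is preserved. The Entity placeholder, class name, and deliverToCollector() are hypothetical, not the actual TimelineClient API.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncEntityDispatcher {
  static final class Entity {             // stand-in for a timeline entity
    final String id;
    Entity(String id) { this.id = id; }
  }

  private final BlockingQueue<Entity> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final Thread loop = new Thread(() -> {
    // Single event loop: drain the queue in FIFO order until stopped and empty.
    while (!stopped || !queue.isEmpty()) {
      try {
        Entity e = queue.poll(100, TimeUnit.MILLISECONDS);
        if (e != null) {
          deliverToCollector(e);          // one delivery per dequeued entity
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }, "timeline-entity-dispatcher");

  public AsyncEntityDispatcher() { loop.start(); }

  /** Non-blocking put: callers only enqueue and never wait on delivery. */
  public void putEntitiesAsync(Entity... entities) {
    for (Entity e : entities) {
      queue.offer(e);                     // FIFO queue preserves event order
    }
  }

  public void stop() throws InterruptedException {
    stopped = true;
    loop.join();                          // remaining queued entities are drained first
  }

  private void deliverToCollector(Entity e) {
    System.out.println("delivered " + e.id);  // placeholder for the REST call
  }

  public static void main(String[] args) throws InterruptedException {
    AsyncEntityDispatcher d = new AsyncEntityDispatcher();
    d.putEntitiesAsync(new Entity("app_1"), new Entity("container_1"));
    d.stop();
  }
}
{code}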
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653640#comment-14653640 ] Junping Du commented on YARN-3045: -- Sorry for coming to this late, as I have been travelling recently. Thanks for the good discussion and comments above. bq. I am fine with a single jira, but the only trouble is that as the scope increases there will be more delay in the jira as more discussions will be required (in this case which entity to publish NM App & localization events), and also since I have been holding this jira for long I thought of getting the basic one out and developing on top of it. OK. Let's get the basic things in first, then discuss/work on the other details later if that moves the work along more quickly. bq. We had a discussion on this topic today in the meeting and Sangjin Lee was of the opinion not to have another entity here. I think we need more discussions on this as it involves querying too. Sorry, I wasn't at that meeting. What is the concern with having a NodeManagerEntity? Without it, how could we store something like the NM's configuration? bq. Along with that, it is implicitly understood that TimelineWriter.write() may be asynchronous (i.e. may not write the data to the storage synchronously or promptly). This is true today. However, it may not hold for all cases/scenarios. Some implementations of TimelineWriter, like the FS one, may only have sync semantics for write(), and flush() could be a no-op. bq. This API should be sufficient for TimelineCollector to express synchronous/critical put operations and asynchronous/non-critical put operations. TimelineCollector will not expose flush() directly to clients. Instead, it may use things like putEntity() and putEntityAsync() to expose those semantics to the client. In the simplest terms, TimelineCollector could implement putEntity() as putEntityAsync() + TimelineWriter.flush(). This is not the actual suggestion for the implementation, but it is one idea. We already have putEntity() and putEntityAsync(), but we haven't yet used flush() to get this behavior. Do we need to differentiate synchronous from critical in put operations from the TimelineClient perspective? Sync most likely means the client logic relies on the return result of the put call, while async put just means we call put in a non-blocking way. Critical versus non-critical for messages (entities) is a relative concept and can vary under different system configurations. Thus, I won't be surprised if we put some critical entities in an async way, as only in very rare cases do we need a sync put in the client. Actually, I was convinced in YARN-3949 (https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640910&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640910) that the collector level knows better than the writer whether it should flush. I would also argue that the collector can know better than the client where the boundary between critical and non-critical lies, due to its knowledge of the system configuration, e.g. fewer types of entities should be counted as critical for a large-scale cluster, but the client has no knowledge about that. If the collector has no added knowledge over the client, it could be simpler to pass sync/async down from the client to sync/async in the writer, couldn't it? bq. And yes, we'll need another JIRA to differentiate putEntity() and putEntityAsync() and use them in the right places. Currently, putEntity() does not call flush(), and all client calls use putEntity(). YARN-3367 already tracks this. 
It is painful for the client to wrap putEntity() with a thread pool or dispatcher to achieve non-blocking behavior. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
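A minimal sketch of the putEntity() = putEntityAsync() + TimelineWriter.flush() idea discussed above; the interfaces here are simplified placeholders, not the real TimelineCollector/TimelineWriter APIs.
{code}
// Illustrative sketch only: the sync path is the async path plus a flush of
// the underlying writer. Types are placeholders for the real interfaces.
interface SketchTimelineWriter<E> {
  void write(E entity) throws java.io.IOException;   // may buffer internally
  void flush() throws java.io.IOException;           // force buffered data to the backend
}

class SketchTimelineCollector<E> {
  private final SketchTimelineWriter<E> writer;

  SketchTimelineCollector(SketchTimelineWriter<E> writer) {
    this.writer = writer;
  }

  // Async/non-critical put: hand the entity to the writer and return.
  void putEntityAsync(E entity) throws java.io.IOException {
    writer.write(entity);
  }

  // Sync/critical put: same write, but ensure the data has reached the
  // backend before returning to the caller.
  void putEntity(E entity) throws java.io.IOException {
    putEntityAsync(entity);
    writer.flush();
  }
}
{code}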
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653631#comment-14653631 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2204/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
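For illustration, a sketch of flipping the new switch programmatically before starting a (test) Application History Server. The property key shown is an assumption about what this patch adds; check the YarnConfiguration constant in the committed change for the exact name and default.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: disable saving of non-AM container metadata in the Generic AHS
// store. The property key below is assumed, not taken from the patch.
public class AhsConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    String key = "yarn.timeline-service.generic-application-history."
        + "save-non-am-container-meta-info";   // assumed key name
    conf.setBoolean(key, false);
    System.out.println(key + " = " + conf.getBoolean(key, true));
  }
}
{code}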
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653636#comment-14653636 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2204/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We already have a startup timestamp for the RM, but not for the NM. > Sometimes a cluster operator modifies the configuration of all nodes and kicks off > a command to restart all NMs. It is hard to check whether all NMs actually > restarted; in practice some NMs fail to restart as expected, which leads to > errors later due to inconsistent configuration. > If we have a startup timestamp for the NM, the operator could easily fetch it via > the NM web service, find out which NMs did not restart, and take manual action > for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653633#comment-14653633 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2204/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
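For illustration, a client-side sketch of reading the new flag once it is exposed on ApplicationReport. The accessor name isUnmanagedApp() is an assumption about this patch's API; confirm it against the committed ApplicationReport change.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: list applications and print whether each runs an unmanaged AM.
// The isUnmanagedApp() accessor is assumed to be the getter added by this change.
public class UnmanagedAmCheck {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      for (ApplicationReport report : client.getApplications()) {
        System.out.println(report.getApplicationId() + " unmanagedAM="
            + report.isUnmanagedApp());   // assumed accessor name
      }
    } finally {
      client.stop();
    }
  }
}
{code}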
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653634#comment-14653634 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2204/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653490#comment-14653490 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1007 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1007/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653493#comment-14653493 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1007 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1007/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We already have a startup timestamp for the RM, but not for the NM. > Sometimes a cluster operator modifies the configuration of all nodes and kicks off > a command to restart all NMs. It is hard to check whether all NMs actually > restarted; in practice some NMs fail to restart as expected, which leads to > errors later due to inconsistent configuration. > If we have a startup timestamp for the NM, the operator could easily fetch it via > the NM web service, find out which NMs did not restart, and take manual action > for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653488#comment-14653488 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1007 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1007/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653489#comment-14653489 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1007 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1007/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-4017: --- Priority: Major (was: Blocker) > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653448#comment-14653448 ] Allen Wittenauer edited comment on YARN-4017 at 8/4/15 10:55 AM: - http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html and https://www.gnu.org/software/hurd/hurd/porting/guidelines.html has some good discussion. was (Author: aw): http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html has some good discussion. > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653448#comment-14653448 ] Allen Wittenauer commented on YARN-4017: http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html has some good discussion. > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4017) container-executor overuses PATH_MAX
Allen Wittenauer created YARN-4017: -- Summary: container-executor overuses PATH_MAX Key: YARN-4017 URL: https://issues.apache.org/jira/browse/YARN-4017 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Blocker Lots of places in container-executor are now using PATH_MAX, which is simply too small on a lot of platforms. We should use a larger buffer size and be done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status
[ https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653437#comment-14653437 ] Hudson commented on YARN-4004: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/277/]) YARN-4004. container-executor should print output of docker logs if the (xgong: rev c3364ca8e75acfb911ab92e19f357b132f128123) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt > container-executor should print output of docker logs if the docker container > exits with non-0 exit status > -- > > Key: YARN-4004 > URL: https://issues.apache.org/jira/browse/YARN-4004 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4004.001.patch, YARN-4004.002.patch, > YARN-4004.003.patch > > > When a docker container exits with a non-0 exit code, we should print the > docker logs to make debugging easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653435#comment-14653435 ] Hudson commented on YARN-3978: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/277/]) YARN-3978. Configurably turn off the saving of container info in Generic AHS (Eric Payne via jeagles) (jeagles: rev 3cd02b95224e9d43fd63a4ef9ac5c44f113f710d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653436#comment-14653436 ] Hudson commented on YARN-3543: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/277/]) YARN-3543. ApplicationReport should be able to tell whether the (xgong: rev 0306d902f53582320aa5895ca9f5c31f64aaaff6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java > ApplicationReport should be able to tell whether the Application is AM > managed or not. 
> --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, > 0005-YARN-3543.patch, 0006-YARN-3543.patch, 0007-YARN-3543.patch, > YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653439#comment-14653439 ] Hudson commented on YARN-3965: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/277/]) YARN-3965. Add startup timestamp to nodemanager UI. Contributed by Hong Zhiguo (jlowe: rev 469cfcd695da979e56c83d9303f9bc1f898c08ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NodeInfo.java > Add startup timestamp to nodemanager UI > --- > > Key: YARN-3965 > URL: https://issues.apache.org/jira/browse/YARN-3965 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, > YARN-3965.patch > > > We already have a startup timestamp for the RM, but not for the NM. > Sometimes a cluster operator modifies the configuration of all nodes and kicks off > a command to restart all NMs. It is hard to check whether all NMs actually > restarted; in practice some NMs fail to restart as expected, which leads to > errors later due to inconsistent configuration. > If we have a startup timestamp for the NM, the operator could easily fetch it via > the NM web service, find out which NMs did not restart, and take manual action > for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653425#comment-14653425 ] Jun Gong commented on YARN-3896: [~devaraj.k], could you please help review the latest patch? Thanks. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch, YARN-3896.06.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat came before the RM succeeded in setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4016) docker container is still running when app is killed
[ https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4016: -- Description: The docker_container_executor_session.sh is generated like below: {code} ### get the pid of docker container by "docker inspect" echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1438681002528_0001_01_02` > .../container_1438681002528_0001_01_02.pid.tmp ### rename *.pid.tmp to *.pid /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp .../container_1438681002528_0001_01_02.pid ### launch the docker container /usr/bin/docker run --rm --net=host --name container_1438681002528_0001_01_02 -v ... library/mysql /container_1438681002528_0001_01_02/launch_container.sh" {code} This is obviously wrong because you can not get the pid of a docker container before starting it. When NodeManager try to kill the container, pid zero is always read from the pid file. was: The docker_container_executor_session.sh is generated like below: {code} ### get the pid of docker container by "docker inspect" echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1438681002528_0001_01_02` > .../container_1438681002528_0001_01_02.pid.tmp ### rename *.pid.tmp to *.pid /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp .../container_1438681002528_0001_01_02.pid ### launch the docker container /usr/bin/docker run --rm --net=host --name container_1438681002528_0001_01_02 -v ... library/mysql /container_1438681002528_0001_01_02/launch_container.sh" {code} This is obviously wrong because you can not get the pid of a docker container before starting it. > docker container is still running when app is killed > > > Key: YARN-4016 > URL: https://issues.apache.org/jira/browse/YARN-4016 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > > The docker_container_executor_session.sh is generated like below: > {code} > ### get the pid of docker container by "docker inspect" > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1438681002528_0001_01_02` > > .../container_1438681002528_0001_01_02.pid.tmp > ### rename *.pid.tmp to *.pid > /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp > .../container_1438681002528_0001_01_02.pid > ### launch the docker container > /usr/bin/docker run --rm --net=host --name > container_1438681002528_0001_01_02 -v ... library/mysql > /container_1438681002528_0001_01_02/launch_container.sh" > {code} > This is obviously wrong because you can not get the pid of a docker container > before starting it. When NodeManager try to kill the container, pid zero is > always read from the pid file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4016) docker container is still running when app is killed
Hong Zhiguo created YARN-4016: - Summary: docker container is still running when app is killed Key: YARN-4016 URL: https://issues.apache.org/jira/browse/YARN-4016 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo The docker_container_executor_session.sh is generated like below: {code} ### get the pid of docker container by "docker inspect" echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1438681002528_0001_01_02` > .../container_1438681002528_0001_01_02.pid.tmp ### rename *.pid.tmp to *.pid /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp .../container_1438681002528_0001_01_02.pid ### launch the docker container /usr/bin/docker run --rm --net=host --name container_1438681002528_0001_01_02 -v ... library/mysql /container_1438681002528_0001_01_02/launch_container.sh" {code} This is obviously wrong because you can not get the pid of a docker container before starting it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3039) [Collector wireup] Implement timeline app-level collector service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Target Version/s: YARN-2928 (was: 2.8.0) > [Collector wireup] Implement timeline app-level collector service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Fix For: YARN-2928 > > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, > YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, > YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, > YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
[ https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653241#comment-14653241 ] Sunil G commented on YARN-4013: --- Thanks [~Naganarasimha] I started some work on that. :) > Publisher V2 should write the unmanaged AM flag too > --- > > Key: YARN-4013 > URL: https://issues.apache.org/jira/browse/YARN-4013 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sunil G > > Upon rebase the branch, I find we need to redo the similar work for V2 > publisher: > https://issues.apache.org/jira/browse/YARN-3543 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
[ https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4013: - Assignee: Sunil G (was: Naganarasimha G R) > Publisher V2 should write the unmanaged AM flag too > --- > > Key: YARN-4013 > URL: https://issues.apache.org/jira/browse/YARN-4013 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sunil G > > Upon rebase the branch, I find we need to redo the similar work for V2 > publisher: > https://issues.apache.org/jira/browse/YARN-3543 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4015) Is there any way to dynamically change container size after allocation.
[ https://issues.apache.org/jira/browse/YARN-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653230#comment-14653230 ] Yong Zhang commented on YARN-4015: -- Please see YARN-1197 > Is there any way to dynamically change container size after allocation. > --- > > Key: YARN-4015 > URL: https://issues.apache.org/jira/browse/YARN-4015 > Project: Hadoop YARN > Issue Type: Wish >Reporter: dhruv >Priority: Minor > > Hadoop YARN assumes that a container's size won't be changed after allocation. > It is possible that a job does not fully use the resources allocated to a container, > or requires more resources for it. So is there any way for a container's size to > change at run time, after the container has been allocated? That would mean > elasticity for both memory and CPU. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4015) Is there any way to dynamically change container size after allocation.
dhruv created YARN-4015: --- Summary: Is there any way to dynamically change container size after allocation. Key: YARN-4015 URL: https://issues.apache.org/jira/browse/YARN-4015 Project: Hadoop YARN Issue Type: Wish Reporter: dhruv Priority: Minor Hadoop YARN assumes that a container's size won't be changed after allocation. It is possible that a job does not fully use the resources allocated to a container, or requires more resources for it. So is there any way for a container's size to change at run time, after the container has been allocated? That would mean elasticity for both memory and CPU. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
[ https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653197#comment-14653197 ] Naganarasimha G R commented on YARN-4013: - Oops did not see the comment, [~sunilg], Please reassign if you have started with it ! > Publisher V2 should write the unmanaged AM flag too > --- > > Key: YARN-4013 > URL: https://issues.apache.org/jira/browse/YARN-4013 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Naganarasimha G R > > Upon rebase the branch, I find we need to redo the similar work for V2 > publisher: > https://issues.apache.org/jira/browse/YARN-3543 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
[ https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-4013: --- Assignee: Naganarasimha G R > Publisher V2 should write the unmanaged AM flag too > --- > > Key: YARN-4013 > URL: https://issues.apache.org/jira/browse/YARN-4013 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Naganarasimha G R > > Upon rebase the branch, I find we need to redo the similar work for V2 > publisher: > https://issues.apache.org/jira/browse/YARN-3543 -- This message was sent by Atlassian JIRA (v6.3.4#6332)