[jira] [Reopened] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang reopened YARN-8752: > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang resolved YARN-8752. Resolution: Fixed > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8258: - Attachment: YARN-8258.009.patch > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil Govindan >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally all filters from default context has to be inherited to UI2 context > as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610141#comment-16610141 ] Chen Yufei commented on YARN-8513: -- [~leftnoteasy] Thanks for taking time to investigate this. The resource allocation scheme has no problem for me. I'm not able to turn on preemption in our cluster for now, is there any other way to avoid RM not responding to any other requests when the problem occurs? > CapacityScheduler infinite loop when queue is near fully utilized > - > > Key: YARN-8513 > URL: https://issues.apache.org/jira/browse/YARN-8513 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 3.1.0, 2.9.1 > Environment: Ubuntu 14.04.5 and 16.04.4 > YARN is configured with one label and 5 queues. >Reporter: Chen Yufei >Priority: Major > Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, > jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, > yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, > yarn3-resourcemanager.log, yarn3-top > > > ResourceManager does not respond to any request when queue is near fully > utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM > restart, it can recover running jobs and start accepting new ones. > > Seems like CapacityScheduler is in an infinite loop printing out the > following log messages (more than 25,000 lines in a second): > > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.99816763 > absoluteUsedCapacity=0.99816763 used= > cluster=}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application attempt=appattempt_1530619767030_1652_01 > container=null > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 > clusterResource= type=NODE_LOCAL > requestedPartition=}} > > I encounter this problem several times after upgrading to YARN 2.9.1, while > the same configuration works fine under version 2.7.3. > > YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a > similar problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8757: - Target Version/s: 3.2.0 Priority: Critical (was: Major) > [Submarine] Add Tensorboard component when --tensorboard is specified > - > > Key: YARN-8757 > URL: https://issues.apache.org/jira/browse/YARN-8757 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8757.001.patch > > > We need to have a Tensorboard component when --tensorboard is specified. And > we need to set quicklinks to let users view tensorboard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609997#comment-16609997 ] Tao Yang commented on YARN-8709: Thanks [~eepayne],[~cheersyang] and [~sunilg]. > CS preemption monitor always fails since one under-served queue was deleted > --- > > Key: YARN-8709 > URL: https://issues.apache.org/jira/browse/YARN-8709 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, scheduler preemption >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2 > > Attachments: YARN-8709.001.patch, YARN-8709.002.patch > > > After some queues deleted, the preemption checker in SchedulingMonitor was > always skipped because of YarnRuntimeException for every run. > Error logs: > {noformat} > ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: > Exception raised while executing preemption checker, skip this run..., > exception= > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't > happen, cannot find TempQueuePerPartition for queueName=1535075839208 > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834) > {noformat} > I think there is something wrong with partitionToUnderServedQueues field in > ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues > can be add but never be removed, except rebuilding this policy. For example, > once under-served queue "a" is added into this structure, it will always be > there and never be removed, intra-queue preemption checker will try to get > all queues info for partitionToUnderServedQueues in > IntraQueueCandidatesSelector#selectCandidates and will throw > YarnRuntimeException if not found. 
So that after queue "a" is deleted from > queue structure, the preemption checker will always fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
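A minimal sketch of the cleanup idea described in the paragraph above, pruning under-served queue entries whose queues no longer exist so that a later TempQueuePerPartition lookup cannot fail; the class and method names are hypothetical and this is not the committed YARN-8709 patch:

{code:java}
// Hypothetical sketch (not the committed YARN-8709 fix): before each
// preemption editSchedule() run, drop under-served queue names that are no
// longer present in the scheduler's queue hierarchy.
import java.util.Map;
import java.util.Set;

public final class UnderServedQueuePruner {
  private UnderServedQueuePruner() {
  }

  public static void prune(Map<String, Set<String>> partitionToUnderServedQueues,
      Set<String> existingQueueNames) {
    for (Set<String> underServed : partitionToUnderServedQueues.values()) {
      // Remove stale entries so deleted queues cannot linger forever.
      underServed.removeIf(queue -> !existingQueueNames.contains(queue));
    }
  }
}
{code}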
[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609992#comment-16609992 ] Hadoop QA commented on YARN-8757: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 28s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine in trunk has 4 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 10s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: The patch generated 19 new + 48 unchanged - 2 fixed = 67 total (was 50) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hadoop-yarn-submarine in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 42m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8757 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939186/YARN-8757.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a947accecd4e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 987d819 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/21806/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21806/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applic
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609972#comment-16609972 ] Hadoop QA commented on YARN-8763: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 42s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 6 new + 2 unchanged - 0 fixed = 8 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m 21s{color} | {color:red} patch has errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 12s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 28s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestNodeManagerReboot | | | hadoop.yarn.server.nodemanager.containermanager.resourceplugin.TestResourcePluginManager | | | hadoop.yarn.server.nodemanager.TestNodeManagerResync | | | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater | | | hadoop.yarn.server.nodemanager.webapp.TestNMWebServer | | | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels | | | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8763 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939182/YARN-8763-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609943#comment-16609943 ] Wangda Tan commented on YARN-8757: -- Added ver.1 patch, which spins up a Tensorboard container when --{{tensorboard}} is specified. Users can now launch a Tensorboard container pointed at a parent folder to list all jobs. Will update documentation in the next patch. Also improved unit tests a bit. > [Submarine] Add Tensorboard component when --tensorboard is specified > - > > Key: YARN-8757 > URL: https://issues.apache.org/jira/browse/YARN-8757 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8757.001.patch > > > We need to have a Tensorboard component when --tensorboard is specified. And > we need to set quicklinks to let users view tensorboard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8757: - Attachment: YARN-8757.001.patch > [Submarine] Add Tensorboard component when --tensorboard is specified > - > > Key: YARN-8757 > URL: https://issues.apache.org/jira/browse/YARN-8757 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8757.001.patch > > > We need to have a Tensorboard component when --tensorboard is specified. And > we need to set quicklinks to let users view tensorboard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609926#comment-16609926 ] Eric Yang commented on YARN-8763: - [~Zian Chen] Thank you for the patch. We usually have dependency versions specified in hadoop-project/pom.xml; for the WebSocket jar dependencies, please move that logic there. The WebSocket entry point needs to accept a container id as a parameter to guide the servlet to interface with the corresponding container. During the HTTP upgrade, session.getUpgradeRequest().getRequestURI() provides the full path; you can split it up and take whatever comes after container/... to get the container id variable. We also need a test case that mocks ContainerShellWebSocket so that it is tested. > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
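A rough sketch of the URI-splitting suggestion above, assuming the Jetty WebSocket API (org.eclipse.jetty.websocket.api.Session); the path layout and helper class are hypothetical, not taken from the YARN-8763 patch:

{code:java}
// Sketch only: pull the container id out of the WebSocket upgrade request,
// assuming a path of the form .../container/<container_id>/... .
import org.eclipse.jetty.websocket.api.Session;

public final class ContainerIdFromUpgradeRequest {
  private ContainerIdFromUpgradeRequest() {
  }

  public static String containerIdOf(Session session) {
    String path = session.getUpgradeRequest().getRequestURI().getPath();
    String[] parts = path.split("/");
    for (int i = 0; i < parts.length - 1; i++) {
      if ("container".equals(parts[i])) {
        return parts[i + 1]; // the token right after "container/"
      }
    }
    return null; // caller decides how to reject the connection
  }
}
{code}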
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609919#comment-16609919 ] Zian Chen commented on YARN-8763: - Hi [~eyang], could you help review the patch? Thanks! > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8763: Attachment: YARN-8763-001.patch > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609917#comment-16609917 ] Zian Chen commented on YARN-8763: - Provide initial patch for this. > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
Zian Chen created YARN-8763: --- Summary: Add WebSocket logic to the Node Manager web server to establish servlet Key: YARN-8763 URL: https://issues.apache.org/jira/browse/YARN-8763 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Assignee: Zian Chen The reason we want to use a WebSocket servlet to serve the backend instead of establishing the connection through HTTP is that WebSocket solves a few issues with HTTP that matter for our scenario: # In HTTP, the request is always initiated by the client and the response is processed by the server — making HTTP a unidirectional protocol, while WebSocket provides a bi-directional protocol, which means either the client or the server can send a message to the other party. # Full-duplex communication — client and server can talk to each other independently at the same time. # Single TCP connection — after upgrading the HTTP connection at the beginning, client and server communicate over that same TCP connection throughout the lifecycle of the WebSocket connection. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
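To make the bi-directional and full-duplex points above concrete, here is a minimal, self-contained Jetty WebSocket endpoint sketch (illustrative only, not code from the YARN-8763 patch): after the single HTTP upgrade, either side can send frames on the same TCP connection.

{code:java}
// Illustrative Jetty endpoint: the server may push a frame on connect and
// reply to client frames over the same upgraded TCP connection.
import java.io.IOException;
import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketConnect;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketMessage;
import org.eclipse.jetty.websocket.api.annotations.WebSocket;

@WebSocket
public class EchoShellSocket {
  @OnWebSocketConnect
  public void onConnect(Session session) throws IOException {
    // Server-initiated message: not possible with plain HTTP request/response.
    session.getRemote().sendString("connected");
  }

  @OnWebSocketMessage
  public void onMessage(Session session, String message) throws IOException {
    // Client-initiated frame on the same connection; echo it back.
    session.getRemote().sendString("echo: " + message);
  }
}
{code}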
[jira] [Updated] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
[ https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8762: Attachment: Interactive Docker Shell design doc.pdf > [Umbrella] Support Interactive Docker Shell to running Containers > - > > Key: YARN-8762 > URL: https://issues.apache.org/jira/browse/YARN-8762 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Priority: Major > Labels: Docker > Attachments: Interactive Docker Shell design doc.pdf > > > Debugging distributed application can be challenging on Hadoop. Hadoop > provide limited debugging ability through application log files. One of the > most frequently requested feature is to provide interactive shell to assist > real time debugging. This feature is inspired by docker exec to provide > ability to run arbitrary commands in docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
[ https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609916#comment-16609916 ] Zian Chen commented on YARN-8762: - Provide design doc for this. > [Umbrella] Support Interactive Docker Shell to running Containers > - > > Key: YARN-8762 > URL: https://issues.apache.org/jira/browse/YARN-8762 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Priority: Major > Labels: Docker > > Debugging distributed application can be challenging on Hadoop. Hadoop > provide limited debugging ability through application log files. One of the > most frequently requested feature is to provide interactive shell to assist > real time debugging. This feature is inspired by docker exec to provide > ability to run arbitrary commands in docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609914#comment-16609914 ] Zian Chen commented on YARN-8523: - Discussed offline with Eric and Wangda: this feature involves creating a pipeline among the NM, container-executor and docker exec, which requires a lot of changes to the container stack. Created umbrella JIRA YARN-8762 to track progress. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Zian Chen >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
Zian Chen created YARN-8762: --- Summary: [Umbrella] Support Interactive Docker Shell to running Containers Key: YARN-8762 URL: https://issues.apache.org/jira/browse/YARN-8762 Project: Hadoop YARN Issue Type: New Feature Reporter: Zian Chen Debugging distributed applications can be challenging on Hadoop. Hadoop provides limited debugging ability through application log files. One of the most frequently requested features is to provide an interactive shell to assist real-time debugging. This feature is inspired by docker exec and provides the ability to run arbitrary commands in a docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609905#comment-16609905 ] Hadoop QA commented on YARN-8754: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8754 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8754 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21804/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [UI2] Improve terms on Component Instance page > --- > > Key: YARN-8754 > URL: https://issues.apache.org/jira/browse/YARN-8754 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot > 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch > > > Component instance page has "node" and "host". These two fields are > representing "bare_host" and "hostname" respectively. > From UI2 page thats not clear. Thus, table content need to be changed to > "bare host" from "node" . > This page also has "Host URL" which is hard coded to N/A. Thus, removing this > field from table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page
[ https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609904#comment-16609904 ] Hadoop QA commented on YARN-8666: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8666 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8666 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21802/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [UI2] Remove application tab from Yarn Queue Page > - > > Key: YARN-8666 > URL: https://issues.apache.org/jira/browse/YARN-8666 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot > 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch > > > Yarn UI2 Queue page puts Application button. This button does not redirect to > any other page. In addition to that running application table is also > available on same page. > Thus, there is no need to have a button for application in Queue page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609906#comment-16609906 ] Hadoop QA commented on YARN-8753: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8753 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8753 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21803/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, > YARN-8753.001.patch > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. > Due to this issue, Node information page and Node status page shows different > node managers count. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609862#comment-16609862 ] Hadoop QA commented on YARN-8658: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 58s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 89m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939162/YARN-8658.08.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1d006f695516 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 987d819 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | htt
[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609832#comment-16609832 ] Jason Lowe commented on YARN-7018: -- Originally I was thinking this could be outside of the scheduler, examining scheduler-agnostic settings like SchedulerNode, etc., then it could send NODE_RESOURCE_UPDATE to adjust node capabilities which is also scheduler-agnostic. However it would be lower overhead to have the scheduler call the plugin directly to avoid the messaging overhead, but it does increase coupling between the plugin and the scheduler a little. I'm fine if we want to move the plugin interactions into each of the schedulers. Back to the prototype patch, I assume NodeHeartBeatPluginImpl is just an example and would not be part of the final commit? There needs to be some lifecycle support around the plugin, i.e.: a way for the plugin to know it is being initialized, shutdown, etc. Having a callback when nodes are added and removed would also be helpful for some plugin implementations, otherwise the plugin will have to track nodes redundantly to know when it sees a new one and some other type of hack like timeouts to know when one is no longer being tracked. Similarly I think it would be nice to have explicit config refresh support in the plugin like there is for the schedulers. One idea: if the plugin class we load after refreshing is the same as the old one, do _not_ replace the plugin object but rather invoke a refreshConfigs method or something similar that lets the existing plugin refresh rather than forcing a load-from-scratch approach on each refresh. > Interface for adding extra behavior to node heartbeats > -- > > Key: YARN-7018 > URL: https://issues.apache.org/jira/browse/YARN-7018 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Major > Attachments: YARN-7018.POC.001.patch, YARN-7018.POC.002.patch > > > This JIRA tracks an interface for plugging in new behavior to node heartbeat > processing. Adding a formal interface for additional node heartbeat > processing would allow admins to configure new functionality that is > scheduler-independent without needing to replace the entire scheduler. For > example, both YARN-5202 and YARN-5215 had approaches where node heartbeat > processing was extended to implement new functionality that was essentially > scheduler-independent and could be implemented as a plugin with this > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
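As one possible reading of the lifecycle and refresh suggestions above, a hypothetical plugin interface sketch (the names are illustrative and this is not the YARN-7018 POC code):

{code:java}
// Hypothetical node-heartbeat plugin interface reflecting the suggestions
// above: explicit lifecycle, node add/remove callbacks, and config refresh.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

public interface NodeHeartbeatPlugin {
  /** Called once when the RM loads the plugin. */
  void init(Configuration conf);

  /** Called when a node registers, so the plugin need not track nodes itself. */
  void nodeAdded(RMNode node);

  /** Called when a node is removed, lost, or decommissioned. */
  void nodeRemoved(RMNode node);

  /** Called on every node heartbeat to apply extra, scheduler-agnostic behavior. */
  void onNodeHeartbeat(RMNode node);

  /** Called on refresh when the configured plugin class is unchanged. */
  void refreshConfigs(Configuration newConf);

  /** Called when the RM shuts down or replaces the plugin. */
  void shutdown();
}
{code}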
[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609821#comment-16609821 ] Jason Lowe commented on YARN-8680: -- Thanks for updating the patch! In loadUserLocalizedResources for this patch hunk: {noformat} - if (!key.startsWith(keyPrefix)) { + + if (!key.startsWith(LOCALIZATION_APPCACHE_SUFFIX, + keyPrefix.length())) { break; } {noformat} The old code would make sure the key matches the expected prefix, but the new code is making the dangerous assumption that the key found has the same base prefix that was used in the seek. That is not necessarily the case. If there are no appcache localization entries in the database then this will seek to the first key that occurs lexicographically _after_ the desired key. That key may or may not be long enough to seek keyPrefix.length() characters into it, and if it isn't then we explode with an index out of bounds exception. This code needs to walk through a sub-block of keys by checking the full key prefix and breaking out of the loop when it doesn't match. Just above the while loop the code computes the desired prefix, so it just needs to cache it in a local variable for later comparison in the while loop. Same comment applies to the handling of the LOCALIZATION_FILECACHE_SUFFIX key after the while loop. > YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate > - > > Key: YARN-8680 > URL: https://issues.apache.org/jira/browse/YARN-8680 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Pradeep Ambati >Assignee: Pradeep Ambati >Priority: Critical > Attachments: YARN-8680.00.patch, YARN-8680.01.patch, > YARN-8680.02.patch, YARN-8680.03.patch > > > Similar to YARN-8242, implement iterable abstraction for > LocalResourceTrackerState to load completed and in progress resources when > needed rather than loading them all at a time for a respective state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
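For illustration, a simplified sketch of the requested change, with the full prefix cached before the loop and compared against every visited key (the constant value and surrounding code are assumptions, not the actual patch):

{code:java}
// Sketch only: iterate a leveldb key range by comparing the FULL expected
// prefix, instead of assuming every visited key shares the seek prefix.
import java.util.Map;
import org.iq80.leveldb.DBIterator;
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;

class AppcachePrefixScan {
  void scan(DBIterator iter, String keyPrefix) {
    // Cache the full prefix once (suffix value "appcache/" assumed here).
    final String fullPrefix = keyPrefix + "appcache/";
    iter.seek(bytes(fullPrefix));
    while (iter.hasNext()) {
      Map.Entry<byte[], byte[]> entry = iter.peekNext();
      String key = asString(entry.getKey());
      if (!key.startsWith(fullPrefix)) {
        break; // left the appcache sub-block; stop cleanly
      }
      iter.next();
      // ... load the localized resource stored under this key ...
    }
  }
}
{code}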
[jira] [Commented] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609797#comment-16609797 ] Eric Yang commented on YARN-8734: - Design document is attached as "Dependency check vs.pdf". > Readiness check for remote service > -- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8734: --- Assignee: Eric Yang > Readiness check for remote service > -- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8734: Attachment: Dependency check vs.pdf > Readiness check for remote service > -- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.08.patch > [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, > YARN-8658.06.patch, YARN-8658.07.patch, YARN-8658.08.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609793#comment-16609793 ] Young Chen commented on YARN-8658: -- Fixed a bug with UAM throwing exceptions on skipping register due to some changes I left out while resolving conflicts. > [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, > YARN-8658.06.patch, YARN-8658.07.patch, YARN-8658.08.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609766#comment-16609766 ] Hudson commented on YARN-8709: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14916 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14916/]) YARN-8709: CS preemption monitor always fails since one under-served (ericp: rev 987d8191ad409298570f7ef981e9bc8fb72ff16c) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicyMockFramework.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyIntraQueue.java > CS preemption monitor always fails since one under-served queue was deleted > --- > > Key: YARN-8709 > URL: https://issues.apache.org/jira/browse/YARN-8709 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, scheduler preemption >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8709.001.patch, YARN-8709.002.patch > > > After some queues deleted, the preemption checker in SchedulingMonitor was > always skipped because of YarnRuntimeException for every run. > Error logs: > {noformat} > ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: > Exception raised while executing preemption checker, skip this run..., > exception= > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't > happen, cannot find TempQueuePerPartition for queueName=1535075839208 > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834) > {noformat} > I think there is something wrong with the partitionToUnderServedQueues field in > ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues > can be added but never removed, except by rebuilding this policy. For example, > once under-served queue "a" is added into this structure, it will always be > there and never be removed; the intra-queue preemption checker will try to get > info for all queues in partitionToUnderServedQueues in > IntraQueueCandidatesSelector#selectCandidates and will throw > YarnRuntimeException if one is not found. So after queue "a" is deleted from > the queue structure, the preemption checker will always fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional comm
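A minimal sketch of the cleanup idea discussed in YARN-8709 (not the attached patch itself): rebuild the under-served queue tracking on every policy run instead of letting entries accumulate, so a queue that was deleted since the last run simply disappears from the tracking instead of triggering the YarnRuntimeException. Class and method names here are simplified stand-ins for the real ProportionalCapacityPreemptionPolicy internals.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the per-partition under-served queue tracking
// inside ProportionalCapacityPreemptionPolicy (names are illustrative only).
public class UnderServedQueueTracker {

  private final Map<String, Set<String>> partitionToUnderServedQueues =
      new HashMap<>();

  /**
   * Called once per preemption-monitor run. Rebuilding the map from the
   * current queue hierarchy means a deleted queue no longer lingers forever.
   */
  public void refresh(Map<String, Set<String>> currentUnderServedByPartition) {
    partitionToUnderServedQueues.clear();
    for (Map.Entry<String, Set<String>> e
        : currentUnderServedByPartition.entrySet()) {
      partitionToUnderServedQueues.put(e.getKey(), new HashSet<>(e.getValue()));
    }
  }

  /** Look up queues without throwing when a queue no longer exists. */
  public Set<String> getUnderServedQueues(String partition) {
    return partitionToUnderServedQueues.getOrDefault(partition, new HashSet<>());
  }
}
{code}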
[jira] [Commented] (YARN-8761) Service AM support for decommissioning component instances
[ https://issues.apache.org/jira/browse/YARN-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609763#comment-16609763 ] Billie Rinaldi commented on YARN-8761: -- I think to allow removing specific component instances, we will need to maintain a list of decommissioned instances in the Component spec for the service. This will prevent future AM attempts from assigning containers to the decommissioned instances. We should be able to support decommissioning by component instance name or by instance hostname (componentInstanceName.serviceName.user.domain). > Service AM support for decommissioning component instances > -- > > Key: YARN-8761 > URL: https://issues.apache.org/jira/browse/YARN-8761 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > > The idea behind this feature is to have a flex down where specific component > instances are removed. Currently on a flex down, the service AM chooses for > removal the component instances with the highest IDs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
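A rough sketch of the bookkeeping described in the comment above, assuming a decommissioned-instance list is kept per component and persisted with the service spec; the class below is a simplified stand-in for the YARN service AM types, not the actual API.
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: a per-component record of decommissioned instances that
// survives AM restarts because it is persisted with the Component spec.
public class ComponentDecommissionTracker {

  private final Set<String> decommissionedInstances = new HashSet<>();

  /** Accepts either "comp-0" style instance names or full instance hostnames. */
  public void decommission(String instanceNameOrHostname) {
    decommissionedInstances.add(instanceNameOrHostname.toLowerCase());
  }

  /** True if a new container should not be assigned to this instance. */
  public boolean isDecommissioned(String instanceName, String hostname) {
    return decommissionedInstances.contains(instanceName.toLowerCase())
        || decommissionedInstances.contains(hostname.toLowerCase());
  }

  /** On flex down, remove the named instances instead of the highest IDs. */
  public List<String> selectInstancesToRemove(List<String> runningInstances) {
    return runningInstances.stream()
        .filter(name -> decommissionedInstances.contains(name.toLowerCase()))
        .collect(Collectors.toList());
  }
}
{code}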
[jira] [Created] (YARN-8761) Service AM support for decommissioning component instances
Billie Rinaldi created YARN-8761: Summary: Service AM support for decommissioning component instances Key: YARN-8761 URL: https://issues.apache.org/jira/browse/YARN-8761 Project: Hadoop YARN Issue Type: Bug Reporter: Billie Rinaldi Assignee: Billie Rinaldi The idea behind this feature is to have a flex down where specific component instances are removed. Currently on a flex down, the service AM chooses for removal the component instances with the highest IDs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609755#comment-16609755 ] Hadoop QA commented on YARN-8569: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 147 unchanged - 1 fixed = 151 total (was 148) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 25s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 16s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {c
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609735#comment-16609735 ] Jason Lowe commented on YARN-8648: -- Thanks for updating the patch! Should DockerRmCommand take the cgroup hierarchy or null argument in the constructor? It's a bit weird that it requires a container ID in the constructor but not the cgroup hierarchy, yet callers need to check if they need to pass the hierarchy in order to use it properly. Typically it's safer to put such things in the constructor so callers have to think about it. Do we really want to ignore EBUSY errors when trying to remove the cgroup entries? I think this is here for the docker-in-docker use-case that share the same cgroup parent, but it also suppresses useful error messages when the code tries to remove an entry and fails to do so. As written now, the patch will silently fail to remove cgroup entries that are still being used in all cases which seems less than ideal. There is already a {{validate_container_id}} in string-utils.c that the code should leverage to check if the argument is a container ID. The dir_exists check seems extraneous since the code already checks for ENOENT. As you mentioned above, the entry could be deleted after the check anyway. The code makes a system call to avoid a system call, so it won't be much of an optimization in practice. Now that optind is not being passed explicitly to exec_docker_command, do we really want exec_docker_command to examine/modify the global optind variable? What if someone wants to exec multiple docker commands consecutively with very different argc/argv values? Currently the caller would have to be aware of the fact that exec_docker_command is using the global optind to calculate argument offsets in the loop, but not the global argc/argv values, and manually fixup optind between invocations to get it to work properly. Regarding the "Don't increment optind here" comment, it would be good to elaborate a bit more on why an increment would be bad here. Otherwise the comment is simply parroting the code which isn't as helpful. If we fixup the problems with exec_docker_command and optind in the previous issue then I'm hoping there wouldn't be a need for careful optind management and comments here. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Attachments: YARN-8648.001.patch, YARN-8648.002.patch, > YARN-8648.003.patch, YARN-8648.004.patch > > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. 
On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd. So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
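To illustrate the leak being reviewed: docker creates {{<hierarchy>/<container_id>/<docker_container_id>}} under every cgroup controller, but only the cpu hierarchy is cleaned by the NM. A hedged sketch of the extra cleanup step follows; the controller list and paths are assumptions for illustration only, and the real work in this JIRA happens in container-executor, not in Java.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: remove the per-container cgroup directory under every
// controller, not just cpu, after the container exits.
public class CgroupLeakCleaner {

  private static final List<String> CONTROLLERS = Arrays.asList(
      "blkio", "cpuset", "devices", "freezer", "hugetlb", "memory",
      "net_cls", "net_prio", "perf_event", "systemd");

  public static void cleanup(String cgroupHierarchy, String containerId) {
    for (String controller : CONTROLLERS) {
      Path dir = Paths.get("/sys/fs/cgroup", controller,
          cgroupHierarchy, containerId);
      try {
        // rmdir only succeeds on empty cgroup directories; a failure here
        // (e.g. EBUSY) is worth logging rather than silently ignoring.
        Files.deleteIfExists(dir);
      } catch (IOException e) {
        System.err.println("Could not remove " + dir + ": " + e);
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical container id, for illustration only.
    cleanup("hadoop-yarn", "container_e01_1536000000000_0001_01_000002");
  }
}
{code}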
[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609719#comment-16609719 ] Hadoop QA commented on YARN-8658: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 54s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 14s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 23s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 81m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.uam.TestUnmanagedApplicationManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939134/YARN-8658.07.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f7e41cf7d420 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8fe4062 | | maven | version
[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609691#comment-16609691 ] Hadoop QA commented on YARN-5464: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 19 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 57s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 33s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 36s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {col
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8696: --- Summary: [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async (was: FederationInterceptor upgrade: home sub-cluster heartbeat async) > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, > YARN-8696.v3.patch, YARN-8696.v4.patch > > > Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is > synchronous. After the heartbeat is sent out to the home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result back to the AM. If the home sub-cluster is suffering from connection > issues, or is down during a YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for the home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to the AM at all. Additionally, because we kept the > same heartbeat responseId between the AM and the home RM, lots of tricky handling is > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work-preserving restart > (YARN-6127, YARN-1336), home RM master-slave switches, etc. > In this patch, we change the heartbeat to the home sub-cluster to asynchronous, > the same way we handle UAM heartbeats in secondaries, so that any > sub-cluster outage or connection issue won't prevent the AM from getting responses from > other sub-clusters. The responseId is also managed separately for the home > sub-cluster and the AM, and they increment independently. The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
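A hedged sketch of the asynchronous pattern described above, with heavily simplified types (the real FederationInterceptor/AMRMClientRelayer code is more involved): the home allocate call is pushed to a single-threaded executor and whatever home responses have arrived are merged into the next answer returned to the AM, so a hung or failing-over home RM no longer blocks responses from secondary sub-clusters.
{code}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: AllocateRequest/AllocateResponse are simplified to
// Strings, and HomeRmClient stands in for the relayer to the home YarnRM.
public class AsyncHomeHeartbeat {

  interface HomeRmClient {
    String allocate(String request) throws Exception;
  }

  private final ExecutorService homeExecutor =
      Executors.newSingleThreadExecutor();
  private final Queue<String> arrivedHomeResponses =
      new ConcurrentLinkedQueue<>();
  // The responseId towards the home RM is tracked independently of the id
  // the interceptor keeps with the AM, so the two increment separately.
  private final AtomicLong homeResponseId = new AtomicLong();
  private final HomeRmClient homeRm;

  AsyncHomeHeartbeat(HomeRmClient homeRm) {
    this.homeRm = homeRm;
  }

  /** Called for every AM heartbeat; never blocks on the home sub-cluster. */
  public String heartbeat(String amRequest, String mergedSecondaryResponse) {
    homeExecutor.submit(() -> {
      try {
        arrivedHomeResponses.add(homeRm.allocate(amRequest));
        homeResponseId.incrementAndGet();
      } catch (Exception e) {
        // Home RM down or failing over: the AM still gets the secondary
        // sub-cluster responses merged below.
      }
    });
    StringBuilder merged = new StringBuilder(mergedSecondaryResponse);
    String homeResponse;
    while ((homeResponse = arrivedHomeResponses.poll()) != null) {
      merged.append('+').append(homeResponse);
    }
    return merged.toString();
  }
}
{code}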
[jira] [Updated] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8658: --- Summary: [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor (was: Metrics for AMRMClientRelayer inside FederationInterceptor) > [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, > YARN-8658.06.patch, YARN-8658.07.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
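A minimal sketch, using the Hadoop metrics2 library, of the kind of per-relayer gauges this JIRA is about; the metric names and the registration point are assumptions for illustration, not the contents of the attached patches.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Illustrative only: gauges tracking what one AMRMClientRelayer has pending
// towards one sub-cluster YarnRM (names are hypothetical).
@Metrics(about = "AMRMClientRelayer metrics", context = "yarn")
public class RelayerMetricsSketch {

  @Metric("Pending container asks relayed to this sub-cluster")
  MutableGaugeLong pendingAsks;

  @Metric("Outstanding container releases relayed to this sub-cluster")
  MutableGaugeLong pendingReleases;

  public static RelayerMetricsSketch create(String subClusterId) {
    // Register one source per sub-cluster so the state of each relayer
    // inside FederationInterceptor is visible separately.
    return DefaultMetricsSystem.instance().register(
        "AMRMClientRelayer-" + subClusterId, "Relayer metrics",
        new RelayerMetricsSketch());
  }

  public void setPendingAsks(long n) {
    pendingAsks.set(n);
  }

  public void setPendingReleases(long n) {
    pendingReleases.set(n);
  }
}
{code}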
[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer
[ https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8760: --- Issue Type: Sub-task (was: Task) Parent: YARN-5597 > [AMRMProxy] Fix concurrent re-register due to YarnRM failover in > AMRMClientRelayer > -- > > Key: YARN-8760 > URL: https://issues.apache.org/jira/browse/YARN-8760 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > > When the home YarnRM is failing over, the FinishApplicationMaster call from the AM can > have multiple retry threads outstanding in FederationInterceptor. When the new > YarnRM comes back up, all retry threads will re-register with the YarnRM. The first > one will succeed but the rest will get an "Application Master is already > registered" exception. We should catch and swallow this exception and move > on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer
[ https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8760: --- Summary: [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer (was: Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer) > [AMRMProxy] Fix concurrent re-register due to YarnRM failover in > AMRMClientRelayer > -- > > Key: YARN-8760 > URL: https://issues.apache.org/jira/browse/YARN-8760 > Project: Hadoop YARN > Issue Type: Task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > > When the home YarnRM is failing over, the FinishApplicationMaster call from the AM can > have multiple retry threads outstanding in FederationInterceptor. When the new > YarnRM comes back up, all retry threads will re-register with the YarnRM. The first > one will succeed but the rest will get an "Application Master is already > registered" exception. We should catch and swallow this exception and move > on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8760) Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer
Botong Huang created YARN-8760: -- Summary: Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer Key: YARN-8760 URL: https://issues.apache.org/jira/browse/YARN-8760 Project: Hadoop YARN Issue Type: Task Reporter: Botong Huang Assignee: Botong Huang When the home YarnRM is failing over, the FinishApplicationMaster call from the AM can have multiple retry threads outstanding in FederationInterceptor. When the new YarnRM comes back up, all retry threads will re-register with the YarnRM. The first one will succeed but the rest will get an "Application Master is already registered" exception. We should catch and swallow this exception and move on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
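A hedged sketch of the catch-and-swallow behavior proposed above. The exception type and the message check are assumptions about what the home RM returns when a second register arrives; the real AMRMClientRelayer change may differ.
{code}
import java.io.IOException;

// Illustrative only: only the first concurrent retry thread actually needs
// its registration to succeed; later ones treat "already registered" as OK.
public class ReRegisterSketch {

  interface AppMasterProtocol {
    void registerApplicationMaster() throws IOException; // simplified signature
  }

  public static void reRegisterAfterFailover(AppMasterProtocol rm)
      throws IOException {
    try {
      rm.registerApplicationMaster();
    } catch (IOException e) {
      if (e.getMessage() != null
          && e.getMessage().contains("Application Master is already registered")) {
        // Another retry thread won the race after the RM failover; swallow
        // the error and continue with the FinishApplicationMaster call.
        return;
      }
      throw e;
    }
  }
}
{code}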
[jira] [Comment Edited] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575445#comment-16575445 ] Eric Yang edited comment on YARN-8569 at 9/10/18 6:37 PM: -- Sysfs is a pseudo file system provided by Linux Kernel to expose system related information to user space. YARN can mimic the same ideology to export cluster information to container. The proposal is to expose cluster information to: {code} /hadoop/yarn/sysfs/service.json {code} This basically have the runtime information about the deployed application, and getting updated when state changes happen. The file is replicated from YARN service AM to host system in appcache for the application. was (Author: eyang): Sysfs is a pseudo file system provided by Linux Kernel to expose system related information to user space. YARN can mimic the same ideology to export cluster information to container. The proposal is to expose cluster information to: {code} /hadoop/yarn/fs/cluster.json {code} This basically have the runtime information about the deployed application, and getting updated when state changes happen. The file is replicated from YARN service AM to host system in appcache for the application. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch > > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
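From the container side, the proposal above means an application can discover its peers by reading the mounted file instead of having hostnames passed on the command line. A hedged sketch follows, assuming the JSON is the service spec with components and instance hostnames; the field names used here are illustrative, not the exact schema.
{code}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: read the sysfs-style service file exposed to the
// container and collect the hostnames of one component's instances.
public class SysfsServiceReader {

  public static List<String> hostnamesOf(String componentName)
      throws IOException {
    JsonNode service = new ObjectMapper()
        .readTree(new File("/hadoop/yarn/sysfs/service.json"));
    List<String> hosts = new ArrayList<>();
    for (JsonNode component : service.path("components")) {
      if (componentName.equals(component.path("name").asText())) {
        for (JsonNode instance : component.path("containers")) {
          hosts.add(instance.path("hostname").asText()); // assumed field name
        }
      }
    }
    return hosts;
  }

  public static void main(String[] args) throws IOException {
    // e.g. build the --worker_hosts list for the tensorflow example above
    System.out.println(String.join(",", hostnamesOf("worker")));
  }
}
{code}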
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.07.patch > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, > YARN-8658.06.patch, YARN-8658.07.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609658#comment-16609658 ] Young Chen commented on YARN-8658: -- Thanks for the feedback [~botong]! Addressed the issues and uploaded a new patch. > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, > YARN-8658.06.patch, YARN-8658.07.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609645#comment-16609645 ] Eric Yang commented on YARN-8569: - Patch 008 fixed more check style issues, and change sync sysfs api to be based on application id instead of combination of application id and container id. This reduces the number of network requests and repetitive syncing of same cluster spec information. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch > > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8569: Attachment: YARN-8569.008.patch > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch > > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures
[ https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609616#comment-16609616 ] Hadoop QA commented on YARN-8759: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8759 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939121/YARN-8759.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7f901d92e591 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8fe4062 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21797/testReport/ | | Max. process+thread count | 855 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21797/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Copy of "resource-types.xml" is
[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609507#comment-16609507 ] Hadoop QA commented on YARN-5464: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 19 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 37s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 40s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 56s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 28s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 28s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 28s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 50s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}173m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-5464 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939096/YARN-5464.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedc
[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609499#comment-16609499 ] Eric Payne commented on YARN-8709: -- Thanks [~Tao Yang]. +1. Will commit shortly. > CS preemption monitor always fails since one under-served queue was deleted > --- > > Key: YARN-8709 > URL: https://issues.apache.org/jira/browse/YARN-8709 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, scheduler preemption >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8709.001.patch, YARN-8709.002.patch > > > After some queues deleted, the preemption checker in SchedulingMonitor was > always skipped because of YarnRuntimeException for every run. > Error logs: > {noformat} > ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: > Exception raised while executing preemption checker, skip this run..., > exception= > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't > happen, cannot find TempQueuePerPartition for queueName=1535075839208 > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834) > {noformat} > I think there is something wrong with partitionToUnderServedQueues field in > ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues > can be add but never be removed, except rebuilding this policy. For example, > once under-served queue "a" is added into this structure, it will always be > there and never be removed, intra-queue preemption checker will try to get > all queues info for partitionToUnderServedQueues in > IntraQueueCandidatesSelector#selectCandidates and will throw > YarnRuntimeException if not found. So that after queue "a" is deleted from > queue structure, the preemption checker will always fail. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-8709: - Summary: CS preemption monitor always fails since one under-served queue was deleted (was: intra-queue preemption checker always fail since one under-served queue was deleted) > CS preemption monitor always fails since one under-served queue was deleted > --- > > Key: YARN-8709 > URL: https://issues.apache.org/jira/browse/YARN-8709 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, scheduler preemption >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8709.001.patch, YARN-8709.002.patch > > > After some queues deleted, the preemption checker in SchedulingMonitor was > always skipped because of YarnRuntimeException for every run. > Error logs: > {noformat} > ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: > Exception raised while executing preemption checker, skip this run..., > exception= > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't > happen, cannot find TempQueuePerPartition for queueName=1535075839208 > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834) > {noformat} > I think there is something wrong with partitionToUnderServedQueues field in > ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues > can be add but never be removed, except rebuilding this policy. For example, > once under-served queue "a" is added into this structure, it will always be > there and never be removed, intra-queue preemption checker will try to get > all queues info for partitionToUnderServedQueues in > IntraQueueCandidatesSelector#selectCandidates and will throw > YarnRuntimeException if not found. 
As a result, after queue "a" is deleted from > the queue structure, the preemption checker will always fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
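To make the failure mode described in YARN-8709 concrete, here is a minimal Java sketch of the accumulate-then-look-up pattern, together with one possible way to prune entries for deleted queues. The class and method names are made up for illustration; this is not the actual ProportionalCapacityPreemptionPolicy or IntraQueueCandidatesSelector code, and the pruning variant is an assumption, not the committed fix.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: mimics how under-served queue names accumulate in a map
// and are looked up on every run, so one deleted queue poisons every run.
class PreemptionCheckerSketch {
  // partition -> queue names ever observed as under-served (never pruned)
  private final Map<String, Set<String>> partitionToUnderServedQueues = new HashMap<>();
  // partition -> queues that currently exist in the scheduler
  private final Map<String, Set<String>> liveQueuesPerPartition = new HashMap<>();

  void markUnderServed(String partition, String queue) {
    partitionToUnderServedQueues
        .computeIfAbsent(partition, p -> new HashSet<>())
        .add(queue);
  }

  // Behaviour before the fix: one remembered-but-deleted queue fails the whole run.
  void checkIntraQueuePreemption(String partition) {
    Set<String> live = liveQueuesPerPartition.getOrDefault(partition, Set.of());
    for (String queue : partitionToUnderServedQueues.getOrDefault(partition, Set.of())) {
      if (!live.contains(queue)) {
        throw new RuntimeException(
            "This shouldn't happen, cannot find TempQueuePerPartition for queueName=" + queue);
      }
      // ... compute intra-queue preemption demand for this queue ...
    }
  }

  // One possible repair (hypothetical): drop entries for queues that no longer
  // exist instead of aborting the run.
  void checkIntraQueuePreemptionPruned(String partition) {
    Set<String> live = liveQueuesPerPartition.getOrDefault(partition, Set.of());
    Set<String> remembered = partitionToUnderServedQueues.get(partition);
    if (remembered != null) {
      remembered.removeIf(queue -> !live.contains(queue));
    }
    // ... proceed with the surviving queues only ...
  }
}
{code}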
[jira] [Commented] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures
[ https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609463#comment-16609463 ] Manikandan R commented on YARN-8759: [~bsteinbach] Thanks for raising this. Please refer to https://issues.apache.org/jira/browse/YARN-7159?focusedCommentId=16235697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16235697 as well. It talks about the problem we faced initially and the reason for having the resource types file in each sub component (for example, YARN NM, etc.). It would be nice if we could find a generic solution. Also, I think there are a few more places like the one you encountered. Please check. cc [~sunilg] > Copy of "resource-types.xml" is not deleted if test fails, causes other test > failures > - > > Key: YARN-8759 > URL: https://issues.apache.org/jira/browse/YARN-8759 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-8759.001.patch > > > resource-types.xml is copied in several tests to the test machine, but it is > deleted only at the end of the test. In case the test fails the file will not > be deleted and other tests will fail, because of the wrong configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
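As context for the cleanup problem discussed in YARN-8759, a minimal JUnit 4 sketch of the usual remedy: copy resource-types.xml into place in a setup method and delete it in an @After method, which runs even when the test method fails. The class name and the paths below are hypothetical placeholders, not the actual YARN test code.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ResourceTypesFileCleanupTest {
  // Placeholder locations; real tests would use their own resource and target dirs.
  private static final Path SOURCE = Paths.get("src/test/resources/resource-types.xml");
  private static final Path TARGET = Paths.get("target/test-classes/resource-types.xml");

  @Before
  public void copyResourceTypes() throws IOException {
    // Put the custom resource-types.xml where the code under test will read it.
    Files.copy(SOURCE, TARGET, StandardCopyOption.REPLACE_EXISTING);
  }

  @After
  public void removeResourceTypes() throws IOException {
    // Runs whether the test passes or fails, so later tests never pick up
    // this configuration by accident.
    Files.deleteIfExists(TARGET);
  }

  @Test
  public void testWithCustomResourceTypes() {
    // ... test body that depends on the copied resource-types.xml ...
  }
}
{code}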
[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609454#comment-16609454 ] Antal Bálint Steinbach commented on YARN-5464: -- Hi [~djp], I uploaded a patch based on [~rkanter]'s patch. If you have some time, please review it; if not, it would be great if you could suggest somebody who is familiar with the issue. > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, yarn >Reporter: Robert Kanter >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-5464.001.patch, YARN-5464.002.patch, > YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.wip.patch > > > Make sure to remove the note added by YARN-7094 about RM HA failover not > working right. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-5464: - Attachment: YARN-5464.004.patch > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, yarn >Reporter: Robert Kanter >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-5464.001.patch, YARN-5464.002.patch, > YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.wip.patch > > > Make sure to remove the note added by YARN-7094 about RM HA failover not > working right. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures
[ https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8759: - Issue Type: Bug (was: Improvement) > Copy of "resource-types.xml" is not deleted if test fails, causes other test > failures > - > > Key: YARN-8759 > URL: https://issues.apache.org/jira/browse/YARN-8759 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > > resource-types.xml is copied in several tests to the test machine, but it is > deleted only at the end of the test. In case the test fails the file will not > be deleted and other tests will fail, because of the wrong configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures
Antal Bálint Steinbach created YARN-8759: Summary: Copy of "resource-types.xml" is not deleted if test fails, causes other test failures Key: YARN-8759 URL: https://issues.apache.org/jira/browse/YARN-8759 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Antal Bálint Steinbach Assignee: Antal Bálint Steinbach resource-types.xml is copied in several tests to the test machine, but it is deleted only at the end of the test. In case the test fails the file will not be deleted and other tests will fail, because of the wrong configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-5464: - Attachment: YARN-5464.003.patch > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, yarn >Reporter: Robert Kanter >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-5464.001.patch, YARN-5464.002.patch, > YARN-5464.003.patch, YARN-5464.wip.patch > > > Make sure to remove the note added by YARN-7094 about RM HA failover not > working right. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609206#comment-16609206 ] Jim Brennan commented on YARN-8648: --- This is ready for review. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Attachments: YARN-8648.001.patch, YARN-8648.002.patch, > YARN-8648.003.patch, YARN-8648.004.patch > > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd. So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
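To illustrate the leak described in YARN-8648, here is a hedged Java sketch that sweeps the per-controller hierarchies for leftover container cgroup directories. The root path, the hadoop-yarn hierarchy name, and the container_* name filter are assumptions taken from the description above; this is not the NodeManager's actual cleanup code, and deletion only succeeds for a cgroup directory that has no tasks and no child cgroups, which is exactly the state a leaked directory is in.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sweep of leaked per-container cgroup directories, assuming the
// layout /sys/fs/cgroup/<controller>/hadoop-yarn/<container_id> described above.
public class LeakedCgroupSweep {
  public static void main(String[] args) throws IOException {
    Path cgroupRoot = Paths.get("/sys/fs/cgroup");
    String hierarchy = "hadoop-yarn";

    try (DirectoryStream<Path> controllers = Files.newDirectoryStream(cgroupRoot)) {
      for (Path controller : controllers) {            // blkio, cpuset, memory, ...
        Path yarnDir = controller.resolve(hierarchy);
        if (!Files.isDirectory(yarnDir)) {
          continue;
        }
        try (DirectoryStream<Path> containers =
                 Files.newDirectoryStream(yarnDir, "container_*")) {
          for (Path containerDir : containers) {
            // rmdir on a cgroup succeeds only when it has no live tasks and no
            // child cgroups; anything still in use is left alone.
            try {
              Files.delete(containerDir);
              System.out.println("removed leaked cgroup " + containerDir);
            } catch (IOException stillInUse) {
              // Not a leak (or not empty yet); skip it.
            }
          }
        }
      }
    }
  }
}
{code}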
[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609152#comment-16609152 ] Hadoop QA commented on YARN-5464: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 19 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 19s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m 8s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 13s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 30s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}179m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-5464 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939052/YARN-5464.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux d802166afd3e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-5464: - Attachment: YARN-5464.002.patch > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, yarn >Reporter: Robert Kanter >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-5464.001.patch, YARN-5464.002.patch, > YARN-5464.wip.patch > > > Make sure to remove the note added by YARN-7094 about RM HA failover not > working right. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7505) RM REST endpoints generate malformed JSON
[ https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608844#comment-16608844 ] Hadoop QA commented on YARN-7505: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 5s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 12s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 32 unchanged - 2 fixed = 33 total (was 34) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 18s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 0s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 59s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}166m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7505 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12897909/YARN-7505.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninsta
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608829#comment-16608829 ] collinma commented on YARN-8747: Hi [~sunilg], could we merge the PR into trunk? Just let me know if you have any concerns. > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured > as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to load > due to a JS error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone v0.5.1 (see [https://github.com/moment/moment-timezone/issues/294]). We need > to update the moment-timezone version accordingly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org