[jira] [Commented] (YARN-8382) cgroup file leak in NM
[ https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499750#comment-16499750 ] Miklos Szegedi commented on YARN-8382: -- +1 LGTM. I will commit this if no one has objections. > cgroup file leak in NM > -- > > Key: YARN-8382 > URL: https://issues.apache.org/jira/browse/YARN-8382 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: we wrote a container with a shutdownHook which has a piece of code like "while(true) sleep(100)". When *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* < *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file leak happens; when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* > *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file is deleted successfully. > Reporter: Hu Ziqian > Assignee: Hu Ziqian > Priority: Major > Attachments: YARN-8382-branch-2.8.3.001.patch, YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch > > As Jiandan said in YARN-6525, the NM may time out while deleting a container's cgroup files, with logs like the following: > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to delete for 1000ms > > We found that one situation in which this happens is when *yarn.nodemanager.sleep-delay-before-sigkill.ms* is set bigger than *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*; then the cgroup file leak happens. > > One container process tree looks like the following graph: > bash(16097)───java(16099)─┬─{java}(16100) ├─{java}(16101) └─{java}(16102) > > When the NM kills a container, it sends kill -15 -pid to kill the container process group. The bash process exits when it receives SIGTERM, but the java process may still do some work (shutdownHook etc.) and does not exit until it receives SIGKILL. When the bash process exits, CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* expires, the java processes may still be running, cgroup/tasks is still not empty, and the cgroup files leak. > > We add a condition that *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must be bigger than *yarn.nodemanager.sleep-delay-before-sigkill.ms* to solve this problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
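The timing constraint described in this issue boils down to a single comparison. The following is an illustrative sketch, not the actual patch; the class and method names (`CgroupDeleteTimeoutCheck`, `mayLeak`) are hypothetical:

```java
// Sketch of the leak condition reported above: if the cgroup delete
// timeout expires before the SIGTERM-to-SIGKILL window has elapsed,
// deletion runs while JVM children still hold entries in cgroup/tasks.
final class CgroupDeleteTimeoutCheck {
    private CgroupDeleteTimeoutCheck() {}

    /** True when the configured timeouts can leak cgroup files. */
    static boolean mayLeak(long deleteTimeoutMs, long sigkillDelayMs) {
        // If cgroup deletion gives up before SIGKILL has even been sent,
        // the java processes may still be listed in cgroup/tasks.
        return deleteTimeoutMs < sigkillDelayMs;
    }
}
```

With the values from the report (delete timeout 1000 ms, a larger sigkill delay), `mayLeak` is true, matching the observed "tried to delete for 1000ms" failure.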
[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499744#comment-16499744 ] genericqa commented on YARN-4677: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 30s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 8 new + 602 unchanged - 10 fixed = 610 total (was 612) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m 58s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}104m 31s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 | | JIRA Issue | YARN-4677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926301/YARN-4677-branch-2.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c5430d214474 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / d00a58f | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_171 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20935/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20935/testReport/ | | Max. process+thread count | 839 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20935/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RMNodeResourceUpdateEvent update from scheduler can lead to race condition >
[jira] [Updated] (YARN-8388) TestCGroupElasticMemoryController.testNormalExit() hangs on Linux
[ https://issues.apache.org/jira/browse/YARN-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-8388: - Attachment: YARN-8388.000.patch > TestCGroupElasticMemoryController.testNormalExit() hangs on Linux > - > > Key: YARN-8388 > URL: https://issues.apache.org/jira/browse/YARN-8388 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Haibo Chen >Assignee: Miklos Szegedi >Priority: Major > Attachments: YARN-8388.000.patch > > > YARN-8375 disables the unit test on Linux. But given that we will be running > the CGroupElasticMemoryController on Linux, we need to figure out why it is > hanging and ideally fix it.
[jira] [Commented] (YARN-8155) Improve the logging in NMTimelinePublisher and TimelineCollectorWebService
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499717#comment-16499717 ] Abhishek Modi commented on YARN-8155: - [~rohithsharma] [~vrushalic] could you please review it? > Improve the logging in NMTimelinePublisher and TimelineCollectorWebService > -- > > Key: YARN-8155 > URL: https://issues.apache.org/jira/browse/YARN-8155 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8155.001.patch, YARN-8155.002.patch > > > We see that NM logs are filled with large stack traces of NotFoundException > if a collector is removed from one of the NMs while other NMs are still publishing > entities. > > This Jira is to improve the logging in the NM so that we log an informative > message.
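The logging pattern this issue asks for can be sketched as follows. This is an illustrative snippet, not the actual patch; the names are hypothetical, and `java.util.logging` stands in for Hadoop's SLF4J loggers:

```java
// When the collector for an application has been removed, emit one
// concise WARN line instead of a full NotFoundException stack trace;
// keep the full trace only at FINE (debug) level.
import java.util.logging.Level;
import java.util.logging.Logger;

final class TimelinePublishLogging {
    private static final Logger LOG =
        Logger.getLogger(TimelinePublishLogging.class.getName());

    private TimelinePublishLogging() {}

    /** One-line summary, separated out so it is easy to test. */
    static String summary(String appId, Exception e) {
        return "Failed to publish timeline entities for " + appId
            + ": " + e.getMessage();
    }

    /** Concise WARN line; the stack trace is logged only at debug level. */
    static void logPublishFailure(String appId, Exception e) {
        LOG.warning(summary(appId, e));
        LOG.log(Level.FINE, "Full stack trace for " + appId, e);
    }
}
```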
[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499700#comment-16499700 ] Wilfred Spiegelenburg commented on YARN-4677: - Attached a new version of the branch-2 patch which fixes the comments so we can get the fix checked in and finalised. > RMNodeResourceUpdateEvent update from scheduler can lead to race condition > -- > > Key: YARN-4677 > URL: https://issues.apache.org/jira/browse/YARN-4677 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Brook Zhou >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-4677-branch-2.001.patch, > YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch > > > When a node is in the decommissioning state, there is a time window between > completedContainer() and the RMNodeResourceUpdateEvent being handled in > scheduler.nodeUpdate (YARN-3223). > So if a scheduling attempt happens within this window, a new container could > still get allocated on this node. An even worse case is if the scheduling attempt > happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated > to the SchedulerNode - then the total resource is lower than the used resource and the > available resource is a negative value.
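The negative-available-resource symptom described above can be illustrated with a tiny bookkeeping sketch. Names here are illustrative, not YARN's, and clamping stands in for the real fix (which re-checks under the scheduler's synchronization):

```java
// If a container is allocated between the RMNodeResourceUpdateEvent
// being sent and it reaching the SchedulerNode, naive subtraction of
// used from the new (smaller) total drives "available" negative.
final class NodeResourceBookkeeping {
    private NodeResourceBookkeeping() {}

    /** Recompute available memory after a total-resource update. */
    static long availableAfterUpdate(long newTotalMb, long usedMb) {
        // Without the clamp, newTotalMb < usedMb yields a negative value,
        // which is exactly the symptom reported for decommissioning nodes.
        return Math.max(0L, newTotalMb - usedMb);
    }
}
```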
[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-4677: Attachment: YARN-4677-branch-2.003.patch > RMNodeResourceUpdateEvent update from scheduler can lead to race condition > -- > > Key: YARN-4677 > URL: https://issues.apache.org/jira/browse/YARN-4677 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Brook Zhou >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-4677-branch-2.001.patch, > YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch > > > When a node is in the decommissioning state, there is a time window between > completedContainer() and the RMNodeResourceUpdateEvent being handled in > scheduler.nodeUpdate (YARN-3223). > So if a scheduling attempt happens within this window, a new container could > still get allocated on this node. An even worse case is if the scheduling attempt > happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated > to the SchedulerNode - then the total resource is lower than the used resource and the > available resource is a negative value.
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.port-gpu.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2-gpu.patch, hadoop-2.7.2.gpu-port.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job run on GPUs {0,1} could be faster than one run on GPUs > {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > allows fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information on a node (up to 64 GPUs per node): '1' means > available and '0' otherwise in the corresponding bit position.
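The bitmap idea above can be sketched with plain bit arithmetic. This is illustrative only: the 4-GPUs-per-PCI-E-switch grouping and all names here are assumptions for the sketch, not something the patch specifies:

```java
// 64-bit GPU bitmap as described in the issue: bit i set means GPU i is
// free. Checking whether a 2-GPU request can be colocated under one
// (assumed) 4-GPU PCI-E switch is then a mask intersection.
final class GpuBitmap {
    private GpuBitmap() {}

    /** Mask covering the assumed 4-GPU switch group containing gpu. */
    static long switchMask(int gpu) {
        return 0xFL << ((gpu / 4) * 4);
    }

    /** True if at least n free GPUs share a switch with the given GPU. */
    static boolean hasColocated(long freeBitmap, int gpu, int n) {
        return Long.bitCount(freeBitmap & switchMask(gpu)) >= n;
    }
}
```

With GPUs 0 and 1 free (bitmap 0b11) a colocated 2-GPU placement exists; with only GPUs 0 and 7 free (bitmap 0b10000001) it does not, matching the {0,1} vs {0,7} example in the description.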
[jira] [Commented] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499520#comment-16499520 ] Hudson commented on YARN-8276: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14346 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14346/]) YARN-8276. [UI2] After version field became mandatory, form-based (sunilg: rev 9c4cbed8d19ec0f486af454de6b117d77a0a0b84) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/deploy-service.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/deploy-service.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/info-seeder.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-servicedef.js > [UI2] After version field became mandatory, form-based submission of new YARN > service doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through the UI: there is no way to specify the version field, and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv].
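For reference, a service definition submitted to the YARN service REST API (which the UI form produces after this fix) must now carry the version field. A minimal sleeper-style spec might look like the following; the field values are illustrative:

```json
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 1,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```

Omitting "version" yields exactly the diagnostics shown above ("Version of service sleeper-service is either empty or not provided").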
[jira] [Updated] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8276: - Summary: [UI2] After version field became mandatory, form-based submission of new YARN service doesn't work (was: [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work) > [UI2] After version field became mandatory, form-based submission of new YARN > service doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through the UI: there is no way to specify the version field, and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv].
[jira] [Commented] (YARN-8155) Improve the logging in NMTimelinePublisher and TimelineCollectorWebService
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499510#comment-16499510 ] genericqa commented on YARN-8155: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 50s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8155 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926277/YARN-8155.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 16b893ffe0e7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a804b7c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results |
[jira] [Updated] (YARN-8155) Improve the logging in NMTimelinePublisher and TimelineCollectorWebService
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8155: Attachment: YARN-8155.002.patch > Improve the logging in NMTimelinePublisher and TimelineCollectorWebService > -- > > Key: YARN-8155 > URL: https://issues.apache.org/jira/browse/YARN-8155 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8155.001.patch, YARN-8155.002.patch > > > We see that NM logs are filled with large stack traces of NotFoundException > if a collector is removed from one of the NMs while other NMs are still publishing > entities. > > This Jira is to improve the logging in the NM so that we log an informative > message.
[jira] [Commented] (YARN-8155) Improve the logging in NMTimelinePublisher and TimelineCollectorWebService
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499333#comment-16499333 ] genericqa commented on YARN-8155: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 54s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8155 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926238/YARN-8155.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 03f0dd96899a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a804b7c | |
[jira] [Commented] (YARN-8382) cgroup file leak in NM
[ https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499324#comment-16499324 ] Hu Ziqian commented on YARN-8382: - [~miklos.szeg...@cloudera.com], I added a new patch which sets the cgroup delete delay to sigkill timeout + 1 second + NM_LINUX_CONTAINER_CGROUPS_DELETE_TIMEOUT and fixes the checkstyle issues. > cgroup file leak in NM > -- > > Key: YARN-8382 > URL: https://issues.apache.org/jira/browse/YARN-8382 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: we wrote a container with a shutdownHook which has a piece of code like "while(true) sleep(100)". When *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* < *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file leak happens; when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* > *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file is deleted successfully. > Reporter: Hu Ziqian > Assignee: Hu Ziqian > Priority: Major > Attachments: YARN-8382-branch-2.8.3.001.patch, YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch > > As Jiandan said in YARN-6525, the NM may time out while deleting a container's cgroup files, with logs like the following: > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to delete for 1000ms > > We found that one situation in which this happens is when *yarn.nodemanager.sleep-delay-before-sigkill.ms* is set bigger than *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*; then the cgroup file leak happens. > > One container process tree looks like the following graph: > bash(16097)───java(16099)─┬─{java}(16100) ├─{java}(16101) └─{java}(16102) > > When the NM kills a container, it sends kill -15 -pid to kill the container process group. The bash process exits when it receives SIGTERM, but the java process may still do some work (shutdownHook etc.) and does not exit until it receives SIGKILL. When the bash process exits, CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* expires, the java processes may still be running, cgroup/tasks is still not empty, and the cgroup files leak. > > We add a condition that *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must be bigger than *yarn.nodemanager.sleep-delay-before-sigkill.ms* to solve this problem.
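The comment above describes the patch's delay as a simple sum. A sketch of that arithmetic, with illustrative names (the real constant lives in the NM configuration keys):

```java
// Effective wait before the NM gives up deleting a container's cgroup,
// per the comment: sigkill delay + 1 second grace + the configured
// cgroup delete timeout. This guarantees the delete attempt outlives
// the SIGTERM-to-SIGKILL window.
final class CgroupDeleteDelay {
    private CgroupDeleteDelay() {}

    static long effectiveDeleteTimeoutMs(long sigkillDelayMs,
                                         long configuredDeleteTimeoutMs) {
        return sigkillDelayMs + 1000L + configuredDeleteTimeoutMs;
    }
}
```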