[jira] [Commented] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807448#comment-16807448 ] Hadoop QA commented on YARN-9336: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.9 Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} root in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 9s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn-common {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 10s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 6s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn-common {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 10s{color} | {color:blue} ASF License check generated no output? 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 3m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:07598f5 | | JIRA Issue | YARN-9336 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964526/YARN-9336-branch-2.9.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 42a2f79f81bc 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2.9 / c7a60ca | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.7.0_95 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/23861/artifact/out/branch-mvninstall-root.txt | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/23861/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23861/artifact/out//testptch/patchprocess/maven-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/23861/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-
[jira] [Commented] (YARN-9027) EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore
[ https://issues.apache.org/jira/browse/YARN-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807442#comment-16807442 ] Prabhu Joseph commented on YARN-9027: - [~giovanni.fumarola] Can you review this jira - which fixes LevelDBCacheTimelineStore failing to initialize due to default constructor not present > EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore > --- > > Key: YARN-9027 > URL: https://issues.apache.org/jira/browse/YARN-9027 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: 0001-YARN-9027.patch, 0002-YARN-9027.patch, > 0003-YARN-9027.patch > > > EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore as the > expected default constructor is not present. > {code} > Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.() > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) > at > org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:100) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1026) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:945) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:998) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1040) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138) > at > org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117) > ... 59 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.getDeclaredConstructor(Class.java:2178) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128) > ... 67 more > {code} > Repro: > {code} > 1. Set Offline Caching with > yarn.timeline-service.entity-group-fs-store.cache-store-class=org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore > 2. Run a Tez query > 3. Check Tez View > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
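For context on the stack trace above, here is a minimal, hedged illustration of why the missing default constructor matters: ReflectionUtils.newInstance resolves a no-argument constructor reflectively, so a store class that only defines parameterized constructors fails exactly as shown. The MyCacheStore class below is hypothetical and only for illustration; it is not the committed fix.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical store class, used only to illustrate the constructor requirement.
class MyCacheStore {
  // Required: ReflectionUtils.newInstance looks this up via getDeclaredConstructor().
  public MyCacheStore() {
  }

  // A parameterized constructor alone is not enough and triggers NoSuchMethodException.
  public MyCacheStore(String name) {
  }
}

class ReflectionDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Succeeds only because MyCacheStore declares a no-arg constructor.
    MyCacheStore store = ReflectionUtils.newInstance(MyCacheStore.class, conf);
    System.out.println("Instantiated: " + store.getClass().getName());
  }
}
{code}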
[jira] [Commented] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807440#comment-16807440 ] Tarun Parimi commented on YARN-9336: Reattaching same patch as the build failed due to a maven issue not related to the patch. > JobHistoryServer leaks CLOSE_WAIT tcp connections when using > LogAggregationIndexedFileController > > > Key: YARN-9336 > URL: https://issues.apache.org/jira/browse/YARN-9336 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.9.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9336-branch-2.9.001.patch, > YARN-9336-branch-2.9.002.patch, YARN-9336.001.patch, YARN-9336.002.patch > > > The JobHistoryServer is leaking CLOSE_WAIT connections to DataNodes whenever > viewing a huge log file in JobhistoryServer. This happens only when the below > is configured. > {code:java} > yarn.log-aggregation.file-formats=IndexedFormat > yarn.log-aggregation.file-controller.IndexedFormat.class=org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log.server.url=http://jobhistory-host:19888/jobhistory/logs > {code} > On investigation, I found that the FSDataInputStream is not closed in > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.IndexedFileAggregatedLogsBlock > . Since this block is called every time the Jobhistory page displays the > logs, CLOSE_WAIT connections to DataNodes keep on increasing in > JobHistoryServer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
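As a rough illustration of the fix direction described in this issue (closing the stream used to render aggregated logs), here is a hedged sketch; it is not the committed patch, and the helper class and method names are assumptions:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of rendering a log block without leaking the HDFS stream.
class AggregatedLogsBlockSketch {
  void render(FileSystem fs, Path aggregatedLogPath) throws IOException {
    // try-with-resources guarantees the stream is closed even if rendering throws,
    // so the DataNode connection does not stay in CLOSE_WAIT on the history server.
    try (FSDataInputStream in = fs.open(aggregatedLogPath)) {
      // ... seek to the indexed offsets and stream the requested log bytes ...
    }
  }
}
{code}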
[jira] [Updated] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi updated YARN-9336: --- Attachment: YARN-9336-branch-2.9.002.patch > JobHistoryServer leaks CLOSE_WAIT tcp connections when using > LogAggregationIndexedFileController > > > Key: YARN-9336 > URL: https://issues.apache.org/jira/browse/YARN-9336 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.9.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9336-branch-2.9.001.patch, > YARN-9336-branch-2.9.002.patch, YARN-9336.001.patch, YARN-9336.002.patch > > > The JobHistoryServer is leaking CLOSE_WAIT connections to DataNodes whenever > viewing a huge log file in JobhistoryServer. This happens only when the below > is configured. > {code:java} > yarn.log-aggregation.file-formats=IndexedFormat > yarn.log-aggregation.file-controller.IndexedFormat.class=org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log.server.url=http://jobhistory-host:19888/jobhistory/logs > {code} > On investigation, I found that the FSDataInputStream is not closed in > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.IndexedFileAggregatedLogsBlock > . Since this block is called every time the Jobhistory page displays the > logs, CLOSE_WAIT connections to DataNodes keep on increasing in > JobHistoryServer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9080) Bucket Directories as part of ATS done accumulate
[ https://issues.apache.org/jira/browse/YARN-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807437#comment-16807437 ] Prabhu Joseph commented on YARN-9080: - [~snemeth] [~giovanni.fumarola] Can you review the patch for this JIRA? It fixes the deletion of bucket directories under the ATS done directory. > Bucket Directories as part of ATS done accumulate > -- > > Key: YARN-9080 > URL: https://issues.apache.org/jira/browse/YARN-9080 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: 0001-YARN-9080.patch, 0002-YARN-9080.patch, > 0003-YARN-9080.patch, YARN-9080-004.patch, YARN-9080-005.patch, > YARN-9080-006.patch > > > Observed that older bucket directories (cluster_timestamp, bucket1 and bucket2) > under the ATS done directory accumulate. The cleanLogs part of EntityLogCleaner > removes only the app directories and not the bucket directories. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
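To make the cleanup gap concrete, here is a hedged sketch of the kind of bucket-directory cleanup the issue asks for; the class and method names are illustrative and this is not the actual EntityLogCleaner code:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: after app directories are cleaned, also remove bucket
// directories under the ATS done root that are empty and older than the cutoff.
class BucketCleanerSketch {
  void cleanEmptyBuckets(FileSystem fs, Path doneRoot, long retainMillis)
      throws IOException {
    long cutoff = System.currentTimeMillis() - retainMillis;
    for (FileStatus bucket : fs.listStatus(doneRoot)) {
      if (bucket.isDirectory()
          && bucket.getModificationTime() < cutoff
          && fs.listStatus(bucket.getPath()).length == 0) {
        // Non-recursive delete: only buckets that are already empty are removed.
        fs.delete(bucket.getPath(), false);
      }
    }
  }
}
{code}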
[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807435#comment-16807435 ] Prabhu Joseph commented on YARN-9227: - Thanks [~snemeth] and [~giovanni.fumarola]. > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch, YARN-9227-004.patch, YARN-9227-005.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
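For illustration of the missing cleanup, a hedged sketch of deleting the per-application relative path once the job completes; the directory layout matches the listing above, but the helper itself is hypothetical, not the actual DistributedShell client code:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical cleanup helper: removes e.g.
// /user/<user>/DistributedShell/application_xxx with its AppMaster.jar and shellCommands.
class DShellCleanupSketch {
  void cleanupStagingDir(FileSystem fs, String appName, String appId) throws IOException {
    Path stagingDir = new Path(fs.getHomeDirectory(), appName + "/" + appId);
    if (fs.exists(stagingDir)) {
      fs.delete(stagingDir, true); // recursive: drops the jar and localized shell script
    }
  }
}
{code}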
[jira] [Commented] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807426#comment-16807426 ] Hadoop QA commented on YARN-9336: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 7s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.9 Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} root in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 7s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn-common {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in branch-2.9 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 6s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn-common {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 10s{color} | {color:blue} ASF License check generated no output? 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 14m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:07598f5 | | JIRA Issue | YARN-9336 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964522/YARN-9336-branch-2.9.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 696d94faa85b 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2.9 / c7a60ca | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.7.0_95 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/23860/artifact/out/branch-mvninstall-root.txt | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/23860/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23860/artifact/out//testptch/patchprocess/maven-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/23860/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-
[jira] [Commented] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807420#comment-16807420 ] Prabhu Joseph commented on YARN-9418: - Thanks [~giovanni.fumarola]. > ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics > > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Fix For: 3.3.0 > > Attachments: YARN-9418-001.patch, YARN-9418-002.patch, > YARN-9418-003.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. > {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
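To illustrate the entityIdPrefix mismatch described above, a hedged sketch of how a container entity's idPrefix could be set; this is illustrative only (the real NM/RM publishers differ), and "inverted start time" is assumed to mean Long.MAX_VALUE minus the container start time:

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

// Illustrative only: the reader groups entities by (type, idPrefix), so a container
// entity published with an inverted-start-time prefix is not merged with the RM's
// entity for the same container, which uses the default prefix of 0.
class ContainerEntitySketch {
  TimelineEntity newContainerEntity(String containerId, long startTime,
      boolean useInvertedPrefix) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_CONTAINER");
    entity.setId(containerId);
    // NM behaviour per the report vs. RM behaviour (default 0).
    entity.setIdPrefix(useInvertedPrefix ? Long.MAX_VALUE - startTime : 0L);
    return entity;
  }
}
{code}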
[jira] [Reopened] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi reopened YARN-9336: Reopening to submit patch for branch-2.9 again > JobHistoryServer leaks CLOSE_WAIT tcp connections when using > LogAggregationIndexedFileController > > > Key: YARN-9336 > URL: https://issues.apache.org/jira/browse/YARN-9336 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.9.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9336-branch-2.9.001.patch, YARN-9336.001.patch, > YARN-9336.002.patch > > > The JobHistoryServer is leaking CLOSE_WAIT connections to DataNodes whenever > viewing a huge log file in JobhistoryServer. This happens only when the below > is configured. > {code:java} > yarn.log-aggregation.file-formats=IndexedFormat > yarn.log-aggregation.file-controller.IndexedFormat.class=org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log.server.url=http://jobhistory-host:19888/jobhistory/logs > {code} > On investigation, I found that the FSDataInputStream is not closed in > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.IndexedFileAggregatedLogsBlock > . Since this block is called every time the Jobhistory page displays the > logs, CLOSE_WAIT connections to DataNodes keep on increasing in > JobHistoryServer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9336) JobHistoryServer leaks CLOSE_WAIT tcp connections when using LogAggregationIndexedFileController
[ https://issues.apache.org/jira/browse/YARN-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi updated YARN-9336: --- Attachment: YARN-9336-branch-2.9.001.patch > JobHistoryServer leaks CLOSE_WAIT tcp connections when using > LogAggregationIndexedFileController > > > Key: YARN-9336 > URL: https://issues.apache.org/jira/browse/YARN-9336 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.9.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9336-branch-2.9.001.patch, YARN-9336.001.patch, > YARN-9336.002.patch > > > The JobHistoryServer is leaking CLOSE_WAIT connections to DataNodes whenever > viewing a huge log file in JobhistoryServer. This happens only when the below > is configured. > {code:java} > yarn.log-aggregation.file-formats=IndexedFormat > yarn.log-aggregation.file-controller.IndexedFormat.class=org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log.server.url=http://jobhistory-host:19888/jobhistory/logs > {code} > On investigation, I found that the FSDataInputStream is not closed in > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.IndexedFileAggregatedLogsBlock > . Since this block is called every time the Jobhistory page displays the > logs, CLOSE_WAIT connections to DataNodes keep on increasing in > JobHistoryServer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807384#comment-16807384 ] Hudson commented on YARN-9214: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16324 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16324/]) YARN-9214. Add AbstractYarnScheduler#getValidQueues method to remove (yufei: rev 2f752830ba74c90ccce818d687572db9afded25b) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java > Add AbstractYarnScheduler#getValidQueues method to remove duplication > - > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch, > YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
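As a hedged sketch of the deduplication described in this issue (simplified signatures, not the actual AbstractYarnScheduler code): both operations share the "resolve the queue's applications or fail" step, which a getValidQueues-style helper factors out.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Simplified sketch of the shared-helper idea; loop bodies are placeholders.
abstract class SchedulerSketch {
  abstract List<ApplicationAttemptId> getAppsInQueue(String queueName);

  // Shared validation previously duplicated in moveAllApps and killAllAppsInQueue.
  private List<ApplicationAttemptId> getValidQueues(String queueName) throws YarnException {
    List<ApplicationAttemptId> apps = getAppsInQueue(queueName);
    if (apps == null) {
      throw new YarnException("The specified queue: " + queueName + " doesn't exist");
    }
    return apps;
  }

  void moveAllApps(String sourceQueue, String destQueue) throws YarnException {
    for (ApplicationAttemptId appAttempt : getValidQueues(sourceQueue)) {
      // ... move each application to destQueue ...
    }
  }

  void killAllAppsInQueue(String queueName) throws YarnException {
    for (ApplicationAttemptId appAttempt : getValidQueues(queueName)) {
      // ... kill each application ...
    }
  }
}
{code}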
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807378#comment-16807378 ] Yufei Gu commented on YARN-9214: Committed to trunk. Thanks [~jiwq] for the contribution. Thanks [~snemeth] for the review. > Add AbstractYarnScheduler#getValidQueues method to remove duplication > - > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch, > YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9214: --- Summary: Add AbstractYarnScheduler#getValidQueues method to remove duplication (was: Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code ) > Add AbstractYarnScheduler#getValidQueues method to remove duplication > - > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch, > YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807374#comment-16807374 ] Yufei Gu commented on YARN-9214: +1 > Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code > -- > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch, > YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807362#comment-16807362 ] Abhishek Modi commented on YARN-9428: - Thanks [~giovanni.fumarola] for review and committing it. Thanks. > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > > Add metrics for paused containers in NodeManager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807363#comment-16807363 ] Abhishek Modi commented on YARN-2889: - [~giovanni.fumarola] could you please review it. Thanks. > Limit the number of opportunistic container allocated per AM heartbeat > -- > > Key: YARN-2889 > URL: https://issues.apache.org/jira/browse/YARN-2889 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-2889.001.patch, YARN-2889.002.patch > > > We introduce a way to limit the number of opportunistic containers that will > be allocated on each AM heartbeat. > This way we can restrict the number of opportunistic containers handed out > by the system, as well as throttle down misbehaving AMs (asking for too many > opportunistic containers). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
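A hedged sketch of the throttling idea in this issue; the class, field, and the "non-positive means unlimited" convention are assumptions for illustration, not the actual allocator code:

{code:java}
import java.util.List;

// Illustrative cap on how many opportunistic containers are returned per AM heartbeat.
class OpportunisticAllocationCap<T> {
  private final int maxAllocationsPerAmHeartbeat;

  OpportunisticAllocationCap(int maxAllocationsPerAmHeartbeat) {
    this.maxAllocationsPerAmHeartbeat = maxAllocationsPerAmHeartbeat;
  }

  List<T> cap(List<T> allocated) {
    // Assumed convention: a non-positive limit means "do not throttle".
    if (maxAllocationsPerAmHeartbeat <= 0
        || allocated.size() <= maxAllocationsPerAmHeartbeat) {
      return allocated;
    }
    // Hand out only the first maxAllocationsPerAmHeartbeat containers this heartbeat.
    return allocated.subList(0, maxAllocationsPerAmHeartbeat);
  }
}
{code}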
[jira] [Commented] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807353#comment-16807353 ] Hadoop QA commented on YARN-9432: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 59 unchanged - 0 fixed = 60 total (was 59) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 57s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9432 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964433/YARN-9432.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 35fb7163c191 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ab2bda5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23859/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | http
[jira] [Commented] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807337#comment-16807337 ] Wilfred Spiegelenburg commented on YARN-9431: - Thank you [~giovanni.fumarola] for the commit and [~pbacsko] for confirming the fix > Fix flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8530) Add security filters to Application catalog
[ https://issues.apache.org/jira/browse/YARN-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807330#comment-16807330 ] Hadoop QA commented on YARN-8530: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 1s{color} | {color:green} There were no new shellcheck issues. 
{color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 30s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 38s{color} | {color:green} hadoop-yarn-applications-catalog-webapp in the patch passed. {color} | | {color:green}+1{color
[jira] [Updated] (YARN-8530) Add security filters to Application catalog
[ https://issues.apache.org/jira/browse/YARN-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8530: Attachment: YARN-8530.003.patch > Add security filters to Application catalog > --- > > Key: YARN-8530 > URL: https://issues.apache.org/jira/browse/YARN-8530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8530.001.patch, YARN-8530.002.patch, > YARN-8530.003.patch > > > Application catalog UI does not have any security filter applied. CORS > filter and Authentication filter are required to secure the web application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807252#comment-16807252 ] Hadoop QA commented on YARN-2889: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 43s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}228m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-2889 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964474/YARN-2889.002.pat
[jira] [Commented] (YARN-9192) Deletion Tasks will be picked up to delete running containers
[ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807236#comment-16807236 ] Rayman commented on YARN-9192: -- [~sihai] This is probably because you have set yarn.nodemanager.recovery.enabled to true, and yarn.nodemanager.recovery.supervised to false. [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManager.html] > Deletion Tasks will be picked up to delete running containers > > > Key: YARN-9192 > URL: https://issues.apache.org/jira/browse/YARN-9192 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.9.1 >Reporter: Sihai Ke >Priority: Major > > I suspect there is a bug in the YARN deletion task service; below are my repro > steps: > # First, set yarn.nodemanager.delete.debug-delay-sec=3600, which means that > when the app finishes, the binary/container folder will be deleted after 3600 > seconds. > # When the application App1 (a long-running service) is running on machine > machine1 and machine1 shuts down, ContainerManagerImpl#serviceStop() will be > called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, and an > ApplicationFinishEvent will be sent; some deletion tasks will then be > created, but they are stored in the DB and picked up for execution after 3600 > seconds. > # 100 seconds later, machine1 comes back, and the same app is assigned to > run on this machine; the container is created and works well. > # The deletion tasks created in step 2 will then be picked up and delete the > containers created in step 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
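For reference, the configuration combination being discussed, shown in the same key=value style used elsewhere in this thread (values are illustrative; the property names are the ones named in the comment and repro steps above):

{code}
yarn.nodemanager.recovery.enabled=true
yarn.nodemanager.recovery.supervised=true
yarn.nodemanager.delete.debug-delay-sec=3600
{code}

Per the comment and the linked NodeManager documentation, a supervised NM restart is not treated as application teardown on shutdown, so deletion tasks should not be queued against the directories of containers that keep running across the restart.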
[jira] [Issue Comment Deleted] (YARN-9192) Deletion Tasks will be picked up to delete running containers
[ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rayman updated YARN-9192: - Comment: was deleted (was: I'm observing a similar issue, when running Samza over YARN. When bouncing an NM, the NM being killed writes LevelDB state for the deletion-service to act on. The "new" NM reads it and acts upon it, but ends up deleting directories for running containers. This happens when containers are long-running, and are placed on a fixed host. I also observed this in the log *[INFO] [shutdown-hook-0] containermanager.ContainerManagerImpl.cleanUpApplicationsOnNMShutDown(ContainerManagerImpl.java:718) - Waiting for Applications to be Finished*) > Deletion Taks will be picked up to delete running containers > > > Key: YARN-9192 > URL: https://issues.apache.org/jira/browse/YARN-9192 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.9.1 >Reporter: Sihai Ke >Priority: Major > > I suspect there is a bug in Yarn deletion task service, below is my repo > steps: > # First let's set yarn.nodemanager.delete.debug-delay-sec=3600, that means > when the app finished, the Binary/container folder will be deleted after 3600 > seconds. > # when the application App1 (long running service) is running on machine > machine1, and machine1 shutdown, ContainerManagerImpl#serviceStop() will be > called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, and > ApplicationFinishEvent will be sent, and then some delection tasks will be > created, but be stored in DB and will be picked up to execute 3600 seconds. > # 100 seconds later, machine1 comes back, and the same app is assigned to > run this this machine, container created and works well. > # then deleting task created in step 2 will be picked up to delete > containers created in step 3 later. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9192) Deletion Tasks will be picked up to delete running containers
[ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807102#comment-16807102 ] Rayman edited comment on YARN-9192 at 4/1/19 9:33 PM: -- I'm observing a similar issue, when running Samza over YARN. When bouncing an NM, the NM being killed writes LevelDB state for the deletion-service to act on. The "new" NM reads it and acts upon it, but ends up deleting directories for running containers. This happens when containers are long-running, and are placed on a fixed host. I also observed this in the log *[INFO] [shutdown-hook-0] containermanager.ContainerManagerImpl.cleanUpApplicationsOnNMShutDown(ContainerManagerImpl.java:718) - Waiting for Applications to be Finished* was (Author: rayman7718): I'm observing a similar issue, when running Samza over YARN. When bouncing an NM, the NM being killed writes LevelDB state for the deletion-service to act on. The "new" NM reads it and acts upon it, but ends up deleting directories for running containers. This happens when containers are long-running, and are placed on a fixed host. > Deletion Taks will be picked up to delete running containers > > > Key: YARN-9192 > URL: https://issues.apache.org/jira/browse/YARN-9192 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.9.1 >Reporter: Sihai Ke >Priority: Major > > I suspect there is a bug in Yarn deletion task service, below is my repo > steps: > # First let's set yarn.nodemanager.delete.debug-delay-sec=3600, that means > when the app finished, the Binary/container folder will be deleted after 3600 > seconds. > # when the application App1 (long running service) is running on machine > machine1, and machine1 shutdown, ContainerManagerImpl#serviceStop() will be > called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, and > ApplicationFinishEvent will be sent, and then some delection tasks will be > created, but be stored in DB and will be picked up to execute 3600 seconds. > # 100 seconds later, machine1 comes back, and the same app is assigned to > run this this machine, container created and works well. > # then deleting task created in step 2 will be picked up to delete > containers created in step 3 later. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5670) Add support for Docker image clean up
[ https://issues.apache.org/jira/browse/YARN-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-5670: --- Assignee: (was: Eric Yang) > Add support for Docker image clean up > - > > Key: YARN-5670 > URL: https://issues.apache.org/jira/browse/YARN-5670 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Priority: Major > Labels: Docker > Attachments: Localization Support For Docker Images_002.pdf > > > Regarding to Docker image localization, we also need a way to clean up the > old/stale Docker image to save storage space. We may extend deletion service > to utilize "docker rm" to do this. > This is related to YARN-3854 and may depend on its implementation. Please > refer to YARN-3854 for Docker image localization details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
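A minimal sketch of the idea described in YARN-5670, assuming the deletion service is extended to shell out to the Docker CLI (the image-removal command is "docker rmi"). The class and method names below are hypothetical and this is not YARN's container-executor code path.

{code:java}
import java.io.IOException;

// Hypothetical sketch only: remove a stale, localized Docker image by invoking the
// Docker CLI. A real YARN deletion task would go through the container-executor.
public class DockerImageCleanupTask {
  public static int removeImage(String imageName)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("docker", "rmi", imageName)
        .inheritIO()    // forward stdout/stderr so failures are visible in the NM log
        .start();
    return p.waitFor(); // non-zero exit code means the image was not removed
  }
}
{code}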
[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807204#comment-16807204 ] Hudson commented on YARN-9428: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16322 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16322/]) YARN-9428. Add metrics for paused containers in NodeManager. Contributed (gifuma: rev ab2bda57bd9ad617342586d5769121a4fef4eab1) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9428: --- Fix Version/s: 3.3.0 > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9428: --- Description: Add metrics for paused containers in NodeManager. > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > > Add metrics for paused containers in NodeManager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807151#comment-16807151 ] Hadoop QA commented on YARN-9428: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9428 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964473/YARN-9428.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux acf06fe1355c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / da7f8c2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23857/testReport/ | | Max. process+thread count | 447 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23857/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add metrics for paused containers in NodeManag
[jira] [Commented] (YARN-9192) Deletion Tasks will be picked up to delete running containers
[ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807102#comment-16807102 ] Rayman commented on YARN-9192: -- I'm observing a similar issue when running Samza over YARN. When bouncing an NM, the NM being killed writes LevelDB state for the deletion-service to act on. The "new" NM reads it and acts upon it, but ends up deleting directories for running containers. This happens when containers are long-running and are placed on a fixed host. > Deletion Tasks will be picked up to delete running containers > > > Key: YARN-9192 > URL: https://issues.apache.org/jira/browse/YARN-9192 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.9.1 >Reporter: Sihai Ke >Priority: Major > > I suspect there is a bug in the YARN deletion task service; below are my repro > steps: > # First, set yarn.nodemanager.delete.debug-delay-sec=3600, which means that > when the app finishes, the Binary/container folder will only be deleted after 3600 > seconds. > # When the application App1 (a long-running service) is running on machine > machine1 and machine1 shuts down, ContainerManagerImpl#serviceStop() will be > called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, an > ApplicationFinishEvent will be sent, and some deletion tasks will then be > created, but stored in the DB and only picked up for execution after 3600 seconds. > # 100 seconds later, machine1 comes back and the same app is assigned to > run on this machine; a container is created and works well. > # The deletion tasks created in step 2 will then be picked up and delete the > containers created in step 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
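For reference, step 1 of the repro above amounts to setting the NodeManager's delete debug delay. A small sketch of doing that programmatically follows; the property key is quoted from the report, everything else is illustrative.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative repro helper for step 1 above: with a 3600s debug delay, deletion
// tasks recorded at NM shutdown only fire an hour later, by which time the restarted
// NM may already be running new containers for the same long-running application.
public class DeletionDelayRepro {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("yarn.nodemanager.delete.debug-delay-sec", 3600);
    System.out.println("delete.debug-delay-sec = "
        + conf.getInt("yarn.nodemanager.delete.debug-delay-sec", 0));
  }
}
{code}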
[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-2889: Description: We introduce a way to limit the number of opportunistic containers that will be allocated on each AM heartbeat. This way we can restrict the number of opportunistic containers handed out by the system, as well as throttle down misbehaving AMs (asking for too many opportunistic containers). was: We introduce a way to limit the number of queueable requests that each AM can submit to the LocalRM. This way we can restrict the number of queueable containers handed out by the system, as well as throttle down misbehaving AMs (asking for too many queueable containers). > Limit the number of opportunistic container allocated per AM heartbeat > -- > > Key: YARN-2889 > URL: https://issues.apache.org/jira/browse/YARN-2889 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-2889.001.patch, YARN-2889.002.patch > > > We introduce a way to limit the number of opportunistic containers that will > be allocated on each AM heartbeat. > This way we can restrict the number of opportunistic containers handed out > by the system, as well as throttle down misbehaving AMs (asking for too many > opportunistic containers). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
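The mechanism in the updated description boils down to clamping each AM's opportunistic allocation per heartbeat to a configured maximum. A minimal sketch of that throttling idea is below; the class and method names are hypothetical, not YARN's implementation.

{code:java}
// Hypothetical sketch of the per-heartbeat throttle: grant at most a configured
// number of opportunistic containers per AM heartbeat, regardless of what was asked.
final class OpportunisticAllocationLimiter {
  private final int maxPerHeartbeat;

  OpportunisticAllocationLimiter(int maxPerHeartbeat) {
    this.maxPerHeartbeat = maxPerHeartbeat;
  }

  int grant(int requestedThisHeartbeat) {
    // A misbehaving AM asking for too many containers is simply throttled down.
    return Math.min(requestedThisHeartbeat, maxPerHeartbeat);
  }
}
{code}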
[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-2889: Attachment: YARN-2889.002.patch > Limit the number of opportunistic container allocated per AM heartbeat > -- > > Key: YARN-2889 > URL: https://issues.apache.org/jira/browse/YARN-2889 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-2889.001.patch, YARN-2889.002.patch > > > We introduce a way to limit the number of queueable requests that each AM can > submit to the LocalRM. > This way we can restrict the number of queueable containers handed out by the > system, as well as throttle down misbehaving AMs (asking for too many > queueable containers). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807082#comment-16807082 ] Abhishek Modi commented on YARN-9428: - Thanks [~giovanni.fumarola] for review. Attached 002 patch with the fixes. Thanks. > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9428: Attachment: YARN-9428.002.patch > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9428.001.patch, YARN-9428.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807068#comment-16807068 ] Hudson commented on YARN-8967: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16321 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16321/]) YARN-9431. Fix flaky junit test fair.TestAppRunnability after YARN-8967. (gifuma: rev da7f8c244d9ff3a3616f6b1dd4ebe3f35bfd3bbe) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAppRunnability.java > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, > YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807067#comment-16807067 ] Hudson commented on YARN-9431: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16321 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16321/]) YARN-9431. Fix flaky junit test fair.TestAppRunnability after YARN-8967. (gifuma: rev da7f8c244d9ff3a3616f6b1dd4ebe3f35bfd3bbe) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAppRunnability.java > Fix flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
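The log above shows the main thread accepting an application while the AsyncDispatcher thread rejects it, which is a classic asynchronous-assertion race. The sketch below is a generic illustration of that pattern, not the committed test fix: the outcome only becomes deterministic once the test waits for the async handler to finish.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic illustration: an event handled on an async dispatcher thread races with
// the main thread's assertion unless the test waits for the handler to complete.
public class AsyncAssertSketch {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    CountDownLatch handled = new CountDownLatch(1);

    dispatcher.execute(() -> {
      // stand-in for the dispatcher accepting/rejecting the submitted application
      handled.countDown();
    });

    // Waiting here is what removes the flakiness; asserting without the await
    // reproduces the race the report describes.
    if (!handled.await(5, TimeUnit.SECONDS)) {
      throw new AssertionError("event was never handled");
    }
    dispatcher.shutdown();
  }
}
{code}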
[jira] [Commented] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807061#comment-16807061 ] Hudson commented on YARN-9418: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16320 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16320/]) YARN-9418. ATSV2 /apps//entities/YARN_CONTAINER rest api does not show (gifuma: rev 332cab5518ba9c70a5f191883db8c4d22e8e48b7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/TestNMTimelinePublisher.java > ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics > > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Fix For: 3.3.0 > > Attachments: YARN-9418-001.patch, YARN-9418-002.patch, > YARN-9418-003.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. > {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
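A small sketch of the idPrefix mismatch described above, using the ATSv2 TimelineEntity API; the timestamp is taken from the report and the surrounding class is illustrative, not the NM/RM publisher source.

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

// Illustration of the mismatch: the NM publisher keyed container entities by an
// inverted start time, the RM publisher left the default prefix of 0, so the reader
// only matched the RM entities and the NM's CPU/MEMORY metrics never showed up.
public class IdPrefixMismatchSketch {
  public static void main(String[] args) {
    long containerStartTime = 1553695002014L; // createdtime from the report above

    TimelineEntity nmEntity = new TimelineEntity();
    nmEntity.setType("YARN_CONTAINER");
    nmEntity.setIdPrefix(Long.MAX_VALUE - containerStartTime); // inverted start time

    TimelineEntity rmEntity = new TimelineEntity();
    rmEntity.setType("YARN_CONTAINER"); // idPrefix stays at the default of 0

    System.out.println("NM prefix: " + nmEntity.getIdPrefix()
        + ", RM prefix: " + rmEntity.getIdPrefix());
  }
}
{code}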
[jira] [Commented] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807058#comment-16807058 ] Giovanni Matteo Fumarola commented on YARN-9431: Thanks [~wilfreds] for the patch. LGTM +1. Committed to trunk. > Fix flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9431: --- Fix Version/s: 3.3.0 > Fix flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807053#comment-16807053 ] Yufei Gu edited comment on YARN-9401 at 4/1/19 6:20 PM: Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing the class YarnVersionInfo rather than making this change. It looks OK to remove it, judging by the reference in the web-app; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vinodkv]. was (Author: yufeigu): Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing the class YarnVersionInfo rather than making this change. It looks OK to remove it, judging by the reference in the web-app; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vikumar]. > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. > {panel:title=Before} > Hadoop 3.3.0-SNAPSHOT > Source code repository [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T04:55Z > Compiled with protoc 2.5.0 > From source with checksum 829bd6e22c17c6da74f5c1a61647922 > {panel} > {panel:title=After} > YARN 3.3.0-SNAPSHOT > Subversion [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T05:06Z > From source with checksum e10a192bd933ffdafe435d7fe99d24d > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807053#comment-16807053 ] Yufei Gu commented on YARN-9401: Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing the class YarnVersionInfo rather than making this change. It looks OK to remove it, judging by the reference in the web-app; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vikumar]. > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. > {panel:title=Before} > Hadoop 3.3.0-SNAPSHOT > Source code repository [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T04:55Z > Compiled with protoc 2.5.0 > From source with checksum 829bd6e22c17c6da74f5c1a61647922 > {panel} > {panel:title=After} > YARN 3.3.0-SNAPSHOT > Subversion [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T05:06Z > From source with checksum e10a192bd933ffdafe435d7fe99d24d > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9431: --- Summary: Fix flaky junit test fair.TestAppRunnability after YARN-8967 (was: flaky junit test fair.TestAppRunnability after YARN-8967) > Fix flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807051#comment-16807051 ] Hudson commented on YARN-9227: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16319 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16319/]) YARN-9227. DistributedShell RelativePath is not removed at end. (gifuma: rev b0d24ef39cbee53ae092f3aafeeebd22cd81bcac) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch, YARN-9227-004.patch, YARN-9227-005.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
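A minimal sketch of the kind of cleanup the issue calls for, assuming the standard Hadoop FileSystem API; the path layout is copied from the listing in the report and the class itself is illustrative, not the committed patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: remove the per-application DistributedShell staging directory once the
// job finishes, so AppMaster.jar and shellCommands are not left behind.
public class RelativePathCleanup {
  public static void cleanup(Configuration conf, String appId) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    // Layout taken from the 'hadoop fs -ls' output above; adjust for a real cluster.
    Path appDir = new Path("/user/ambari-qa/DistributedShell/" + appId);
    if (fs.exists(appDir)) {
      fs.delete(appDir, true); // recursive delete of the whole relative path
    }
  }
}
{code}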
[jira] [Comment Edited] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807050#comment-16807050 ] Giovanni Matteo Fumarola edited comment on YARN-9428 at 4/1/19 6:14 PM: Thanks [~abmodi] for the change. A minor change. Can you change this instruction: @Metric MutableGaugeInt containersPaused; with @Metric("# of paused containers") MutableGaugeInt containersPaused; was (Author: giovanni.fumarola): Thanks [~abmodi] A minor change. Can you change this instruction: @Metric MutableGaugeInt containersPaused; with @Metric("# of paused containers") MutableGaugeInt containersPaused; > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9428.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager
[ https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807050#comment-16807050 ] Giovanni Matteo Fumarola commented on YARN-9428: Thanks [~abmodi] A minor change. Can you change this instruction: @Metric MutableGaugeInt containersPaused; with @Metric("# of paused containers") MutableGaugeInt containersPaused; > Add metrics for paused containers in NodeManager > > > Key: YARN-9428 > URL: https://issues.apache.org/jira/browse/YARN-9428 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9428.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
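The requested change is just the annotated gauge shown in the comment. Below is a compact sketch of how such a gauge sits in a metrics source; the class and method names are illustrative, not the actual NodeManagerMetrics source.

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

// Sketch of the reviewed change: give the gauge a description instead of a bare
// @Metric annotation. The metrics system populates the annotated field when this
// source is registered with it.
@Metrics(about = "Example NM metrics", context = "yarn")
public class ExampleNodeManagerMetrics {
  @Metric("# of paused containers")
  MutableGaugeInt containersPaused;

  public void pausedContainer() {
    containersPaused.incr();
  }

  public void unpausedContainer() {
    containersPaused.decr();
  }
}
{code}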
[jira] [Commented] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807045#comment-16807045 ] Giovanni Matteo Fumarola commented on YARN-9418: Thanks [~Prabhu Joseph] for the patch. LGTM +1. Committed to trunk (The title in the commit shows /apps//entities/YARN_CONTAINER). > ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics > > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Fix For: 3.3.0 > > Attachments: YARN-9418-001.patch, YARN-9418-002.patch, > YARN-9418-003.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. > {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9418: --- Fix Version/s: 3.3.0 > ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics > > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Fix For: 3.3.0 > > Attachments: YARN-9418-001.patch, YARN-9418-002.patch, > YARN-9418-003.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. > {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9418: --- Summary: ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics (was: ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics) > ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics > > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9418-001.patch, YARN-9418-002.patch, > YARN-9418-003.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. > {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807033#comment-16807033 ] Giovanni Matteo Fumarola commented on YARN-9227: Thanks [~Prabhu Joseph] for the patch and [~snemeth] for the review. Committed to trunk. > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch, YARN-9227-004.patch, YARN-9227-005.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9227: --- Fix Version/s: 3.3.0 > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch, YARN-9227-004.patch, YARN-9227-005.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807028#comment-16807028 ] Hadoop QA commented on YARN-9281: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s{color} | {color:green} hadoop-yarn-applications-catalog-webapp in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9281 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963941/YARN-9281.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux f453a01ce486 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 35b0a38 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23855/testReport/ | | Max. process+thread count | 340 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-c
[jira] [Commented] (YARN-9255) Improve recommend applications order
[ https://issues.apache.org/jira/browse/YARN-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807007#comment-16807007 ] Hudson commented on YARN-9255: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16317/]) YARN-9255. Improve recommend applications order and fix findbugs (billie: rev 35b0a381e7bc8bbf74adfa51feee1d54d8675c06) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/src/main/java/org/apache/hadoop/yarn/appcatalog/model/AppDetails.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/src/main/java/org/apache/hadoop/yarn/appcatalog/model/Application.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/src/main/java/org/apache/hadoop/yarn/appcatalog/application/AppCatalogSolrClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/src/test/java/org/apache/hadoop/yarn/appcatalog/application/TestAppCatalogSolrClient.java > Improve recommend applications order > > > Key: YARN-9255 > URL: https://issues.apache.org/jira/browse/YARN-9255 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9255.001.patch, YARN-9255.002.patch, > YARN-9255.003.patch > > > When there is no search term in application catalog, the recommended > application list is random. The relevance can be fine tuned to be sorted by > number of downloads or alphabetic order. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
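The ordering described above (download count first, then alphabetical) is a straightforward comparator. The sketch below illustrates it with a hypothetical AppRecord type rather than the appcatalog's Application/AppDetails classes.

{code:java}
import java.util.Comparator;
import java.util.List;

// Hypothetical record standing in for a catalog entry.
class AppRecord {
  final String name;
  final long downloads;

  AppRecord(String name, long downloads) {
    this.name = name;
    this.downloads = downloads;
  }
}

// Rank recommended apps by number of downloads (descending), then by name.
class RecommendOrder {
  static void sort(List<AppRecord> apps) {
    apps.sort(Comparator.comparingLong((AppRecord a) -> a.downloads).reversed()
        .thenComparing(a -> a.name));
  }
}
{code}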
[jira] [Commented] (YARN-9255) Improve recommend applications order
[ https://issues.apache.org/jira/browse/YARN-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806951#comment-16806951 ] Billie Rinaldi commented on YARN-9255: -- +1 for patch 3. Thanks, [~eyang]! > Improve recommend applications order > > > Key: YARN-9255 > URL: https://issues.apache.org/jira/browse/YARN-9255 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9255.001.patch, YARN-9255.002.patch, > YARN-9255.003.patch > > > When there is no search term in application catalog, the recommended > application list is random. The relevance can be fine tuned to be sorted by > number of downloads or alphabetic order. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9433) Remove unused constants from RMAuditLogger
Adam Antal created YARN-9433: Summary: Remove unused constants from RMAuditLogger Key: YARN-9433 URL: https://issues.apache.org/jira/browse/YARN-9433 Project: Hadoop YARN Issue Type: Task Components: yarn Affects Versions: 3.2.0 Reporter: Adam Antal There are some unused constants in RMAuditLogger that the IntelliJ warns you about. Currently what I'm seeing is that the following {{public static final String}} constants are unused: * AM_ALLOCATE * CHANGE_CONTAINER_RESOURCE * CREATE_NEW_RESERVATION_REQUEST Probably they are no longer needed. This task aims to remove those unused constants. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806863#comment-16806863 ] Weiwei Yang commented on YARN-9401: --- The output looks good. But I think [~yufeigu] raises a good point: it seems {{VersionInfo}} is used by HDFS/MR/YARN alike, and they all return the same Hadoop version. It would become inconsistent if this change were only made to YARN. I don't think we should change this before we cleanly separate these projects. There is a bit of overlap, but right now {{YarnVersionInfo}} is only used in the web-app. > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. > {panel:title=Before} > Hadoop 3.3.0-SNAPSHOT > Source code repository [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T04:55Z > Compiled with protoc 2.5.0 > From source with checksum 829bd6e22c17c6da74f5c1a61647922 > {panel} > {panel:title=After} > YARN 3.3.0-SNAPSHOT > Subversion [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T05:06Z > From source with checksum e10a192bd933ffdafe435d7fe99d24d > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
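For context, the two classes being discussed expose parallel static accessors; the tiny sketch below prints both. Since YARN is built and released together with Hadoop, the two calls currently report effectively the same release, which is the inconsistency argument above.

{code:java}
import org.apache.hadoop.util.VersionInfo;
import org.apache.hadoop.yarn.util.YarnVersionInfo;

// Prints the Hadoop build info next to the YARN build info. Because YARN ships as
// part of Hadoop, the two versions are effectively identical today.
public class PrintVersions {
  public static void main(String[] args) {
    System.out.println("Hadoop: " + VersionInfo.getVersion()
        + " (" + VersionInfo.getRevision() + ")");
    System.out.println("YARN:   " + YarnVersionInfo.getVersion()
        + " (" + YarnVersionInfo.getRevision() + ")");
  }
}
{code}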
[jira] [Comment Edited] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806396#comment-16806396 ] Wanqiang Ji edited comment on YARN-9401 at 4/1/19 2:23 PM: --- Hi [~cheersyang], the output of the version command before and after changed as follows: {panel:title=Before} Hadoop 3.3.0-SNAPSHOT Source code repository [https://github.com/apache/hadoop.git] -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 Compiled by jiwq on 2019-04-01T04:55Z Compiled with protoc 2.5.0 From source with checksum 829bd6e22c17c6da74f5c1a61647922 {panel} {panel:title=After} YARN 3.3.0-SNAPSHOT Subversion [https://github.com/apache/hadoop.git] -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 Compiled by jiwq on 2019-04-01T05:06Z From source with checksum e10a192bd933ffdafe435d7fe99d24d {panel} was (Author: jiwq): Hi [~cheersyang] {panel:title=Before} Hadoop 3.3.0-SNAPSHOT Source code repository [https://github.com/apache/hadoop.git] -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 Compiled by jiwq on 2019-04-01T04:55Z Compiled with protoc 2.5.0 From source with checksum 829bd6e22c17c6da74f5c1a61647922 {panel} {panel:title=After} YARN 3.3.0-SNAPSHOT Subversion [https://github.com/apache/hadoop.git] -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 Compiled by jiwq on 2019-04-01T05:06Z From source with checksum e10a192bd933ffdafe435d7fe99d24d {panel} > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. > {panel:title=Before} > Hadoop 3.3.0-SNAPSHOT > Source code repository [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T04:55Z > Compiled with protoc 2.5.0 > From source with checksum 829bd6e22c17c6da74f5c1a61647922 > {panel} > {panel:title=After} > YARN 3.3.0-SNAPSHOT > Subversion [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T05:06Z > From source with checksum e10a192bd933ffdafe435d7fe99d24d > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806746#comment-16806746 ] Szilard Nemeth commented on YARN-8701: -- Latest patch +1 (non-binding) > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch, YARN-8701.002.patch, > YARN-8701.003.patch, YARN-8701.004.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
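A standalone sketch of the narrowing bug this issue describes: it does not touch the real {{Resources#createResourceWithSameValue}} code, it only reproduces the long-to-int arithmetic and a common clamp-style fix.
{code:java}
// Self-contained demo of the overflow symptom; no YARN classes involved.
public class VcoresOverflowDemo {

  // A plain (int) cast keeps only the low 32 bits, so Long.MAX_VALUE becomes -1,
  // matching the vcores value reported in this issue.
  static int unsafeCast(long value) {
    return (int) value;
  }

  // One common fix pattern: clamp to Integer.MAX_VALUE before narrowing.
  static int safeCast(long value) {
    return (int) Math.min(value, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    long requested = Long.MAX_VALUE;            // e.g. an "unlimited" value passed as the single parameter
    System.out.println(unsafeCast(requested));  // prints -1
    System.out.println(safeCast(requested));    // prints 2147483647
  }
}
{code}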
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806742#comment-16806742 ] Szilard Nemeth commented on YARN-8943: -- Thanks for the follow-up on this! > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9431) flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806613#comment-16806613 ] Peter Bacsko edited comment on YARN-9431 at 4/1/19 1:11 PM: I verified the changes by running TestAppRunnability 1000 times and things are back to normal. +1 (non-binding) from me. was (Author: pbacsko): I verified the changes with running TestAppRunnability 1000 and things are back to normal. +1 (non-binding) from me. > flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9432: --- Attachment: YARN-9432.001.patch > Excess reserved containers may exist for a long time after its request has > been cancelled or satisfied when multi-nodes enabled > --- > > Key: YARN-9432 > URL: https://issues.apache.org/jira/browse/YARN-9432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9432.001.patch > > > Reserved containers may change to be excess after its request has been > cancelled or satisfied, excess reserved containers need to be unreserved > quickly to release resource for others. > For multi-nodes disabled scenario, excess reserved containers can be quickly > released in next node heartbeat, the calling stack is > CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode > --> CapacityScheduler#allocateContainerOnSingleNode. > But for multi-nodes enabled scenario, excess reserved containers have chance > to be released only in allocation process, key phase of the calling stack is > LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. > According to this, excess reserved containers may not be released until its > queue has pending request and has chance to be allocated, and the worst is > that excess reserved containers will never be released and keep holding > resource if there is no additional pending request for this queue. > To solve this problem, my opinion is to directly kill excess reserved > containers when request is satisfied (in FiCaSchedulerApp#apply) or the > allocation number of resource-requests/scheduling-requests is updated to be 0 > (in SchedulerApplicationAttempt#updateResourceRequests / > SchedulerApplicationAttempt#updateSchedulingRequests). > Please feel free to give your suggestions. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9432: --- Attachment: (was: YARN-9432.001.patch) > Excess reserved containers may exist for a long time after its request has > been cancelled or satisfied when multi-nodes enabled > --- > > Key: YARN-9432 > URL: https://issues.apache.org/jira/browse/YARN-9432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > > Reserved containers may change to be excess after its request has been > cancelled or satisfied, excess reserved containers need to be unreserved > quickly to release resource for others. > For multi-nodes disabled scenario, excess reserved containers can be quickly > released in next node heartbeat, the calling stack is > CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode > --> CapacityScheduler#allocateContainerOnSingleNode. > But for multi-nodes enabled scenario, excess reserved containers have chance > to be released only in allocation process, key phase of the calling stack is > LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. > According to this, excess reserved containers may not be released until its > queue has pending request and has chance to be allocated, and the worst is > that excess reserved containers will never be released and keep holding > resource if there is no additional pending request for this queue. > To solve this problem, my opinion is to directly kill excess reserved > containers when request is satisfied (in FiCaSchedulerApp#apply) or the > allocation number of resource-requests/scheduling-requests is updated to be 0 > (in SchedulerApplicationAttempt#updateResourceRequests / > SchedulerApplicationAttempt#updateSchedulingRequests). > Please feel free to give your suggestions. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806676#comment-16806676 ] Tao Yang commented on YARN-9432: Attached v1 patch for review. > Excess reserved containers may exist for a long time after its request has > been cancelled or satisfied when multi-nodes enabled > --- > > Key: YARN-9432 > URL: https://issues.apache.org/jira/browse/YARN-9432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9432.001.patch > > > Reserved containers may change to be excess after its request has been > cancelled or satisfied, excess reserved containers need to be unreserved > quickly to release resource for others. > For multi-nodes disabled scenario, excess reserved containers can be quickly > released in next node heartbeat, the calling stack is > CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode > --> CapacityScheduler#allocateContainerOnSingleNode. > But for multi-nodes enabled scenario, excess reserved containers have chance > to be released only in allocation process, key phase of the calling stack is > LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. > According to this, excess reserved containers may not be released until its > queue has pending request and has chance to be allocated, and the worst is > that excess reserved containers will never be released and keep holding > resource if there is no additional pending request for this queue. > To solve this problem, my opinion is to directly kill excess reserved > containers when request is satisfied (in FiCaSchedulerApp#apply) or the > allocation number of resource-requests/scheduling-requests is updated to be 0 > (in SchedulerApplicationAttempt#updateResourceRequests / > SchedulerApplicationAttempt#updateSchedulingRequests). > Please feel free to give your suggestions. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9432: --- Attachment: YARN-9432.001.patch > Excess reserved containers may exist for a long time after its request has > been cancelled or satisfied when multi-nodes enabled > --- > > Key: YARN-9432 > URL: https://issues.apache.org/jira/browse/YARN-9432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9432.001.patch > > > Reserved containers may change to be excess after its request has been > cancelled or satisfied, excess reserved containers need to be unreserved > quickly to release resource for others. > For multi-nodes disabled scenario, excess reserved containers can be quickly > released in next node heartbeat, the calling stack is > CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode > --> CapacityScheduler#allocateContainerOnSingleNode. > But for multi-nodes enabled scenario, excess reserved containers have chance > to be released only in allocation process, key phase of the calling stack is > LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. > According to this, excess reserved containers may not be released until its > queue has pending request and has chance to be allocated, and the worst is > that excess reserved containers will never be released and keep holding > resource if there is no additional pending request for this queue. > To solve this problem, my opinion is to directly kill excess reserved > containers when request is satisfied (in FiCaSchedulerApp#apply) or the > allocation number of resource-requests/scheduling-requests is updated to be 0 > (in SchedulerApplicationAttempt#updateResourceRequests / > SchedulerApplicationAttempt#updateSchedulingRequests). > Please feel free to give your suggestions. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9430) Recovering containers does not check available resources on node
[ https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806627#comment-16806627 ] Szilard Nemeth edited comment on YARN-9430 at 4/1/19 10:29 AM: --- Hi [~adam.antal]! Thanks for the comment and mentioning these scenarios! In general, I would like to involve more experienced people to help with the decision: [~tangzhankun], [~sunilg], [~leftnoteasy], [~wilfreds] After our offline discussion with [~adam.antal] and [~shuzirra], we have the following questions: # Should NM kill containers when mapped GPU devices are not present (or if there's not enough resources)? For example: Requested 1 GPU but no GPU is available, what should happen? For this to decide, it's crucial to know what is the motivation behind saving container-GPU device mappings into the state store. AFAIK, NM is assigning containers to "random" GPU devices. If the mapping between GPU and the container does not matter while the container starts, why does it matter during recovery? # Do you agree to kill containers on NM-side if there's not enough resources for the container? AM should handle if any container is lost anyways. # If an assigned GPU (GPU #1) is offline after recovery but another GPU (GPU #2) is available, what should NM do? Should it allocate GPU #2 for the container? The answer for this question is also affected by the decision about whether we keep the GPU-container mappings or not. Thanks! was (Author: snemeth): Hi [~adam.antal]! Thanks for the comment and mentioning these scenarios! In general, I would like to involve more experienced people to help with the decision: [~tangzhankun], [~sunilg], [~leftnoteasy], [~wilfreds] After our offline discussion with [~adam.antal] and [~shuzirra], we have the following questions: # Should NM kill containers when mapped GPU devices are not present (or if there's not enough resources)? For example: Requested 1 GPU but no GPU is available, what should happen? For this to decide, it's crucial to know what is the motivation behind saving container-GPU device mappings into the state store. AFAIK, NM is assigning containers to "random" GPU devices. If the mapping between GPU and the container does not matter while the container starts, why does it matter during recovery? # Do you agree to kill containers on NM-side if there's not enough resources for the container? AM should handle if any container is lost anyways. # If an assigned GPU (GPU #1) is offline after recovery but another GPU (GPU #2) is available, what should NM do? Should it allocate GPU #2 for the container? The answer for this question is also affected by the decision about whether we keep the GPU-container mappings or not. Thanks! > Recovering containers does not check available resources on node > > > Key: YARN-9430 > URL: https://issues.apache.org/jira/browse/YARN-9430 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Critical > > I have a testcase that checks if some GPU devices gone offline and recovery > happens, only the containers that fit into the node's resources will be > recovered. Unfortunately, this is not the case: RM does not check available > resources on node during recovery. > *Detailed explanation:* > *Testcase:* > 1. There are 2 nodes running NodeManagers > 2. nvidia-smi is replaced with a fake bash script that reports 2 GPU devices > per node, initially. This means 4 GPU devices in the cluster altogether. > 3. RM / NM recovery is enabled > 4. 
The test starts off a sleep job, requesting 4 containers, 1 GPU device > for each (AM does not request GPUs) > 5. Before restart, the fake bash script is adjusted to report 1 GPU device > per node (2 in the cluster) after restart. > 6. Restart is initiated. > > *Expected behavior:* > After restart, only the AM and 2 normal containers should have been started, > as there are only 2 GPU devices in the cluster. > > *Actual behaviour:* > AM + 4 containers are allocated, this is all containers started originally > with step 4. > App id was: 1553977186701_0001 > *Logs*: > > {code:java} > 2019-03-30 13:22:30,299 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Processing event for appattempt_1553977186701_0001_01 of type RECOVER > 2019-03-30 13:22:30,366 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Added Application Attempt appattempt_1553977186701_0001_01 to scheduler > from user: systest > 2019-03-30 13:22:30,366 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > appattempt_1553977186701_0001_01 is recovering. Ski
[jira] [Commented] (YARN-9430) Recovering containers does not check available resources on node
[ https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806627#comment-16806627 ] Szilard Nemeth commented on YARN-9430: -- Hi [~adam.antal]! Thanks for the comment and mentioning these scenarios! In general, I would like to involve more experienced people to help with the decision: [~tangzhankun], [~sunilg], [~leftnoteasy], [~wilfreds] After our offline discussion with [~adam.antal] and [~shuzirra], we have the following questions: # Should NM kill containers when mapped GPU devices are not present (or if there's not enough resources)? For example: Requested 1 GPU but no GPU is available, what should happen? For this to decide, it's crucial to know what is the motivation behind saving container-GPU device mappings into the state store. AFAIK, NM is assigning containers to "random" GPU devices. If the mapping between GPU and the container does not matter while the container starts, why does it matter during recovery? # Do you agree to kill containers on NM-side if there's not enough resources for the container? AM should handle if any container is lost anyways. # If an assigned GPU (GPU #1) is offline after recovery but another GPU (GPU #2) is available, what should NM do? Should it allocate GPU #2 for the container? The answer for this question is also affected by the decision about whether we keep the GPU-container mappings or not. Thanks! > Recovering containers does not check available resources on node > > > Key: YARN-9430 > URL: https://issues.apache.org/jira/browse/YARN-9430 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Critical > > I have a testcase that checks if some GPU devices gone offline and recovery > happens, only the containers that fit into the node's resources will be > recovered. Unfortunately, this is not the case: RM does not check available > resources on node during recovery. > *Detailed explanation:* > *Testcase:* > 1. There are 2 nodes running NodeManagers > 2. nvidia-smi is replaced with a fake bash script that reports 2 GPU devices > per node, initially. This means 4 GPU devices in the cluster altogether. > 3. RM / NM recovery is enabled > 4. The test starts off a sleep job, requesting 4 containers, 1 GPU device > for each (AM does not request GPUs) > 5. Before restart, the fake bash script is adjusted to report 1 GPU device > per node (2 in the cluster) after restart. > 6. Restart is initiated. > > *Expected behavior:* > After restart, only the AM and 2 normal containers should have been started, > as there are only 2 GPU devices in the cluster. > > *Actual behaviour:* > AM + 4 containers are allocated, this is all containers started originally > with step 4. > App id was: 1553977186701_0001 > *Logs*: > > {code:java} > 2019-03-30 13:22:30,299 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Processing event for appattempt_1553977186701_0001_01 of type RECOVER > 2019-03-30 13:22:30,366 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Added Application Attempt appattempt_1553977186701_0001_01 to scheduler > from user: systest > 2019-03-30 13:22:30,366 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > appattempt_1553977186701_0001_01 is recovering. 
Skipping notifying > ATTEMPT_ADDED > 2019-03-30 13:22:30,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1553977186701_0001_01 State change from NEW to LAUNCHED on > event = RECOVER > 2019-03-30 13:22:33,257 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: > Recovering container [container_e84_1553977186701_0001_01_01, > CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: > , Diagnostics: , ExitStatus: -1000, > NodeLabelExpression: Priority: 0] > 2019-03-30 13:22:33,275 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: > Recovering container [container_e84_1553977186701_0001_01_04, > CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: > , Diagnostics: , ExitStatus: -1000, > NodeLabelExpression: Priority: 0] > 2019-03-30 13:22:33,275 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: > Assigned container container_e84_1553977186701_0001_01_04 of capacity > on host > snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers, vCores:2, yarn.io/gpu: 1> used and available after > allocation > 2019-03-30 13:22:33,276 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnSchedu
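Purely as a hypothetical sketch of the check the issue says is missing, under invented type and method names (none of these match the real RM/NM classes): recovery could subtract each recovered container from the node's currently reported resources and kill whatever no longer fits.
{code:java}
// Hypothetical sketch only -- NodeResources, RecoveredContainer, ResourceVector
// and their methods are invented for illustration.
import java.util.List;

class RecoverySketch {
  static void recoverContainers(NodeResources available, List<RecoveredContainer> fromStateStore) {
    for (RecoveredContainer c : fromStateStore) {
      // The reported behaviour is that this test is skipped, so more GPU
      // containers can be recovered than the node now has devices for.
      if (!available.fits(c.requestedResources())) {
        c.markKilled("insufficient resources on node after restart");
        continue;
      }
      available.subtract(c.requestedResources());
      c.resume();
    }
  }
}

interface NodeResources {
  boolean fits(ResourceVector request);
  void subtract(ResourceVector request);
}

interface RecoveredContainer {
  ResourceVector requestedResources();
  void markKilled(String reason);
  void resume();
}

interface ResourceVector {}
{code}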
[jira] [Commented] (YARN-9431) flaky junit test fair.TestAppRunnability after YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806613#comment-16806613 ] Peter Bacsko commented on YARN-9431: I verified the changes with running TestAppRunnability 1000 and things are back to normal. +1 (non-binding) from me. > flaky junit test fair.TestAppRunnability after YARN-8967 > > > Key: YARN-9431 > URL: https://issues.apache.org/jira/browse/YARN-9431 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-9431.001.patch > > > In YARN-4901 one of the scheduler tests failed. This seems to be linked to > the changes around the placement rules introduced in YARN-8967. > Applications submitted in the tests are accepted and rejected at the same > time: > {code} > 2019-04-01 12:00:57,269 INFO [main] fair.FairScheduler > (FairScheduler.java:addApplication(540)) - Accepted application > application_0_0001 from user: user1, in queue: root.user1, currently num of > applications: 1 > 2019-04-01 12:00:57,269 INFO [AsyncDispatcher event handler] > fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - > Reject application application_0_0001 submitted by user user1 application > rejected by placement rules. > {code} > This should never happen and is most likely due to the way the tests > generates the application and events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9429) A status code error in ResourceManager REST api doc
[ https://issues.apache.org/jira/browse/YARN-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9429: - Summary: A status code error in ResourceManager REST api doc (was: A status code error in RssourceManager REST api doc) > A status code error in ResourceManager REST api doc > --- > > Key: YARN-9429 > URL: https://issues.apache.org/jira/browse/YARN-9429 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Major > Attachments: YARN-9429.001.patch > > > A status code error in ResourceManager api docs. > In section "Cluster Application State API",the unauthorized error response > header is described blow > {code} > Response Header: > HTTP/1.1 403 Unauthorized > Server: Jetty(6.1.26) > {code} > As commonly known, the unauthorized status code should be *401*. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9429) A status code error in ResourceManager REST api doc
[ https://issues.apache.org/jira/browse/YARN-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9429: - Description: A status code error in ResourceManager api docs. In section "Cluster Application State API",the unauthorized error response header is described below. {code:java} Response Header: HTTP/1.1 403 Unauthorized Server: Jetty(6.1.26) {code} As commonly known, the unauthorized status code should be *401*. was: A status code error in ResourceManager api docs. In section "Cluster Application State API",the unauthorized error response header is described blow {code} Response Header: HTTP/1.1 403 Unauthorized Server: Jetty(6.1.26) {code} As commonly known, the unauthorized status code should be *401*. > A status code error in ResourceManager REST api doc > --- > > Key: YARN-9429 > URL: https://issues.apache.org/jira/browse/YARN-9429 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Major > Attachments: YARN-9429.001.patch > > > A status code error in ResourceManager api docs. > In section "Cluster Application State API",the unauthorized error response > header is described below. > {code:java} > Response Header: > HTTP/1.1 403 Unauthorized > Server: Jetty(6.1.26) > {code} > As commonly known, the unauthorized status code should be *401*. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806539#comment-16806539 ] Hadoop QA commented on YARN-8701: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 42s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8701 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964408/YARN-8701.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 12852357125c 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 53a86e2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23854/testReport/ | | Max. process+thread count | 306 (vs. ulimit of 1)
[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806536#comment-16806536 ] Hadoop QA commented on YARN-2889: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 50s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 44s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}201m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-2889 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964398/YARN-2889.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5f9508e4a68f 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri O
[jira] [Created] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
Tao Yang created YARN-9432: -- Summary: Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled Key: YARN-9432 URL: https://issues.apache.org/jira/browse/YARN-9432 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Tao Yang Assignee: Tao Yang Reserved containers may change to be excess after its request has been cancelled or satisfied, excess reserved containers need to be unreserved quickly to release resource for others. For multi-nodes disabled scenario, excess reserved containers can be quickly released in next node heartbeat, the calling stack is CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode --> CapacityScheduler#allocateContainerOnSingleNode. But for multi-nodes enabled scenario, excess reserved containers have chance to be released only in allocation process, key phase of the calling stack is LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. According to this, excess reserved containers may not be released until its queue has pending request and has chance to be allocated, and the worst is that excess reserved containers will never be released and keep holding resource if there is no additional pending request for this queue. To solve this problem, my opinion is to directly kill excess reserved containers when request is satisfied (in FiCaSchedulerApp#apply) or the allocation number of resource-requests/scheduling-requests is updated to be 0 (in SchedulerApplicationAttempt#updateResourceRequests / SchedulerApplicationAttempt#updateSchedulingRequests). Please feel free to give your suggestions. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
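As a hypothetical sketch of the proposal above (invented type and method names, not the real CapacityScheduler API): once the outstanding ask for a scheduler key drops to zero, any reservation still held for that key is excess and can be released immediately instead of waiting for the next allocation round.
{code:java}
// Hypothetical sketch only -- SchedulerAppView and ReservedContainer are
// invented for illustration and are not real YARN types.
import java.util.List;

class ExcessReservationSketch {
  static void onPendingAskUpdated(SchedulerAppView app, String schedulerKey, int newPendingCount) {
    if (newPendingCount > 0) {
      return; // the request is still outstanding; reservations may yet be satisfied
    }
    // The request was cancelled or fully satisfied: reservations left for this
    // key only pin node resources, so release them right away.
    for (ReservedContainer rc : app.reservedContainersFor(schedulerKey)) {
      app.unreserve(rc);
    }
  }
}

interface SchedulerAppView {
  List<ReservedContainer> reservedContainersFor(String schedulerKey);
  void unreserve(ReservedContainer rc);
}

interface ReservedContainer {}
{code}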
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806490#comment-16806490 ] Hadoop QA commented on YARN-9214: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9214 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964399/YARN-9214.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 668110710b54 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 53a86e2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23852/testReport/ | | Max. process+thread count | 926 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23852/console | | Powered by | Apa
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806489#comment-16806489 ] Sen Zhao commented on YARN-8701: Patch 004 to fix checkstyle. > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch, YARN-8701.002.patch, > YARN-8701.003.patch, YARN-8701.004.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sen Zhao updated YARN-8701: --- Attachment: YARN-8701.004.patch > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch, YARN-8701.002.patch, > YARN-8701.003.patch, YARN-8701.004.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806482#comment-16806482 ]

Hadoop QA commented on YARN-8701:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 17 unchanged - 0 fixed = 18 total (was 17) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 44s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 43s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8701 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964403/YARN-8701.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 99dbd111cac7 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 53a86e2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreComm
[jira] [Commented] (YARN-9430) Recovering containers does not check available resources on node
[ https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806481#comment-16806481 ]

Adam Antal commented on YARN-9430:
----------------------------------

Thanks for the thorough investigation, [~snemeth]. It's not 100% clear to me what the correct behaviour should be. Referring to the example: what decides which containers recover and which ones fail to do so?

Tracking which GPUs have been assigned to each container, and failing to recover a container whose associated resources are no longer available, might be a viable approach, but it raises further concerns. One example demonstrating this: suppose a node has 5 GPUs and 2 containers holding 2 GPUs each. Before recovery we remove one GPU from each running container. After recovery, do we 1) recover no containers, because their GPUs were removed, even though there would be enough resources to recover exactly one of them, or 2) keep one (randomly chosen?) container running, since 3 GPUs remain in total and each container requests 2, so one container has enough GPUs but the other does not? This would need to be discussed.

IMO, tracking which exact GPUs have been assigned to a specific container, and failing to recover it if that particular resource is removed (but NOT checking whether further ones are available), is the option I'm voting for; a rough sketch of this idea follows the quoted logs below. Opinions?

> Recovering containers does not check available resources on node
> -----------------------------------------------------------------
>
>                 Key: YARN-9430
>                 URL: https://issues.apache.org/jira/browse/YARN-9430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Critical
>
> I have a testcase that checks that, if some GPU devices go offline and recovery happens, only the containers that fit into the node's resources are recovered. Unfortunately, this is not the case: the RM does not check available resources on the node during recovery.
> *Detailed explanation:*
> *Testcase:*
> 1. There are 2 nodes running NodeManagers.
> 2. nvidia-smi is replaced with a fake bash script that initially reports 2 GPU devices per node. This means 4 GPU devices in the cluster altogether.
> 3. RM / NM recovery is enabled.
> 4. The test starts off a sleep job, requesting 4 containers, 1 GPU device for each (the AM does not request GPUs).
> 5. Before restart, the fake bash script is adjusted to report 1 GPU device per node (2 in the cluster) after restart.
> 6. Restart is initiated.
>
> *Expected behavior:*
> After restart, only the AM and 2 normal containers should have been started, as there are only 2 GPU devices in the cluster.
>
> *Actual behaviour:*
> The AM + 4 containers are allocated, i.e. all containers started originally in step 4.
> App id was: 1553977186701_0001
> *Logs*:
>
> {code:java}
> 2019-03-30 13:22:30,299 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Processing event for appattempt_1553977186701_0001_01 of type RECOVER
> 2019-03-30 13:22:30,366 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1553977186701_0001_01 to scheduler from user: systest
> 2019-03-30 13:22:30,366 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: appattempt_1553977186701_0001_01 is recovering. Skipping notifying ATTEMPT_ADDED
> 2019-03-30 13:22:30,367 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1553977186701_0001_01 State change from NEW to LAUNCHED on event = RECOVER
> 2019-03-30 13:22:33,257 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_01, CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: , Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_04, CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: , Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_e84_1553977186701_0001_01_04 of capacity on host snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers, vCores:2, yarn.io/gpu: 1> used and available after allocation
> 2019-03-30 13:22:33,276 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_05, Creat
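Not part of the JIRA thread itself: a minimal Java sketch of the per-device tracking idea voted for in the comment above, under the assumption that the recovery state store could persist the exact GPU device IDs bound to each container. The types and names are hypothetical, not the actual NM/RM recovery code path.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: recover a container only if every GPU device it was
// bound to before the restart is still reported by the node. Deliberately
// does NOT try to substitute a different free GPU (the stricter option
// discussed in the comment above).
public final class GpuRecoveryCheckSketch {

    static boolean canRecover(List<String> assignedGpuIds, Set<String> gpuIdsOnNode) {
        return gpuIdsOnNode.containsAll(assignedGpuIds);
    }

    public static void main(String[] args) {
        // Node originally had GPUs 0 and 1; after restart the fake nvidia-smi reports only GPU 0.
        Set<String> gpusAfterRestart = new HashSet<>(Arrays.asList("GPU-0"));

        List<String> container4 = Arrays.asList("GPU-0"); // its device survived
        List<String> container5 = Arrays.asList("GPU-1"); // its device is gone

        System.out.println(canRecover(container4, gpusAfterRestart)); // true  -> recover
        System.out.println(canRecover(container5, gpusAfterRestart)); // false -> fail recovery
    }
}
{code}

Under this strict rule, the 5-GPU example in the comment would recover neither container, since each of them lost one of its previously assigned devices, regardless of how many free GPUs remain on the node.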