[jira] [Commented] (YARN-9418) ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923091#comment-16923091 ]

Rohith Sharma K S commented on YARN-9418:
-----------------------------------------

[~Prabhu Joseph] why isn't this backported to branch-3.2?

> ATSV2 /apps/appId/entities/YARN_CONTAINER rest api does not show metrics
> ------------------------------------------------------------------------
>
>                 Key: YARN-9418
>                 URL: https://issues.apache.org/jira/browse/YARN-9418
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: YARN-9418-001.patch, YARN-9418-002.patch, YARN-9418-003.patch
>
>
> The ATSV2 entities REST API does not show the metrics:
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1553695002014,
>   "idprefix": 0,
>   "type": "YARN_CONTAINER",
>   "id": "container_e18_1553685341603_0006_01_01",
>   "info": {
>     "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
>     "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{code}
> NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these are not shown in the above output. NM container entities are written with entityIdPrefix set to the inverted container start time, whereas RM container entities are written with the default 0. TimelineReader fetches only the RM container entries. This was confirmed by setting the entityIdPrefix of NM container entities to 0, the same as RM (for testing purposes), after which the metrics are shown:
> {code:java}
> "metrics": [
>   {
>     "type": "SINGLE_VALUE",
>     "id": "MEMORY",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 490430464
>     }
>   },
>   {
>     "type": "SINGLE_VALUE",
>     "id": "CPU",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 5
>     }
>   }
> ]{code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
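The prefix mismatch described in YARN-9418 can be illustrated with a small sketch. It assumes that the inverted start time is computed as Long.MAX_VALUE minus the start time, and it models the entity store as a sorted map keyed by idPrefix and entity id. The class name, the row-key layout, and the start time value are hypothetical and are not taken from the TimelineReader code or from the attached patches.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the idPrefix mismatch; this is not ATSv2 code. */
final class IdPrefixMismatchSketch {

  /** Assumed inversion: a later start time yields a smaller prefix. */
  static long invert(long startTime) {
    return Long.MAX_VALUE - startTime;
  }

  /** Illustrative row key: zero-padded idPrefix, then the entity id. */
  static String rowKey(long idPrefix, String entityId) {
    return String.format("%019d!%s", idPrefix, entityId);
  }

  /** Builds a toy store holding one RM row and one NM row for the same container. */
  static Map<String, String> buildStore(String id, long startTime) {
    Map<String, String> store = new TreeMap<>();
    // RM writes the entity with the default prefix 0 and no metrics.
    store.put(rowKey(0L, id), "RM entity, metrics=[]");
    // NM writes the same entity id under the inverted start time, with metrics.
    store.put(rowKey(invert(startTime), id), "NM entity, metrics=[MEMORY, CPU]");
    return store;
  }

  public static void main(String[] args) {
    String id = "container_e18_1553685341603_0006_01_01";
    long startTime = 1553695002014L; // illustrative value only
    Map<String, String> store = buildStore(id, startTime);
    // A lookup that assumes prefix 0 only ever sees the RM row.
    System.out.println(store.get(rowKey(0L, id)));
    // The NM row exists, but only under the inverted prefix.
    System.out.println(store.get(rowKey(invert(startTime), id)));
  }
}
```

In this toy model, a reader that is given only the entity id and falls back to prefix 0 never reaches the NM row, which matches the empty "metrics" array in the reported output.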
[jira] [Commented] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923048#comment-16923048 ]

Fengnan Li commented on YARN-9795:
----------------------------------

Thanks very much [~Tao Yang] for the review. Uploaded [^YARN-9795.002.patch].

> ClusterMetrics to include AM allocation delay
> ---------------------------------------------
>
>                 Key: YARN-9795
>                 URL: https://issues.apache.org/jira/browse/YARN-9795
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>         Attachments: YARN-9795.001.patch, YARN-9795.002.patch
>
>
> Add AM container allocation in QueueMetrics to help diagnose performance issues. This follows [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802].
[jira] [Updated] (YARN-9728) ResourceManager REST API can produce an illegal xml response
[ https://issues.apache.org/jira/browse/YARN-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9728:
--------------------------------
    Attachment: YARN-9728-005.patch

> ResourceManager REST API can produce an illegal xml response
> ------------------------------------------------------------
>
>                 Key: YARN-9728
>                 URL: https://issues.apache.org/jira/browse/YARN-9728
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: api, resourcemanager
>    Affects Versions: 2.7.3
>            Reporter: Thomas
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: IllegalResponseChrome.png, YARN-9728-001.patch, YARN-9728-002.patch, YARN-9728-003.patch, YARN-9728-004.patch, YARN-9728-005.patch
>
>
> When a Spark job throws an exception whose message contains a character outside the range supported by XML 1.0, the application fails and the stack trace is stored in the {{diagnostics}} field. So far, so good.
> The issue occurs when we try to get the application information through the ResourceManager REST API: the XML response contains the illegal XML 1.0 character and is therefore invalid.
> *+Examples of illegal characters in XML 1.0:+*
>  * {{\u0000}}
>  * {{\u0001}}
>  * {{\u0002}}
>  * {{\u0003}}
>  * {{\u0004}}
> _For more information about supported characters:_
> [https://www.w3.org/TR/xml/#charsets]
> *+Example of an illegal response from the ResourceManager API:+*
> {code:xml}
> <app>
>   <id>application_1326821518301_0005</id>
>   <user>user1</user>
>   <name>job</name>
>   <queue>a1</queue>
>   <state>FINISHED</state>
>   <finalStatus>FAILED</finalStatus>
>   <progress>100.0</progress>
>   <trackingUI>History</trackingUI>
>   <trackingUrl>http://host.domain.com:8088/proxy/application_1326821518301_0005/jobhistory/job/job_1326821518301_5_5</trackingUrl>
>   <diagnostics>Exception in thread "main" java.lang.Exception: \u0001
>     at com..main(JobWithSpecialCharMain.java:6)</diagnostics>
>   [...]
> </app>
> {code}
>
> *+Example of a job to reproduce the issue:+*
> {code:java}
> public class JobWithSpecialCharMain {
>   public static void main(String[] args) throws Exception {
>     throw new Exception("\u0001");
>   }
> }
> {code}
> {code:bash}
> javac -d . JobWithSpecialCharMain.java
> jar cvf repro.jar com/
> spark-submit --class com.JobWithSpecialCharMain --master yarn-cluster repro.jar
> {code}
> !IllegalResponseChrome.png!
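One possible way to keep such a response well-formed is to filter the diagnostics text against the Char production of the XML 1.0 specification before it is serialized. The sketch below is only an illustration under that assumption: the class and method names are hypothetical, and it does not claim to reflect what the attached YARN-9728 patches do.

```java
/** Hypothetical helper; not taken from the YARN-9728 patches. */
final class Xml10SanitizerSketch {

  /** True if the code point matches the Char production of XML 1.0. */
  static boolean isLegalXml10(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  /** Replaces every code point that XML 1.0 forbids with the given character. */
  static String sanitize(String text, char replacement) {
    StringBuilder out = new StringBuilder(text.length());
    text.codePoints().forEach(cp -> {
      if (isLegalXml10(cp)) {
        out.appendCodePoint(cp);
      } else {
        out.append(replacement);
      }
    });
    return out.toString();
  }

  public static void main(String[] args) {
    String diagnostics = "java.lang.Exception: \u0001";
    // U+FFFD (the Unicode replacement character) is itself legal in XML 1.0.
    System.out.println(sanitize(diagnostics, '\uFFFD'));
  }
}
```

Replacing rather than dropping the character keeps a visible trace that something was removed from the message. Whether to replace, drop, or escape is a design choice that this sketch does not settle.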
[jira] [Updated] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li updated YARN-9795:
-----------------------------
    Attachment: YARN-9795.002.patch

> ClusterMetrics to include AM allocation delay
> ---------------------------------------------
>
>                 Key: YARN-9795
>                 URL: https://issues.apache.org/jira/browse/YARN-9795
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>         Attachments: YARN-9795.001.patch, YARN-9795.002.patch
>
>
> Add AM container allocation in QueueMetrics to help diagnose performance issues. This follows [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802].
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923046#comment-16923046 ] Hadoop QA commented on YARN-8995: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 81m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-8995 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979507/YARN-8995.016.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 9c16f2568269 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3db7184 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | |
[jira] [Commented] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923024#comment-16923024 ]

Tao Yang commented on YARN-9795:
--------------------------------

Thanks [~fengnanli] for this improvement.
The patch almost LGTM. IMO there's no need to set -1 as the initial value of scheduledTime or to add the special annotation; 0 should be the proper initial value, as for the other times. The new checkstyle warnings should be fixed as well.

> ClusterMetrics to include AM allocation delay
> ---------------------------------------------
>
>                 Key: YARN-9795
>                 URL: https://issues.apache.org/jira/browse/YARN-9795
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>         Attachments: YARN-9795.001.patch
>
>
> Add AM container allocation in QueueMetrics to help diagnose performance issues. This follows [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802].
[jira] [Assigned] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi reassigned YARN-9812:
-----------------------------------

    Assignee: Abhishek Modi

> mvn javadoc:javadoc fails in hadoop-sls
> ---------------------------------------
>
>                 Key: YARN-9812
>                 URL: https://issues.apache.org/jira/browse/YARN-9812
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Akira Ajisaka
>            Assignee: Abhishek Modi
>            Priority: Major
>              Labels: newbie
>
> {noformat}
> [ERROR] hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: error: bad use of '>'
> [ERROR] * pending -> requests which are NOT yet sent to RM.
> [ERROR]            ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: error: bad use of '>'
> [ERROR] * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]              ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: error: bad use of '>'
> [ERROR] * assigned -> requests which are assigned to a container.
> [ERROR]             ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: error: bad use of '>'
> [ERROR] * completed -> request corresponding to which container has completed.
> [ERROR]              ^
> {noformat}
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923019#comment-16923019 ]

Hudson commented on YARN-9804:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17227 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17227/])
YARN-9804. Update ATSv2 document for latest feature supports. (rohithsharmaks: rev 3db71840824c58344c2c59423fd605808785dc2c)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md

> Update ATSv2 document for latest feature supports
> -------------------------------------------------
>
>                 Key: YARN-9804
>                 URL: https://issues.apache.org/jira/browse/YARN-9804
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>         Attachments: YARN-9804.01.patch, YARN-9804.02.patch
>
>
> Revisit the ATSv2 documents and update them for GA features, and also for the road map.
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923012#comment-16923012 ]

Rohith Sharma K S commented on YARN-9804:
-----------------------------------------

Committing shortly.

> Update ATSv2 document for latest feature supports
> -------------------------------------------------
>
>                 Key: YARN-9804
>                 URL: https://issues.apache.org/jira/browse/YARN-9804
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>         Attachments: YARN-9804.01.patch, YARN-9804.02.patch
>
>
> Revisit the ATSv2 documents and update them for GA features, and also for the road map.
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922996#comment-16922996 ]

Tao Yang commented on YARN-8995:
--------------------------------

Hi, [~zhuqi], I found another place that needs to be improved.
{{ if (qSize % detailsInterval == 0) }} should be updated to {{ if (qSize != 0 && qSize % detailsInterval == 0 && lastEventDetailsQueueSizeLogged != qSize) }}, to avoid printing for an empty queue and printing the same details redundantly.

> Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: TestStreamPerf.java, YARN-8995.001.patch,
>                      YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch,
>                      YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch,
>                      YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch,
>                      YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch,
>                      YARN-8995.014.patch, image-2019-09-04-15-20-02-914.png
>
>
> In our growing cluster, there are unexpected situations that cause some event queues to block the performance of the cluster, such as the bug in https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to log the event type when the event queue size is too big and to add this information to the metrics. The queue size threshold is a parameter which can be changed.
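The condition suggested in the comment above can be sketched in isolation as follows. The names qSize, detailsInterval, and lastEventDetailsQueueSizeLogged come from the comment; the surrounding class, and the assumption that lastEventDetailsQueueSizeLogged is updated each time details are printed, are hypothetical and not taken from the patch.

```java
/** Hypothetical stand-alone model of the suggested check; not AsyncDispatcher code. */
final class QueueDetailLogGateSketch {
  private final int detailsInterval;
  // Assumed bookkeeping: remembers the queue size at the last detailed log.
  private int lastEventDetailsQueueSizeLogged = 0;

  QueueDetailLogGateSketch(int detailsInterval) {
    this.detailsInterval = detailsInterval;
  }

  /** Returns true when event details should be printed for this queue size. */
  boolean shouldLogDetails(int qSize) {
    if (qSize != 0
        && qSize % detailsInterval == 0
        && lastEventDetailsQueueSizeLogged != qSize) {
      lastEventDetailsQueueSizeLogged = qSize;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    QueueDetailLogGateSketch gate = new QueueDetailLogGateSketch(1000);
    System.out.println(gate.shouldLogDetails(0));    // empty queue
    System.out.println(gate.shouldLogDetails(1000)); // first time at this size
    System.out.println(gate.shouldLogDetails(1000)); // same size again
  }
}
```

With the original check, a queue size of 0 satisfies qSize % detailsInterval == 0, and a queue that stays at the same multiple of the interval is reported on every pass. The two extra clauses address exactly those cases.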
[jira] [Updated] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated YARN-9812:
--------------------------------
    Labels: newbie  (was: )

> mvn javadoc:javadoc fails in hadoop-sls
> ---------------------------------------
>
>                 Key: YARN-9812
>                 URL: https://issues.apache.org/jira/browse/YARN-9812
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Akira Ajisaka
>            Priority: Major
>              Labels: newbie
>
> {noformat}
> [ERROR] hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: error: bad use of '>'
> [ERROR] * pending -> requests which are NOT yet sent to RM.
> [ERROR]            ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: error: bad use of '>'
> [ERROR] * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]              ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: error: bad use of '>'
> [ERROR] * assigned -> requests which are assigned to a container.
> [ERROR]             ^
> [ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: error: bad use of '>'
> [ERROR] * completed -> request corresponding to which container has completed.
> [ERROR]              ^
> {noformat}
[jira] [Created] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
Akira Ajisaka created YARN-9812:
-----------------------------------

             Summary: mvn javadoc:javadoc fails in hadoop-sls
                 Key: YARN-9812
                 URL: https://issues.apache.org/jira/browse/YARN-9812
             Project: Hadoop YARN
          Issue Type: Bug
          Components: documentation
            Reporter: Akira Ajisaka


{noformat}
[ERROR] hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: error: bad use of '>'
[ERROR] * pending -> requests which are NOT yet sent to RM.
[ERROR]            ^
[ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: error: bad use of '>'
[ERROR] * scheduled -> requests which are sent to RM but not yet assigned.
[ERROR]              ^
[ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: error: bad use of '>'
[ERROR] * assigned -> requests which are assigned to a container.
[ERROR]             ^
[ERROR] hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: error: bad use of '>'
[ERROR] * completed -> request corresponding to which container has completed.
[ERROR]              ^
{noformat}
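The javadoc tool rejects a bare '>' in these comment lines. A common way to keep the arrow is to wrap it in {@literal ...}; writing it as -&gt; is another. The sketch below shows the first option on a hypothetical enum that reuses the four descriptions from the error output. It is not the DAGAMSimulator source, and it is not necessarily the fix that was applied for YARN-9812.

```java
/**
 * Hypothetical request states, documented without a bare '>' character.
 * <ul>
 *   <li>pending {@literal ->} requests which are NOT yet sent to RM.</li>
 *   <li>scheduled {@literal ->} requests which are sent to RM but not yet assigned.</li>
 *   <li>assigned {@literal ->} requests which are assigned to a container.</li>
 *   <li>completed {@literal ->} request corresponding to which container has completed.</li>
 * </ul>
 */
enum RequestStateSketch {
  PENDING,
  SCHEDULED,
  ASSIGNED,
  COMPLETED;

  public static void main(String[] args) {
    // Prints the states in declaration order.
    for (RequestStateSketch state : values()) {
      System.out.println(state);
    }
  }
}
```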
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922920#comment-16922920 ] Hadoop QA commented on YARN-9810: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 35s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 12s{color} | {color:blue} ASF License check generated no output? 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}128m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9810 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979463/YARN-9810.01.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 54cc2a75091c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 337e9b7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24727/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24727/testReport/ | | Max. process+thread count | 827 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24727/con
[jira] [Commented] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922915#comment-16922915 ]

Wangda Tan commented on YARN-9795:
----------------------------------

[~fengnanli], thanks for working on the Jira. I just added you to the contributor list so you can assign YARN JIRAs to yourself in the future.

It looks like an important improvement. [~Tao Yang], [~tangzhankun] can you help to review the patch?

Thanks

> ClusterMetrics to include AM allocation delay
> ---------------------------------------------
>
>                 Key: YARN-9795
>                 URL: https://issues.apache.org/jira/browse/YARN-9795
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>         Attachments: YARN-9795.001.patch
>
>
> Add AM container allocation in QueueMetrics to help diagnose performance issues. This follows [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802].
[jira] [Assigned] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan reassigned YARN-9795:
--------------------------------

    Assignee: Fengnan Li

> ClusterMetrics to include AM allocation delay
> ---------------------------------------------
>
>                 Key: YARN-9795
>                 URL: https://issues.apache.org/jira/browse/YARN-9795
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>         Attachments: YARN-9795.001.patch
>
>
> Add AM container allocation in QueueMetrics to help diagnose performance issues. This follows [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802].
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922888#comment-16922888 ]

Eric Yang commented on YARN-9561:
---------------------------------

[~ebadger] thank you for the debugging session today. I got mapreduce pi to run correctly after adding /etc/krb5.conf to the default mount location. Some improvements to make this better:

1. The current output looks like this when a container run fails:

{code}
[2019-09-04 13:26:30.726]Exception from container-launch.
Container id: container_1567624987243_0004_01_06
Exit code: 1
Exception message: Launch container failed

[2019-09-04 13:26:30.731]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

[2019-09-04 13:26:30.734]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

2019-09-04 13:26:31,455 INFO mapreduce.Job: Task Id : attempt_1567624987243_0004_m_03_0, Status : FAILED
{code}

Print a line of output before calling runc, for example that prelaunch initialization is completed, or print the formatted JSON in prelaunch.out. This helps to narrow down whether the root of the problem is the user's job configuration or a bug in the container-executor code. The Docker runtime shows the command line used to call docker, which helps to troubleshoot the actual problem sooner.

2. ENTRY_POINT support. Instead of calling out to launch_container.sh, it would be nice to dup stdout and stderr without the launch_container.sh wrapper. This helps to remove the requirement of bind mounting log or workdir directories into the container for some use cases.

3. User defined properties integration: the YARN Docker integration has a list of [configurable properties|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html#Application_Submission]. These settings do not work with runc containers today. To avoid hindering progress, I suggest opening new issues to improve the integration.

4. YARN service uses the properties defined in #3 to customize the YARN services' mount points, the network to use, and the privileged container flag. Similar feature sets need new tickets to ensure the new runtime can integrate well with the YARN service programming interfaces.

> Add C changes for the new RuncContainerRuntime
> ----------------------------------------------
>
>                 Key: YARN-9561
>                 URL: https://issues.apache.org/jira/browse/YARN-9561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9561.001.patch, YARN-9561.002.patch, YARN-9561.003.patch, YARN-9561.004.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native binary that are necessary for the new RuncContainerRuntime. There should be no changes to existing code paths.
[jira] [Commented] (YARN-9728) ResourceManager REST API can produce an illegal xml response
[ https://issues.apache.org/jira/browse/YARN-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922827#comment-16922827 ] Hadoop QA commented on YARN-9728: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 265 unchanged - 0 fixed = 266 total (was 265) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 52s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 54s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}167m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9728 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979370/YARN-9728-004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 2fa4628c520e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
[jira] [Commented] (YARN-9795) ClusterMetrics to include AM allocation delay
[ https://issues.apache.org/jira/browse/YARN-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922759#comment-16922759 ] Fengnan Li commented on YARN-9795: -- [~leftnoteasy] Can you help here? Thanks! > ClusterMetrics to include AM allocation delay > - > > Key: YARN-9795 > URL: https://issues.apache.org/jira/browse/YARN-9795 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Fengnan Li >Priority: Minor > Attachments: YARN-9795.001.patch > > > Add AM container allocation in QueueMetrics to help diagnose performance > issue. This is following > [YARN-2802|https://jira.apache.org/jira/browse/YARN-2802]
[jira] [Assigned] (YARN-9764) Print application submission context label in application summary
[ https://issues.apache.org/jira/browse/YARN-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-9764: -- Assignee: Manoj Kumar > Print application submission context label in application summary > - > > Key: YARN-9764 > URL: https://issues.apache.org/jira/browse/YARN-9764 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Manoj Kumar >Priority: Major > Labels: release-blocker
[jira] [Assigned] (YARN-9762) Add submission context label to audit logs
[ https://issues.apache.org/jira/browse/YARN-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-9762: -- Assignee: Manoj Kumar > Add submission context label to audit logs > -- > > Key: YARN-9762 > URL: https://issues.apache.org/jira/browse/YARN-9762 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Manoj Kumar >Priority: Major > Labels: release-blocker > > Currently we log NODELABEL in container allocation/release audit logs; we > should also log the NODELABEL of the application submission context on app submission.
[jira] [Assigned] (YARN-9763) Print application tags in application summary
[ https://issues.apache.org/jira/browse/YARN-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-9763: -- Assignee: Manoj Kumar > Print application tags in application summary > - > > Key: YARN-9763 > URL: https://issues.apache.org/jira/browse/YARN-9763 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Manoj Kumar >Priority: Major > Labels: release-blocker
[jira] [Assigned] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-9810: -- Assignee: Shubham Gupta (was: Jonathan Hung) > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Shubham Gupta >Priority: Major > Labels: release-blocker > Attachments: YARN-9810.01.patch > > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922690#comment-16922690 ] Jonathan Hung commented on YARN-9810: - Uploading 01 patch on behalf of [~shubham29]. > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9810.01.patch > > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9810: Attachment: YARN-9810.01.patch > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9810.01.patch > > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9810: Target Version/s: 2.10.0 Labels: release-blocker (was: ) > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung reassigned YARN-9810: --- Assignee: Jonathan Hung > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922633#comment-16922633 ] Shubham Gupta commented on YARN-9810: - +1 > Add queue capacity/maxcapacity percentage metrics > - > > Key: YARN-9810 > URL: https://issues.apache.org/jira/browse/YARN-9810 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Priority: Major > > Similar to YARN-9085, it'd be good to have queue (absolute) capacity / > (absolute) max capacity metrics in CSQueueMetrics. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9761) Allow overriding application submissions based on server side configs
[ https://issues.apache.org/jira/browse/YARN-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922482#comment-16922482 ] pralabhkumar edited comment on YARN-9761 at 9/4/19 3:31 PM: Address [~jhung] comment testSubmissionContextWithAbsentTAG is in line with testAppSubmitWithSubmissionPreProcessor (method length is more that 150 , that's why created separate method) was (Author: pralabhkumar): Address jonathan comment > Allow overriding application submissions based on server side configs > - > > Key: YARN-9761 > URL: https://issues.apache.org/jira/browse/YARN-9761 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jonathan Hung >Assignee: pralabhkumar >Priority: Major > Labels: release-blocker > Attachments: YARN-9761.01.patch, YARN-9761.02.patch, > YARN-9761.03.patch, YARN-9761.04.patch, YARN-9761.05.patch > > > Create a preprocessor/interceptor which takes each app submitted to RM and > overrides the submission context based on server side configs. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
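The preprocessor/interceptor idea in the YARN-9761 description can be sketched roughly as below. This is only an illustration and not the code in the attached patches: SubmissionContext, OverrideRule and SubmissionPreProcessor are hypothetical names, and keying the server-side rules by submitting user is an assumption made for the sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for an application submission context.
final class SubmissionContext {
    String queue;
    String nodeLabel;
    final Set<String> tags = new LinkedHashSet<>();

    SubmissionContext(String queue, String nodeLabel) {
        this.queue = queue;
        this.nodeLabel = nodeLabel;
    }
}

// Hypothetical server-side rule; a null field means "leave the submitted value alone".
final class OverrideRule {
    final String queue;
    final String nodeLabel;
    final String extraTag;

    OverrideRule(String queue, String nodeLabel, String extraTag) {
        this.queue = queue;
        this.nodeLabel = nodeLabel;
        this.extraTag = extraTag;
    }
}

// Applies server-side rules to each submission before it is accepted.
final class SubmissionPreProcessor {
    // Keying rules by submitting user is an assumption made for this sketch.
    private final Map<String, OverrideRule> rulesByUser = new HashMap<>();

    void addRule(String user, OverrideRule rule) {
        rulesByUser.put(user, rule);
    }

    void preProcess(String user, SubmissionContext ctx) {
        OverrideRule rule = rulesByUser.get(user);
        if (rule == null) {
            return; // no server-side config for this user: submission is unchanged
        }
        if (rule.queue != null) {
            ctx.queue = rule.queue;
        }
        if (rule.nodeLabel != null) {
            ctx.nodeLabel = rule.nodeLabel;
        }
        if (rule.extraTag != null) {
            ctx.tags.add(rule.extraTag);
        }
    }
}
```

In this sketch the override happens in one place, so a submission that matches no rule passes through untouched and a matching one only has the configured fields replaced.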
[jira] [Commented] (YARN-9776) yarn logs throws an error "Not a valid BCFile"
[ https://issues.apache.org/jira/browse/YARN-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922561#comment-16922561 ] Prabhu Joseph commented on YARN-9776: - [~jordanagoodboy] The meta file that was removed was written in IFile format, whereas the client reads log files in TFile format. Setting yarn.log-aggregation.file-formats to IFile on the client machine would have solved the issue. This does not look like a bug. Can you share what we need to fix as part of this Jira? Thanks. > yarn logs throws an error "Not a valid BCFile" > -- > > Key: YARN-9776 > URL: https://issues.apache.org/jira/browse/YARN-9776 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.0 > Environment: HDP 3.1.0.78 > >Reporter: agoodboy >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Env: hdp 3.1.0.0-78 > Command: yarn logs -applicationId xxx throws an error "Not a valid BCFile." > and then exits. > After enabling debug logging with export YARN_ROOT_LOGGER="DEBUG,console" and > rerunning the command, it shows that > "fileName=/data1/app-logs/hadoop/logs/application_1566555356033_0032/meta" is > not a valid BCFile. After I removed the file from HDFS and reran the command, > it succeeded. > So, how is this meta file generated? > I guess it is because of this setting in yarn-site.xml: > <property> > <name>yarn.timeline-service.generic-application-history.save-non-am-container-meta-info</name> > <value>true</value> > </property> > I set this value to true because I want to see all container logs in the timeline > web page; otherwise I can only see the AM container log. > So, it seems that yarn logs can't properly handle this situation.
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922542#comment-16922542 ] Eric Badger commented on YARN-9809: --- bq. Although it is good to have a way to prevent scheduling containers to a node manager that is going through registration process to save network round trips and compute resources, the existing async design allows the node to show up in Resource Manager as quickly as possible to improve system admin user experience. But if that node is bad, then registering to the RM is just adding unnecessary work. The NM health check script can check for many things that are known without a container being run. For example, docker could not be installed, or nscd not running (causing a user lookup for every new container). These could be reasons for the node to declare itself as unhealthy depending on the specific health check script. If we register with the RM and then declare the node unhealthy afterwards then we have to kill every container that was scheduled in the period between registration and first heartbeat. > NMs should supply a health status when registering with RM > -- > > Key: YARN-9809 > URL: https://issues.apache.org/jira/browse/YARN-9809 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > Currently if the NM registers with the RM and it is unhealthy, it can be > scheduled many containers before the first heartbeat. After the first > heartbeat, the RM will mark the NM as unhealthy and kill all of the > containers. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
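The proposal in YARN-9809 can be illustrated with a toy model in which the registration request itself carries the node's health, so an unhealthy node never becomes schedulable. RegisterRequest and ToyResourceManager are hypothetical names used only for this sketch; they are not the real NM/RM protocol classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registration payload that reports health up front.
final class RegisterRequest {
    final String nodeId;
    final boolean healthy;
    final String healthReport;

    RegisterRequest(String nodeId, boolean healthy, String healthReport) {
        this.nodeId = nodeId;
        this.healthy = healthy;
        this.healthReport = healthReport;
    }
}

// Toy resource manager: only nodes registered as healthy are schedulable.
final class ToyResourceManager {
    private final Map<String, Boolean> nodeHealth = new LinkedHashMap<>();

    void register(RegisterRequest request) {
        // Health is known at registration time, before any heartbeat arrives.
        nodeHealth.put(request.nodeId, request.healthy);
    }

    // Returns the nodes that may receive containers, in registration order.
    List<String> schedulableNodes() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : nodeHealth.entrySet()) {
            if (entry.getValue()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

With this shape there is no window between registration and the first heartbeat in which containers can land on a node that the health check script already knows to be bad.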
[jira] [Updated] (YARN-9761) Allow overriding application submissions based on server side configs
[ https://issues.apache.org/jira/browse/YARN-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pralabhkumar updated YARN-9761: --- Attachment: YARN-9761.05.patch > Allow overriding application submissions based on server side configs > - > > Key: YARN-9761 > URL: https://issues.apache.org/jira/browse/YARN-9761 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jonathan Hung >Assignee: pralabhkumar >Priority: Major > Labels: release-blocker > Attachments: YARN-9761.01.patch, YARN-9761.02.patch, > YARN-9761.03.patch, YARN-9761.04.patch, YARN-9761.05.patch > > > Create a preprocessor/interceptor which takes each app submitted to RM and > overrides the submission context based on server side configs. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9776) yarn logs throws an error "Not a valid BCFile"
[ https://issues.apache.org/jira/browse/YARN-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922472#comment-16922472 ] agoodboy commented on YARN-9776: [~Prabhu Joseph] I don't think that we should close the issue. > yarn logs throws an error "Not a valid BCFile" > -- > > Key: YARN-9776 > URL: https://issues.apache.org/jira/browse/YARN-9776 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.0 > Environment: HDP 3.1.0.78 > >Reporter: agoodboy >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Env: hdp 3.1.0.0-78 > Command: yarn logs -applicationId xxx throws an error "Not a valid BCFile." > and then exits. > After enabling debug logging with export YARN_ROOT_LOGGER="DEBUG,console" and > rerunning the command, it shows that > "fileName=/data1/app-logs/hadoop/logs/application_1566555356033_0032/meta" is > not a valid BCFile. After I removed the file from HDFS and reran the command, > it succeeded. > So, how is this meta file generated? > I guess it is because of this setting in yarn-site.xml: > <property> > <name>yarn.timeline-service.generic-application-history.save-non-am-container-meta-info</name> > <value>true</value> > </property> > I set this value to true because I want to see all container logs in the timeline > web page; otherwise I can only see the AM container log. > So, it seems that yarn logs can't properly handle this situation.
[jira] [Updated] (YARN-9030) Log aggregation changes to handle filesystems which do not support setting permissions
[ https://issues.apache.org/jira/browse/YARN-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9030: - Component/s: log-aggregation > Log aggregation changes to handle filesystems which do not support setting > permissions > -- > > Key: YARN-9030 > URL: https://issues.apache.org/jira/browse/YARN-9030 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-9030.1.patch, YARN-9030.2.patch > > > Some cloud storages like ADLS do not support permissions in which case they > throw an UnsupportedOperationException. Log aggregation code should > log/ignore these exceptions and not set permissions henceforth for log > aggregation base dir/sub dirs > {noformat} > 2018-11-12 15:37:28,726 WARN logaggregation.LogAggregationService > (LogAggregationService.java:initApp(209)) - Application failed to init > aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to check > permissions for dir [abfs://testc...@test.blob.core.windows.net/app-logs] > at > org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.verifyAndCreateRemoteLogDir(LogAggregationFileController.java:277) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:238) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:204) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:347) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:69) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
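The behaviour requested in YARN-9030 (log the UnsupportedOperationException, then stop trying to set permissions) might look roughly like the sketch below. RemoteFs and RemoteLogDirInitializer are hypothetical names for illustration; the real change lives in the log aggregation file controller and uses Hadoop's FileSystem API, which is not reproduced here.

```java
// Hypothetical minimal view of the remote filesystem that stores aggregated logs.
interface RemoteFs {
    void setPermission(String path, short mode);
}

final class RemoteLogDirInitializer {
    private final RemoteFs fs;
    // Flipped to false the first time the filesystem rejects a permission call.
    private boolean permissionsSupported = true;

    RemoteLogDirInitializer(RemoteFs fs) {
        this.fs = fs;
    }

    // Returns true if the permission was applied, false if it was skipped.
    boolean trySetPermission(String path, short mode) {
        if (!permissionsSupported) {
            return false; // already known to be unsupported: do not call again
        }
        try {
            fs.setPermission(path, mode);
            return true;
        } catch (UnsupportedOperationException e) {
            permissionsSupported = false;
            System.err.println("WARN: cannot set permissions on " + path
                + "; skipping permission changes from now on");
            return false;
        }
    }
}
```

The point of the flag is that the failure is treated as a property of the filesystem rather than of one directory, so the base dir and all sub dirs are handled with a single warning instead of a failed application init.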
[jira] [Created] (YARN-9811) FederationInterceptor fails to recover in Kerberos environment
Xie YiFan created YARN-9811: --- Summary: FederationInterceptor fails to recover in Kerberos environment Key: YARN-9811 URL: https://issues.apache.org/jira/browse/YARN-9811 Project: Hadoop YARN Issue Type: Bug Components: amrmproxy Reporter: Xie YiFan Assignee: Xie YiFan *scenario*: Start up cluster in Kerberos environment with enable recover & AMRMProxy in NM. Submit one application to cluster, and restart NM which has master container. The NM will block in FederationInterceptor recover. *LOG* {code:java} INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: Recovering data for FederationInterceptor INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: Found 0 existing UAMs for application application_1561534175896_4102 in NMStateStore INFO org.apache.hadoop.yarn.server.utils.AMRMClientUtils: Creating RMProxy to RM online-bx for protocol ApplicationClientProtocol for user recommend (auth:SIMPLE) INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Initialized Federation proxy for user: recommend INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Failing over to the ResourceManager for SubClusterId: online-bx INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol ApplicationClientProtocol as user recommend (auth:SIMPLE) WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Failing over to the ResourceManager for SubClusterId: online-bx INFO org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade: Flushing subClusters from cache and rehydrating from store, most likely on account of RM failover. 
INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol ApplicationClientProtocol as user recommend (auth:SIMPLE) WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] INFO org.apache.hadoop.io.retry.RetryInvocationHandler: java.io.IOException: DestHost:destPort hadoop1684.bx.momo.com:8032 , LocalHost:localPort hadoop999.bx.momo.com/10.88.64.186:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ApplicationClientProtocolPBClientImpl.getContainers over online-bx after 1 failover attempts. Trying to failover after sleeping for 3244ms.{code} *Analysis* rmClient.getContainers is called, but the AuthMethod of appSubmitter is SIMPLE. We should use createProxyUser instead of createRemoteUser when security is enabled. {code:java} UserGroupInformation appSubmitter = UserGroupInformation .createRemoteUser(getApplicationContext().getUser()); ApplicationClientProtocol rmClient = createHomeRMProxy(getApplicationContext(), ApplicationClientProtocol.class, appSubmitter); GetContainersResponse response = rmClient .getContainers(GetContainersRequest.newInstance(this.attemptId)); {code}
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922320#comment-16922320 ] Sunil Govindan commented on YARN-9804: -- +1. Thanks [~rohithsharma] > Update ATSv2 document for latest feature supports > - > > Key: YARN-9804 > URL: https://issues.apache.org/jira/browse/YARN-9804 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-9804.01.patch, YARN-9804.02.patch > > > Revisit ATSv2 documents and update for GA features. And also for the road map. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922315#comment-16922315 ] Gergely Pollak commented on YARN-9698: -- I'm attaching a documentation we created with [~Prabhu Joseph] [~sunilg] [~wangda] [~wilfreds] [~snemeth]. It includes the main features of the schedulers and the configuration mapping. Please feel free to comment and share your thoughts. > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Weiwei Yang >Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > We see some users want to migrate from Fair Scheduler to Capacity Scheduler, > this Jira is created as an umbrella to track all related efforts for the > migration, the scope contains > * Bug fixes > * Add missing features > * Migration tools that help to generate CS configs based on FS, validate > configs etc > * Documents > this is part of CS component, the purpose is to make the migration process > smooth. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922309#comment-16922309 ] Weiwei Yang commented on YARN-8995: --- Also looks good to me, [~Tao Yang], feel free to commit this. > Log the event type of the too big AsyncDispatcher event queue size, and add > the information to the metrics. > > > Key: YARN-8995 > URL: https://issues.apache.org/jira/browse/YARN-8995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, nodemanager, resourcemanager >Affects Versions: 3.2.0, 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: TestStreamPerf.java, YARN-8995.001.patch, > YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch, > YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch, > YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch, > YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch, > YARN-8995.014.patch, image-2019-09-04-15-20-02-914.png > > > In our growing cluster, there are unexpected situations in which some event > queues grow large enough to degrade cluster performance, such as the bug in > https://issues.apache.org/jira/browse/YARN-5262 . I think it is necessary to > log the event types in an oversized event queue and add that information > to the metrics, with the queue-size threshold exposed as a configurable > parameter.
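The idea described in YARN-8995 (report which event types dominate the queue once it grows past a configurable threshold) can be sketched as follows. This is a simplified illustration, not the patch itself: EventType and QueueSizeReporter are hypothetical names, and the real AsyncDispatcher has its own event classes and metrics plumbing.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical event types; a real dispatcher handles many more.
enum EventType { NODE_UPDATE, APP_ADDED, CONTAINER_FINISHED }

final class QueueSizeReporter {
    // Queue size above which a per-type breakdown is produced (configurable).
    private final int threshold;

    QueueSizeReporter(int threshold) {
        this.threshold = threshold;
    }

    // Returns per-type counts when the queue exceeds the threshold, else an empty map.
    Map<EventType, Integer> report(Queue<EventType> queue) {
        Map<EventType, Integer> counts = new EnumMap<>(EventType.class);
        if (queue.size() <= threshold) {
            return counts; // queue is small enough: nothing to report
        }
        for (EventType type : queue) {
            counts.merge(type, 1, Integer::sum);
        }
        System.out.println("Event queue size " + queue.size()
            + " exceeds threshold " + threshold + ", counts by type: " + counts);
        return counts;
    }
}
```

The returned map is what a metrics sink could publish; counting only when the threshold is crossed keeps the common path free of the extra iteration over the queue.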
[jira] [Updated] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Pollak updated YARN-9698: - Attachment: FS-CS Migration.pdf > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Weiwei Yang >Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > We see some users want to migrate from Fair Scheduler to Capacity Scheduler, > this Jira is created as an umbrella to track all related efforts for the > migration, the scope contains > * Bug fixes > * Add missing features > * Migration tools that help to generate CS configs based on FS, validate > configs etc > * Documents > this is part of CS component, the purpose is to make the migration process > smooth. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922279#comment-16922279 ] Tao Yang commented on YARN-8995: Confirmed that the latest patch should not fail like that. The patch now LGTM; waiting for feedback from [~cheersyang], thanks. > Log the event type of the too big AsyncDispatcher event queue size, and add > the information to the metrics. > > > Key: YARN-8995 > URL: https://issues.apache.org/jira/browse/YARN-8995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, nodemanager, resourcemanager >Affects Versions: 3.2.0, 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: TestStreamPerf.java, YARN-8995.001.patch, > YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch, > YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch, > YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch, > YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch, > YARN-8995.014.patch, image-2019-09-04-15-20-02-914.png > > > In our growing cluster, there are unexpected situations in which some event > queues grow large enough to degrade cluster performance, such as the bug in > https://issues.apache.org/jira/browse/YARN-5262 . I think it is necessary to > log the event types in an oversized event queue and add that information > to the metrics, with the queue-size threshold exposed as a configurable > parameter.
[jira] [Commented] (YARN-9728) ResourceManager REST API can produce an illegal xml response
[ https://issues.apache.org/jira/browse/YARN-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922280#comment-16922280 ] Prabhu Joseph commented on YARN-9728: - Thanks [~eyang] and [~tde] for reviewing. I have handled the changes below in [^YARN-9728-004.patch] . 1. Added unicode characters in x1-#x10 range. 2. Used \uFFFD as the substitute. 3. Fixed the camel case issue. 4. Fixed the description of {{yarn.webapp.filter-invalid-xml-chars}}. > ResourceManager REST API can produce an illegal xml response > - > > Key: YARN-9728 > URL: https://issues.apache.org/jira/browse/YARN-9728 > Project: Hadoop YARN > Issue Type: Bug > Components: api, resourcemanager >Affects Versions: 2.7.3 >Reporter: Thomas >Assignee: Prabhu Joseph >Priority: Major > Attachments: IllegalResponseChrome.png, YARN-9728-001.patch, > YARN-9728-002.patch, YARN-9728-003.patch, YARN-9728-004.patch > > > When a spark job throws an exception with a message containing a character > out of the range supported by xml 1.0, then > the application fails and the stack trace will be stored into the > {{diagnostics}} field. So far, so good. > But the issue occurs when we try to get application information with the > ResourceManager REST API: > the xml response will contain the illegal xml 1.0 char and will be invalid. > *+Examples of illegal characters in xml 1.0 :+* > * {{\u}} > * {{\u0001}} > * {{\u0002}} > * {{\u0003}} > * {{\u0004}} > _For more information about supported characters :_ > [https://www.w3.org/TR/xml/#charsets] > *+Example of illegal response from the Resource Manager API :+* > {code:xml} > > > application_1326821518301_0005 > user1 > job > a1 > FINISHED > FAILED > 100.0 > History > > http://host.domain.com:8088/proxy/application_1326821518301_0005/jobhistory/job/job_1326821518301_5_5 > Exception in thread "main" java.lang.Exception: \u0001 > at com..main(JobWithSpecialCharMain.java:6) > [...] 
> > {code} > > *+Example of job to reproduce :+* > {code:java} > public class JobWithSpecialCharMain { > public static void main(String[] args) throws Exception { > throw new Exception("\u0001"); > } > } > {code} > {code:bash} > javac -d . JobWithSpecialCharMain.java > jar cvf repro.jar com/ > spark-submit --class com.JobWithSpecialCharMain --master yarn-cluster > repro.jar > {code} > !IllegalResponseChrome.png!
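For readers following the discussion, the substitution mentioned in the comment (replacing characters that XML 1.0 does not allow with U+FFFD) could look roughly like the sketch below. It is not taken from the attached patches; the class and method names are hypothetical, and the allowed ranges are those of the Char production in the XML 1.0 specification linked in the description.

```java
public class XmlCharFilterSketch {
  // Replacement character used in place of anything XML 1.0 does not allow.
  private static final char SUBSTITUTE = '\uFFFD';

  /** Returns true if the code point matches the XML 1.0 Char production. */
  public static boolean isValidXml10(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  /** Replaces every code point that is not valid in XML 1.0 with U+FFFD. */
  public static String sanitize(String input) {
    if (input == null) {
      return null;
    }
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      if (isValidXml10(cp)) {
        out.appendCodePoint(cp);
      } else {
        out.append(SUBSTITUTE);
      }
      // Advance by one or two chars depending on whether cp is supplementary.
      i += Character.charCount(cp);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Same kind of message as in the reproduction job above.
    String diagnostics = "java.lang.Exception: \u0001";
    System.out.println(sanitize(diagnostics).equals("java.lang.Exception: \uFFFD"));
  }
}
```

Iterating by code point rather than by {{char}} means valid supplementary characters survive intact, while unpaired surrogates are treated as invalid and replaced.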
[jira] [Updated] (YARN-9728) ResourceManager REST API can produce an illegal xml response
[ https://issues.apache.org/jira/browse/YARN-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9728: Attachment: YARN-9728-004.patch > ResourceManager REST API can produce an illegal xml response > - > > Key: YARN-9728 > URL: https://issues.apache.org/jira/browse/YARN-9728 > Project: Hadoop YARN > Issue Type: Bug > Components: api, resourcemanager >Affects Versions: 2.7.3 >Reporter: Thomas >Assignee: Prabhu Joseph >Priority: Major > Attachments: IllegalResponseChrome.png, YARN-9728-001.patch, > YARN-9728-002.patch, YARN-9728-003.patch, YARN-9728-004.patch > > > When a spark job throws an exception with a message containing a character > out of the range supported by xml 1.0, then > the application fails and the stack trace will be stored into the > {{diagnostics}} field. So far, so good. > But the issue occurs when we try to get application information with the > ResourceManager REST API: > the xml response will contain the illegal xml 1.0 char and will be invalid. > *+Examples of illegal characters in xml 1.0 :+* > * {{\u}} > * {{\u0001}} > * {{\u0002}} > * {{\u0003}} > * {{\u0004}} > _For more information about supported characters :_ > [https://www.w3.org/TR/xml/#charsets] > *+Example of illegal response from the Resource Manager API :+* > {code:xml} > > > application_1326821518301_0005 > user1 > job > a1 > FINISHED > FAILED > 100.0 > History > > http://host.domain.com:8088/proxy/application_1326821518301_0005/jobhistory/job/job_1326821518301_5_5 > Exception in thread "main" java.lang.Exception: \u0001 > at com..main(JobWithSpecialCharMain.java:6) > [...] > > {code} > > *+Example of job to reproduce :+* > {code:java} > public class JobWithSpecialCharMain { > public static void main(String[] args) throws Exception { > throw new Exception("\u0001"); > } > } > {code} > {code:bash} > javac -d . 
JobWithSpecialCharMain.java > jar cvf repro.jar com/ > spark-submit --class com.JobWithSpecialCharMain --master yarn-cluster > repro.jar > {code} > !IllegalResponseChrome.png!
[jira] [Updated] (YARN-9785) Fix DominantResourceCalculator when one resource is zero
[ https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9785: --- Fix Version/s: 3.1.3 > Fix DominantResourceCalculator when one resource is zero > > > Key: YARN-9785 > URL: https://issues.apache.org/jira/browse/YARN-9785 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Blocker > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9785-001.patch, YARN-9785-branch-3.1.001.patch, > YARN-9785.002.patch, YARN-9785.003.patch, YARN-9785.wip.patch > > > Configure below property in resource-types.xml > {quote} > yarn.resource-types > yarn.io/gpu > > {quote} > Submit applications even after AM limit for a queue is reached. Applications > get activated even after limit is reached > !queue.png!
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1699#comment-1699 ] zhuqi commented on YARN-8995: - Hi [~Tao Yang]. !image-2019-09-04-15-20-02-914.png! This is the metric that I have changed. It is no longer in thousands, but I forgot to change it in the last two patches. Sorry for my mistake. Thanks. > Log the event type of the too big AsyncDispatcher event queue size, and add > the information to the metrics. > > > Key: YARN-8995 > URL: https://issues.apache.org/jira/browse/YARN-8995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, nodemanager, resourcemanager >Affects Versions: 3.2.0, 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: TestStreamPerf.java, YARN-8995.001.patch, > YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch, > YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch, > YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch, > YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch, > YARN-8995.014.patch, image-2019-09-04-15-20-02-914.png > > > In our growing cluster, there are unexpected situations that cause some event > queues to block the performance of the cluster, such as the bug of > https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to > log the event type when the event queue size is too big, and add the information > to the metrics. The threshold of the queue size is a parameter which can be > changed.
[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1691#comment-1691 ] Peter Bacsko commented on YARN-6715: [~szegedim] yes, that's fine. > Fix documentation about NodeHealthScriptRunner > --- > > Key: YARN-6715 > URL: https://issues.apache.org/jira/browse/YARN-6715 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-6715-001.patch, YARN-6715-002.patch, > YARN-6715-003.patch > > > NodeHealthScriptRunner does *not* report a bad health if the script exits > with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case: > {noformat} > void reportHealthStatus(HealthCheckerExitStatus status) { > long now = System.currentTimeMillis(); > switch (status) { > case SUCCESS: > setHealthStatus(true, "", now); > break; > case TIMED_OUT: > setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG); > break; > case FAILED_WITH_EXCEPTION: > setHealthStatus(false, exceptionStackTrace); > break; > case FAILED_WITH_EXIT_CODE: > setHealthStatus(true, "", now); > break; > case FAILED: > setHealthStatus(false, shexec.getOutput()); > break; > } > } > {noformat} > Based on the discussion in YARN-5567, this is intentional, but conflicts with > the upstream document, which says: > "If the script *exits with a non-zero exit code*, times out or results in an > exception being thrown, the node is marked as unhealthy" > This statement can be extremely misleading and must be corrected. We might > also add an extra comment to {{reportHealthStatus()}} which explains that > {{FAILED_WITH_EXIT_CODE}} is not buggy. > This case also lacks unit test coverage.
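To make the behaviour described above easier to see at a glance, here is a small standalone model of the status-to-health mapping in the quoted {{reportHealthStatus()}} snippet. It is only a sketch: the enum constants are copied from the snippet, but the class {{HealthStatusSketch}} and the method {{isHealthy}} are hypothetical and do not exist in Hadoop.

```java
public class HealthStatusSketch {
  // Constants mirror the cases in the quoted snippet.
  public enum HealthCheckerExitStatus {
    SUCCESS, TIMED_OUT, FAILED_WITH_EXIT_CODE, FAILED_WITH_EXCEPTION, FAILED
  }

  /**
   * Mirrors the first argument passed to setHealthStatus() in the snippet:
   * a non-zero exit code alone still leaves the node healthy.
   */
  public static boolean isHealthy(HealthCheckerExitStatus status) {
    switch (status) {
      case SUCCESS:
      case FAILED_WITH_EXIT_CODE:
        return true;
      default:
        // TIMED_OUT, FAILED_WITH_EXCEPTION and FAILED mark the node unhealthy.
        return false;
    }
  }

  public static void main(String[] args) {
    for (HealthCheckerExitStatus s : HealthCheckerExitStatus.values()) {
      System.out.println(s + " -> healthy=" + isHealthy(s));
    }
  }
}
```

A unit test for the missing coverage could assert the same table against the real class; the model above only documents the expected outcome for each status.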
[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8995: Attachment: image-2019-09-04-15-20-02-914.png > Log the event type of the too big AsyncDispatcher event queue size, and add > the information to the metrics. > > > Key: YARN-8995 > URL: https://issues.apache.org/jira/browse/YARN-8995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, nodemanager, resourcemanager >Affects Versions: 3.2.0, 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: TestStreamPerf.java, YARN-8995.001.patch, > YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch, > YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch, > YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch, > YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch, > YARN-8995.014.patch, image-2019-09-04-15-20-02-914.png > > > In our growing cluster, there are unexpected situations that cause some event > queues to block the performance of the cluster, such as the bug of > https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to > log the event type when the event queue size is too big, and add the information > to the metrics. The threshold of the queue size is a parameter which can be > changed.
[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922215#comment-16922215 ] Adam Antal commented on YARN-9511: -- Hi [~seanlau], Szilard is on vacation, so there isn't going to be any update on this for the next 2-3 weeks. If it's urgent for you, I think [~snemeth] wouldn't mind if you take this over. I can take a look at it as well next week, the JDK11 issues are in my scope. > [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: > The remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Assignee: Szilard Nemeth >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. 
The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code}
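A note on the number in the exception message: assuming "436" is the decimal rendering of the file mode bits, it corresponds to octal 664 (rw-rw-r--), which has the group write bit set. The snippet below only demonstrates that arithmetic; the class and method names are hypothetical and nothing here is taken from {{AuxServices}}.

```java
public class PermissionSketch {
  // Octal 022 selects the write bits for group and others.
  private static final int GROUP_OR_OTHER_WRITE = 0022;

  /** Returns true if the mode grants write access to group or others. */
  public static boolean writableByGroupOrOthers(int mode) {
    return (mode & GROUP_OR_OTHER_WRITE) != 0;
  }

  /** Renders a mode in the octal form that chmod and ls users expect. */
  public static String toOctal(int mode) {
    return Integer.toOctalString(mode);
  }

  public static void main(String[] args) {
    int reported = 436; // value printed in the exception message
    System.out.println(reported + " decimal = " + toOctal(reported) + " octal");
    System.out.println("writable by group or others: "
        + writableByGroupOrOthers(reported));
  }
}
```

Under the same assumption, a mode of octal 644 (decimal 420) would pass the check, since neither the group nor the others write bit is set.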
[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922211#comment-16922211 ] liusheng commented on YARN-9511: Hi, Any update about this issue ? > [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: > The remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Assignee: Szilard Nemeth >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code}
[jira] [Commented] (YARN-9784) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue is flaky
[ https://issues.apache.org/jira/browse/YARN-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922205#comment-16922205 ] Julia Kinga Marton commented on YARN-9784: -- Thank you [~adam.antal] and [~sunilg] for the review! > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue > is flaky > --- > > Key: YARN-9784 > URL: https://issues.apache.org/jira/browse/YARN-9784 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.3.0 >Reporter: Julia Kinga Marton >Assignee: Julia Kinga Marton >Priority: Major > Attachments: YARN-9784.001.patch > > > There are some test cases in TestLeafQueue which are failing intermittently. > From 100 runs, there were 16 failures. > Some failure examples are the following ones: > {code:java} > 2019-08-26 13:18:13 [ERROR] Errors: > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [INFO] > 2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:09 [ERROR] Failures: > 2019-08-26 13:18:09 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1373 > expected:<2048> but was:<0> > 2019-08-26 13:18:09 [INFO] > 2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:18 [ERROR] Errors: > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:18 YarnConfigu... > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1307 ? > ClassCast org.apache.hadoop.yarn.c... 
> 2019-08-26 13:18:18 [INFO] > 2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:10 [ERROR] Failures: > 2019-08-26 13:18:10 [ERROR] TestLeafQueue.testDRFUserLimits:847 Verify > user_0 got resources > 2019-08-26 13:18:10 [INFO] > 2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code}