[jira] [Resolved] (YARN-6875) New aggregated log file format for YARN log aggregation.
[ https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Francke resolved YARN-6875.
--------------------------------
    Resolution: Fixed

> New aggregated log file format for YARN log aggregation.
> ---------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
> T-file is the underlying log format for the aggregated logs in YARN. We have
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large
> log files.
[jira] [Created] (YARN-9619) Transfer error AM host/ip when launching app using docker container with bridge network
caozhiqiang created YARN-9619:
---------------------------------

             Summary: Transfer error AM host/ip when launching app using docker container with bridge network
                 Key: YARN-9619
                 URL: https://issues.apache.org/jira/browse/YARN-9619
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.3.0
            Reporter: caozhiqiang

When launching an application using a Docker container with a bridge network in overlay networks, the client polls the application's progress from the ApplicationMaster using a wrong host/IP: it polls the NodeManager's hostname/IP, not the Docker container's IP on which the AM is actually running. The error message is below (the server hadoop3-1/192.168.2.105 is the NM's address, not the AM's Docker IP, so it cannot be accessed):

2019-05-11 08:28:46,361 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-11 08:28:47,363 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-11 08:28:48,365 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-10 08:34:40,235 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:12020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:12020.
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
	at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:871)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
	at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
	at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
	at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
	at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
	at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
	at org.apache.hadoop.ipc.Client.call(Client.java:1457)
	at or
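For illustration, a minimal sketch (not the actual fix) of the mismatch the report describes: inside a bridge-network container, the locally resolved address is the container's own IP, which is the address the client would need to dial, rather than the NodeManager host the container runs on.

{code:java}
import java.net.InetAddress;

public class AdvertisedAddress {
  public static void main(String[] args) throws Exception {
    // Inside the container this resolves to the Docker bridge/overlay IP,
    // not to the NodeManager's address (hadoop3-1/192.168.2.105 above),
    // which is what the AM registered and the client could not reach.
    InetAddress self = InetAddress.getLocalHost();
    System.out.println("advertise host=" + self.getHostAddress());
  }
}
{code}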
[jira] [Commented] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861773#comment-16861773 ]

Zhankun Tang commented on YARN-9612:
------------------------------------

[~cane], Thanks. But maybe I don't have much context here. Could you please elaborate on why the service name makes things worse?

> Support using ip to register NodeID
> -----------------------------------
>
>                 Key: YARN-9612
>                 URL: https://issues.apache.org/jira/browse/YARN-9612
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: zhoukang
>            Priority: Major
>
> In environments like k8s, we should support using the IP when registering the
> NodeID with the RM, since the hostname will be the pod name, which cannot be
> resolved by the DNS of k8s.
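To make the proposal concrete, a rough sketch (the port value and surrounding wiring are assumptions, not the eventual patch) of building the NodeID from the resolved IP instead of the pod hostname:

{code:java}
import java.net.InetAddress;
import org.apache.hadoop.yarn.api.records.NodeId;

public class IpNodeId {
  // The pod hostname is not resolvable via the cluster DNS, but the pod IP
  // is routable, so register with the IP instead.
  public static NodeId build(int rpcPort) throws Exception {
    String ip = InetAddress.getLocalHost().getHostAddress();
    return NodeId.newInstance(ip, rpcPort);
  }
}
{code}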
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861745#comment-16861745 ]

Bibin A Chundatt commented on YARN-9615:
----------------------------------------

Thank you [~jhung]. In case you haven't started working on this, I could add the patch we had done for our setup, which exports to JMX too. Visualization through Grafana: !screenshot-1.png!

> Add dispatcher metrics to RM
> ----------------------------
>
>                 Key: YARN-9615
>                 URL: https://issues.apache.org/jira/browse/YARN-9615
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>         Attachments: screenshot-1.png
>
> It'd be good to have counts/processing times for each event type in RM async
> dispatcher and scheduler async dispatcher.
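As a rough sketch of what such a patch could look like (the class and metric names here are assumptions, not the attached patch), per-event-type counters can be built on Hadoop's metrics2 library, which is also what exposes them over JMX as mentioned above:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

public class DispatcherEventMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("AsyncDispatcher");
  private final ConcurrentHashMap<String, MutableCounterLong> counters =
      new ConcurrentHashMap<>();

  // Create a counter lazily on first sight of each event type, then bump it.
  public void incr(Enum<?> eventType) {
    counters.computeIfAbsent(eventType.name(),
        name -> registry.newCounter(name + "Count",
            "Number of " + name + " events handled", 0L))
        .incr();
  }
}
{code}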
[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-9615:
-----------------------------------
    Attachment: screenshot-1.png

> Add dispatcher metrics to RM
> ----------------------------
>
>                 Key: YARN-9615
>                 URL: https://issues.apache.org/jira/browse/YARN-9615
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>         Attachments: screenshot-1.png
>
> It'd be good to have counts/processing times for each event type in RM async
> dispatcher and scheduler async dispatcher.
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861741#comment-16861741 ]

Tao Yang commented on YARN-8995:
--------------------------------

Thanks [~zhuqi] for updating the patch. Comments about the new patch:
* For the latest event, I didn't mean that it should be controlled separately from the counter info; we can add a boolean flag defaulting to false, which is updated to true when printing the details is triggered (for example, when the queue size has reached N*5000) and back to false after the latest event has been printed.
* Configuration reading logic should be moved to serviceStart() for better performance.
* The printEventQueueDetails method can be simplified via the stream API; moreover, the value type of counterMap should be Long instead of long[].
* The new configuration entry should have a clear name, for example "yarn.dispatcher.print-events-debug-info.interval-in-thousands" as a first idea; you can give it a better name. I suppose we should take thousands as the unit since the print switch depends on another condition (qSize % 1000 == 0).

> Log the event type of the too big AsyncDispatcher event queue size, and add
> the information to the metrics.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
> In our growing cluster, there are unexpected situations that cause some event
> queues to block the performance of the cluster, such as the bug of
> https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to
> log the event type of the too big event queue size, and add the information
> to the metrics, and the threshold of queue size is a parameter which can be
> changed.
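A minimal sketch of the counterMap and printEventQueueDetails suggestions, assuming the field and method names from the discussion (this is not the actual patch): the map valued with Long rather than long[], and the printout collapsed into a single stream pipeline.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class EventQueueDetails {
  private final Map<Enum<?>, Long> counterMap = new ConcurrentHashMap<>();

  // Count each event type in real time as events are enqueued.
  void countEvent(Enum<?> eventType) {
    counterMap.merge(eventType, 1L, Long::sum);
  }

  // One stream pipeline instead of manual iteration over a long[] map.
  String printEventQueueDetails() {
    return counterMap.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(", "));
  }
}
{code}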
[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated YARN-8995:
------------------------
    Attachment: (was: YARN-8995.002.patch)

> Log the event type of the too big AsyncDispatcher event queue size, and add
> the information to the metrics.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
> In our growing cluster, there are unexpected situations that cause some event
> queues to block the performance of the cluster, such as the bug of
> https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to
> log the event type of the too big event queue size, and add the information
> to the metrics, and the threshold of queue size is a parameter which can be
> changed.
[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated YARN-8995:
------------------------
    Attachment: YARN-8995.002.patch

> Log the event type of the too big AsyncDispatcher event queue size, and add
> the information to the metrics.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
> In our growing cluster, there are unexpected situations that cause some event
> queues to block the performance of the cluster, such as the bug of
> https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to
> log the event type of the too big event queue size, and add the information
> to the metrics, and the threshold of queue size is a parameter which can be
> changed.
[jira] [Commented] (YARN-9327) ProtoUtils#convertToProtoFormat block Application Master Service and many more
[ https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861483#comment-16861483 ]

Hadoop QA commented on YARN-9327:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 49s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 57s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9327 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12959792/YARN-9327.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f103033c7194 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5740eea |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24259/testReport/ |
| Max. process+thread count | 351 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24259/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> ProtoUtils#convertToProtoFormat block Application Master Service and many more
[jira] [Commented] (YARN-9327) ProtoUtils#convertToProtoFormat block Application Master Service and many more
[ https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861400#comment-16861400 ]

Hadoop QA commented on YARN-9327:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 58s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9327 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12959792/YARN-9327.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4cd10e82dde3 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e997f2a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24258/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24258/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS
[ https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861376#comment-16861376 ]

Hadoop QA commented on YARN-9301:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 47s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12969964/YARN-9301-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 117d368dc7e4 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3c9a5e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24257/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24257/testReport/ |
| Max. process+thread count | 448 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | https://builds.apache.org/job
[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.
[ https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861290#comment-16861290 ]

Bibin A Chundatt commented on YARN-9557:
----------------------------------------

Committed to trunk. [~BilwaST], could you please upload a patch for branch-3.2?

> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9557
>                 URL: https://issues.apache.org/jira/browse/YARN-9557
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>         Environment: Configure:
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
>            Reporter: Anuruddh Nayak
>            Assignee: Bilwa S T
>            Priority: Critical
>         Attachments: YARN-9557-001.patch, YARN-9557-002.patch, YARN-9557-003.patch
>
> Application fails to execute successfully when ReadWriteDiskValidator is configured.
> {code:java}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check failed!
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check failed!
> 	at org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
> 	... 2 more
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11 is not a directory!
> 	at org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
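For context, a hedged sketch of where the configured validator takes effect (the wrapper class and path argument here are illustrative): the NM resolves a DiskValidator by the configured name and checks paths through it, which is where the DiskErrorException above originates when the read-write validator is handed a file rather than a directory.

{code:java}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.DiskValidator;
import org.apache.hadoop.util.DiskValidatorFactory;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DiskCheckSketch {
  public static void check(Configuration conf, String path) throws Exception {
    // "basic" is the default; the environment above sets "read-write".
    DiskValidator validator = DiskValidatorFactory.getInstance(
        conf.get(YarnConfiguration.DISK_VALIDATOR, "basic"));
    // Throws DiskChecker$DiskErrorException if the check fails.
    validator.checkStatus(new File(path));
  }
}
{code}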
[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.
[ https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861291#comment-16861291 ]

Hudson commented on YARN-9557:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16723 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16723/])
YARN-9557. Application fails in diskchecker when ReadWriteDiskValidator (bibinchundatt: rev 2263ead3657fbb7ce641dcde9b40f15113b21720)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java

> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9557
>                 URL: https://issues.apache.org/jira/browse/YARN-9557
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>         Environment: Configure:
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
>            Reporter: Anuruddh Nayak
>            Assignee: Bilwa S T
>            Priority: Critical
>         Attachments: YARN-9557-001.patch, YARN-9557-002.patch, YARN-9557-003.patch
>
> Application fails to execute successfully when ReadWriteDiskValidator is configured.
> {code:java}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check failed!
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check failed!
> 	at org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
> 	... 2 more
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11 is not a directory!
> 	at org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
[jira] [Commented] (YARN-9602) Use logger format in Container Executor.
[ https://issues.apache.org/jira/browse/YARN-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861276#comment-16861276 ]

Abhishek Modi commented on YARN-9602:
-------------------------------------

Thanks [~bibinchundatt] and [~elgoiri].

> Use logger format in Container Executor.
> -----------------------------------------
>
>                 Key: YARN-9602
>                 URL: https://issues.apache.org/jira/browse/YARN-9602
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9602.001.patch, YARN-9602.002.patch, YARN-9602.003.patch
[jira] [Commented] (YARN-9565) RMAppImpl#ranNodes not cleared on FinalTransition
[ https://issues.apache.org/jira/browse/YARN-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861260#comment-16861260 ]

Hudson commented on YARN-9565:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16722 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16722/])
YARN-9565. RMAppImpl#ranNodes not cleared on FinalTransition. (bibinchundatt: rev 60c95e9b6a899e37ecdc8bce7bb6d9ed0dc7a6be)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

> RMAppImpl#ranNodes not cleared on FinalTransition
> -------------------------------------------------
>
>                 Key: YARN-9565
>                 URL: https://issues.apache.org/jira/browse/YARN-9565
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: YARN-9565-001.patch, YARN-9565-002.patch, YARN-9565-003.patch
>
> RMAppImpl holds the list of nodes on which containers ran, which is never
> cleared. This could cause a memory leak.
[jira] [Commented] (YARN-9594) Fix missing break statement in ContainerScheduler#handle
[ https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861235#comment-16861235 ]

Hudson commented on YARN-9594:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16720 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16720/])
YARN-9594. Fix missing break statement in ContainerScheduler#handle. (bibinchundatt: rev 6d80b9bc3ff3ba8073e3faf64551b9109d2aa2ad)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java

> Fix missing break statement in ContainerScheduler#handle
> ---------------------------------------------------------
>
>                 Key: YARN-9594
>                 URL: https://issues.apache.org/jira/browse/YARN-9594
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Major
>         Attachments: YARN-9594_1.patch
>
> It seems that we miss a break in switch-case
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>       queuedGuaranteedContainers.size());
>   //break; missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
>       + event.toString());
> {code}
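The corrected handler follows directly from the quoted snippet: terminating the case stops RECOVERY_COMPLETED from falling through to the error-logging default branch.

{code:java}
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
      queuedGuaranteedContainers.size());
  break;  // the previously missing break
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
      + event.toString());
{code}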
[jira] [Updated] (YARN-9594) Fix missing break statement in ContainerScheduler#handle
[ https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-9594:
-----------------------------------
    Summary: Fix missing break statement in ContainerScheduler#handle (was: Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED)

> Fix missing break statement in ContainerScheduler#handle
> ---------------------------------------------------------
>
>                 Key: YARN-9594
>                 URL: https://issues.apache.org/jira/browse/YARN-9594
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Major
>         Attachments: YARN-9594_1.patch
>
> It seems that we miss a break in switch-case
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>       queuedGuaranteedContainers.size());
>   //break; missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
>       + event.toString());
> {code}
[jira] [Commented] (YARN-9602) Use logger format in Container Executor.
[ https://issues.apache.org/jira/browse/YARN-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861227#comment-16861227 ]

Hudson commented on YARN-9602:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16719 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16719/])
YARN-9602. Use logger format in Container Executor. Contributed by (bibinchundatt: rev f7df55f4a89ed2d75d874b32209647ef4f448875)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java

> Use logger format in Container Executor.
> -----------------------------------------
>
>                 Key: YARN-9602
>                 URL: https://issues.apache.org/jira/browse/YARN-9602
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-9602.001.patch, YARN-9602.002.patch, YARN-9602.003.patch
[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register
[ https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861188#comment-16861188 ]

Jim Brennan commented on YARN-9202:
-----------------------------------

Thanks for the explanation [~kshukla]! The test failure looks similar to the one reported in YARN-9540.

> RM does not track nodes that are in the include list and never register
> ------------------------------------------------------------------------
>
>                 Key: YARN-9202
>                 URL: https://issues.apache.org/jira/browse/YARN-9202
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.2, 3.0.3, 2.8.5
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
> The RM state machine decides to put new or running nodes in the inactive state
> only past the point of either registration or being in the exclude list. This
> does not cover the case where a node is in the include list but never
> registers, and since all state changes are based on these NodeState
> transitions, having NEW nodes be listed as inactive first may help. This
> would change the semantics of how inactiveNodes are looked at today. Another
> state addition might help this case too.
[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register
[ https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861171#comment-16861171 ]

Kuhu Shukla commented on YARN-9202:
-----------------------------------

I am unable to reproduce this case locally but am investigating some more. AFAICT, so far, it seems unrelated.

> RM does not track nodes that are in the include list and never register
> ------------------------------------------------------------------------
>
>                 Key: YARN-9202
>                 URL: https://issues.apache.org/jira/browse/YARN-9202
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.2, 3.0.3, 2.8.5
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
> The RM state machine decides to put new or running nodes in the inactive state
> only past the point of either registration or being in the exclude list. This
> does not cover the case where a node is in the include list but never
> registers, and since all state changes are based on these NodeState
> transitions, having NEW nodes be listed as inactive first may help. This
> would change the semantics of how inactiveNodes are looked at today. Another
> state addition might help this case too.
[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register
[ https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861170#comment-16861170 ]

Kuhu Shukla commented on YARN-9202:
-----------------------------------

[~Jim_Brennan], the nodes from the inactive list (with port=-1) are thrown away once the actual NM registration comes through and creates the new RMNode object. Since that is the case for any new node trying to register, we do not need the SHUTDOWN-to-RUNNING transition, since the RMNode object that is in the SHUTDOWN state is never really used, so to say.

> RM does not track nodes that are in the include list and never register
> ------------------------------------------------------------------------
>
>                 Key: YARN-9202
>                 URL: https://issues.apache.org/jira/browse/YARN-9202
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.2, 3.0.3, 2.8.5
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
> The RM state machine decides to put new or running nodes in the inactive state
> only past the point of either registration or being in the exclude list. This
> does not cover the case where a node is in the include list but never
> registers, and since all state changes are based on these NodeState
> transitions, having NEW nodes be listed as inactive first may help. This
> would change the semantics of how inactiveNodes are looked at today. Another
> state addition might help this case too.
[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861069#comment-16861069 ]

zhuqi commented on YARN-8995:
-----------------------------

Thanks [~Tao Yang] for your comment. Now I have fixed my patch:
# Count event details in real time.
# Add the configurable record interval.
# Add a boolean flag to control whether to print the latest event.

Thanks a lot.

> Log the event type of the too big AsyncDispatcher event queue size, and add
> the information to the metrics.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
> In our growing cluster, there are unexpected situations that cause some event
> queues to block the performance of the cluster, such as the bug of
> https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to
> log the event type of the too big event queue size, and add the information
> to the metrics, and the threshold of queue size is a parameter which can be
> changed.
[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.
[ https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated YARN-8995:
------------------------
    Attachment: YARN-8995.002.patch

> Log the event type of the too big AsyncDispatcher event queue size, and add
> the information to the metrics.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
> In our growing cluster, there are unexpected situations that cause some event
> queues to block the performance of the cluster, such as the bug of
> https://issues.apache.org/jira/browse/YARN-5262. I think it's necessary to
> log the event type of the too big event queue size, and add the information
> to the metrics, and the threshold of queue size is a parameter which can be
> changed.
[jira] [Updated] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-9618:
-----------------------------------
    Description: 
In the current implementation, NodesListManager events block the async dispatcher, which can cause an RM crash and slow down event processing.
# On cluster restart with 1K running apps, each node-usable event will create 1K events; overall this could be 5K*1K events for a 5K-node cluster.
# Event processing is blocked till the new events are added to the queue.

Solution (sketched after this issue block):
# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher directly, call the RMApp event handler.

  was:
In the current implementation, NodesListManager events block the async dispatcher, which can cause an RM crash and slow down event processing.
# On cluster restart with 1K running apps, each node-usable event will create 1K events; overall this could be 1K*1K events.
# Event processing is blocked till the new events are added to the queue.

Solution:
# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher directly, call the RMApp event handler.

> NodeListManager event improvement
> ----------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bibin A Chundatt
>            Priority: Critical
>
> In the current implementation, NodesListManager events block the async
> dispatcher, which can cause an RM crash and slow down event processing.
> # On cluster restart with 1K running apps, each node-usable event will create
> 1K events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked till the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event handler.
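A rough sketch of the proposed direction (the event and record class names are taken from the existing RM code base; this is a sketch, not the committed patch): a dedicated AsyncDispatcher absorbs the apps-times-nodes fan-out so the RM's central dispatcher thread is never blocked.

{code:java}
void setupNodeUsableDispatcher(Configuration conf, RMContext rmContext) {
  AsyncDispatcher nodeUsableDispatcher = new AsyncDispatcher();
  nodeUsableDispatcher.register(NodesListManagerEventType.class,
      (EventHandler<NodesListManagerEvent>) event -> {
        // Call each RMApp's event handler directly instead of enqueueing
        // one RMAppNodeUpdateEvent per app on the central dispatcher queue.
        for (RMApp app : rmContext.getRMApps().values()) {
          app.handle(new RMAppNodeUpdateEvent(app.getApplicationId(),
              event.getNode(), RMAppNodeUpdateType.NODE_USABLE));
        }
      });
  nodeUsableDispatcher.init(conf);
  nodeUsableDispatcher.start();
}
{code}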
[jira] [Created] (YARN-9618) NodeListManager event improvement
Bibin A Chundatt created YARN-9618:
--------------------------------------

             Summary: NodeListManager event improvement
                 Key: YARN-9618
                 URL: https://issues.apache.org/jira/browse/YARN-9618
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Bibin A Chundatt

In the current implementation, NodesListManager events block the async dispatcher, which can cause an RM crash and slow down event processing.
# On cluster restart with 1K running apps, each node-usable event will create 1K events; overall this could be 1K*1K events.
# Event processing is blocked till the new events are added to the queue.

Solution:
# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher directly, call the RMApp event handler.
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861022#comment-16861022 ]

Bibin A Chundatt commented on YARN-9618:
----------------------------------------

cc:// [~sunil.gov...@gmail.com], [~leftnoteasy]

> NodeListManager event improvement
> ----------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bibin A Chundatt
>            Priority: Critical
>
> In the current implementation, NodesListManager events block the async
> dispatcher, which can cause an RM crash and slow down event processing.
> # On cluster restart with 1K running apps, each node-usable event will create
> 1K events; overall this could be 1K*1K events.
> # Event processing is blocked till the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event handler.
[jira] [Created] (YARN-9617) RM UI enables viewing pages using Timeline Reader for a user who can not access the YARN config endpoint
Balázs Szabó created YARN-9617:
----------------------------------

             Summary: RM UI enables viewing pages using Timeline Reader for a user who can not access the YARN config endpoint
                 Key: YARN-9617
                 URL: https://issues.apache.org/jira/browse/YARN-9617
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-ui-v2
    Affects Versions: 3.1.1
            Reporter: Balázs Szabó
         Attachments: 1.png, 2.png

If a user cannot access the /conf endpoint, she/he will be unable to query the address of the Timeline Service Reader (yarn.timeline-service.reader.webapp.address). In this case, the user receives a "403 Unauthenticated users are not authorized to access this page" response when trying to view pages that request data from the Timeline Reader (i.e. the Flow Activity tab). The UI then falls back to the default address (localhost:8188), which eventually yields the 401 response (see attached screenshots).

!1.png!
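For reference, the property the UI tries to read from /conf (the host value below is an assumed placeholder, not from the report); when the endpoint is unreadable the UI cannot discover it and falls back to localhost:8188 as described:

{code:xml}
<property>
  <name>yarn.timeline-service.reader.webapp.address</name>
  <!-- Assumed example value; defaults to ${yarn.timeline-service.hostname}:8188 -->
  <value>timeline-reader-host.example.com:8188</value>
</property>
{code}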
[jira] [Comment Edited] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file
[ https://issues.apache.org/jira/browse/YARN-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860683#comment-16860683 ]

Prabhu Joseph edited comment on YARN-9609 at 6/11/19 8:49 AM:
--------------------------------------------------------------

[~yeshavora] {{NMWebService}} groups the LOCAL files in one {{containerLogsInfo}} and groups the AGGREGATED files in another {{containerLogsInfo}}. Below is the output when both LOCAL and AGGREGATED files are present for a container. I think the below structure looks fine; showing logAggregationType for each log file will be verbose and redundant.

{code:xml}
<containerLogsInfo>
  <containerLogInfo>
    <fileName>prelaunch.out</fileName>
    <fileSize>100</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>launch_container.sh</fileName>
    <fileSize>5403</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>LOCAL</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
<containerLogsInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>AGGREGATED</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
{code}

was (Author: prabhu joseph):
[~yvora] {{NMWebService}} groups the LOCAL files in one {{containerLogsInfo}} and groups the AGGREGATED files in another {{containerLogsInfo}}. Below is the output when both LOCAL and AGGREGATED files are present for a container. I think the below structure looks fine; showing logAggregationType for each log file will be verbose and redundant.

{code:xml}
<containerLogsInfo>
  <containerLogInfo>
    <fileName>prelaunch.out</fileName>
    <fileSize>100</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>launch_container.sh</fileName>
    <fileSize>5403</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>LOCAL</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
<containerLogsInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>AGGREGATED</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
{code}

> Nodemanager Web Service should return logAggregationType for each file
> ------------------------------------------------------------------------
>
>                 Key: YARN-9609
>                 URL: https://issues.apache.org/jira/browse/YARN-9609
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> Steps:
> 1) Launch sleeper yarn service
> 2) When sleeper component is in READY state, call NM web service to list the
> container files and its log aggregation status.
> http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs
> NM web service response shows a common log aggregation type response for all
> files. Instead, NM web service should return a log aggregation type for each file.
[jira] [Commented] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file
[ https://issues.apache.org/jira/browse/YARN-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860683#comment-16860683 ]

Prabhu Joseph commented on YARN-9609:
-------------------------------------

[~yvora] {{NMWebService}} groups the LOCAL files in one {{containerLogsInfo}} and groups the AGGREGATED files in another {{containerLogsInfo}}. Below is the output when both LOCAL and AGGREGATED files are present for a container. I think the below structure looks fine; showing logAggregationType for each log file will be verbose and redundant.

{code:xml}
<containerLogsInfo>
  <containerLogInfo>
    <fileName>prelaunch.out</fileName>
    <fileSize>100</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>launch_container.sh</fileName>
    <fileSize>5403</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>LOCAL</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
<containerLogsInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <containerLogInfo>
    <fileName>auditlog</fileName>
    <fileSize>123</fileSize>
    <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
  </containerLogInfo>
  <logAggregationType>AGGREGATED</logAggregationType>
  <containerId>container_e01_1559302665385_0004_01_02</containerId>
  <nodeId>yarn-ats-1:45454</nodeId>
</containerLogsInfo>
{code}

> Nodemanager Web Service should return logAggregationType for each file
> ------------------------------------------------------------------------
>
>                 Key: YARN-9609
>                 URL: https://issues.apache.org/jira/browse/YARN-9609
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> Steps:
> 1) Launch sleeper yarn service
> 2) When sleeper component is in READY state, call NM web service to list the
> container files and its log aggregation status.
> http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs
> NM web service response shows a common log aggregation type response for all
> files. Instead, NM web service should return a log aggregation type for each file.
[jira] [Commented] (YARN-9471) Cleanup in TestLogAggregationIndexFileController
[ https://issues.apache.org/jira/browse/YARN-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860628#comment-16860628 ]

Adam Antal commented on YARN-9471:
----------------------------------

Thanks for the review [~snemeth], and [~jojochuang] for the commit!

> Cleanup in TestLogAggregationIndexFileController
> ------------------------------------------------
>
>                 Key: YARN-9471
>                 URL: https://issues.apache.org/jira/browse/YARN-9471
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, yarn
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9471.001.patch, YARN-9471.002.patch
>
> {{TestLogAggregationIndexFileController}} class can be cleaned up a bit:
> - bad javadoc link
> - should be renamed to TestLogAggregationIndex*ed*FileController
> - some private class members can be removed
> - static fields from Assert can be imported
> - {{StringBuilder}} can be removed from {{logMessage}}
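A small hypothetical illustration of two of those cleanups (the real test's logMessage signature may well differ): static Assert imports and a fixed-length message built without an explicit StringBuilder append loop.

{code:java}
import static org.junit.Assert.assertEquals;

import java.util.Arrays;

class LogMessageSketch {
  // Replaces a StringBuilder append loop with a single array fill.
  static String logMessage(char filler, int length) {
    char[] content = new char[length];
    Arrays.fill(content, filler);
    return new String(content);
  }

  public static void main(String[] args) {
    // With the static import, no Assert. prefix is needed.
    assertEquals(2048, logMessage('x', 2048).length());
  }
}
{code}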
[jira] [Commented] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.
[ https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860614#comment-16860614 ]

jialei weng commented on YARN-9608:
-----------------------------------

Thanks, [~abmodi]. Good to learn more.

> DecommissioningNodesWatcher should get lists of running applications on node
> from RMNode.
> ------------------------------------------------------------------------------
>
>                 Key: YARN-9608
>                 URL: https://issues.apache.org/jira/browse/YARN-9608
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-9608.001.patch
>
> At present, DecommissioningNodesWatcher tracks the list of running
> applications and triggers decommission of nodes when all the applications
> that ran on the node complete. This Jira proposes to solve the following
> problems:
> # DecommissioningNodesWatcher skips tracking application containers on a
> particular node before the node is in DECOMMISSIONING state; it only tracks
> containers once the node is in DECOMMISSIONING state. This can lead to
> shuffle data loss for apps whose containers ran on this node before it was
> moved to the DECOMMISSIONING state.
> # It keeps its own track of running apps. We can leverage this directly
> from RMNode.
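A minimal sketch of the second point, with the watcher internals assumed (the helper method below is illustrative, not the attached patch): instead of maintaining its own app set, the watcher can read the running-application list that RMNode already tracks.

{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

class DecommissionCheckSketch {
  // A DECOMMISSIONING node can be fully decommissioned once no application
  // that ran on it is still alive, so its shuffle data is no longer needed.
  static boolean readyToDecommission(RMNode node) {
    List<ApplicationId> runningApps = node.getRunningApps();
    return runningApps == null || runningApps.isEmpty();
  }
}
{code}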