[jira] [Resolved] (YARN-6875) New aggregated log file format for YARN log aggregation.

2019-06-11 Thread Lars Francke (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved YARN-6875.

Resolution: Fixed

> New aggregated log file format for YARN log aggregation.
> 
>
> Key: YARN-6875
> URL: https://issues.apache.org/jira/browse/YARN-6875
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> TFile is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues with it, especially for very large log files.
> We will introduce a new log format that has better performance for large 
> log files.






[jira] [Created] (YARN-9619) Transfer error AM host/ip when launching app using docker container with bridge network

2019-06-11 Thread caozhiqiang (JIRA)
caozhiqiang created YARN-9619:
-

 Summary: Transfer error AM host/ip when launching app using docker 
container with bridge network
 Key: YARN-9619
 URL: https://issues.apache.org/jira/browse/YARN-9619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.3.0
Reporter: caozhiqiang


When launching an application in a docker container that uses a bridge network 
on an overlay network, the client polls the progress of the application 
process from the ApplicationMaster using the wrong host/IP: it polls the 
nodemanager's hostname/IP rather than the docker container's IP where the AM 
is actually running. The error message is below (the server 
hadoop3-1/192.168.2.105 is the NM's address, not the AM's docker IP, so it 
cannot be accessed):

2019-05-11 08:28:46,361 INFO ipc.Client: Retrying connect to server: 
hadoop3-1/192.168.2.105:37963. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-11 08:28:47,363 INFO ipc.Client: Retrying connect to server: 
hadoop3-1/192.168.2.105:37963. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-11 08:28:48,365 INFO ipc.Client: Retrying connect to server: 
hadoop3-1/192.168.2.105:37963. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2019-05-10 08:34:40,235 INFO mapred.ClientServiceDelegate: Application state is 
completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:12020. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:12020. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
java.io.IOException: java.net.ConnectException: Your endpoint configuration is 
wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
 at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
 at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
 at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:871)
 at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
 at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
 at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
 at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
 at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
 at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
 at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For 
more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
 at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
 at org.apache.hadoop.ipc.Client.call(Client.java:1457)
 at or
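
For context, bridge networking is typically requested through the Docker 
runtime environment variables below (a sketch only; the variable names are 
from the YARN Docker runtime documentation, the image name is illustrative, 
and for a MapReduce job they would be passed via properties such as 
yarn.app.mapreduce.am.env and mapreduce.map.env):

{code}
YARN_CONTAINER_RUNTIME_TYPE=docker
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-example:latest
YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=bridge
{code}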

[jira] [Commented] (YARN-9612) Support using ip to register NodeID

2019-06-11 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861773#comment-16861773
 ] 

Zhankun Tang commented on YARN-9612:


[~cane], thanks. But maybe I don't have much context here. Could you please 
elaborate on why the service name makes things worse?

> Support using ip to register NodeID
> ---
>
> Key: YARN-9612
> URL: https://issues.apache.org/jira/browse/YARN-9612
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Priority: Major
>
> In an environment like k8s, we should support using the IP when registering 
> the NodeID with the RM, since the hostname will be the podName, which cannot 
> be resolved by the DNS of k8s.
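
A possible workaround sketch for such environments (hypothetical, not from 
this JIRA; it assumes the pod IP is exported as POD_IP and relies on Hadoop's 
${env.VAR} configuration substitution):

{code:xml}
<property>
  <!-- Register the NM under the pod IP instead of the unresolvable pod name. -->
  <name>yarn.nodemanager.hostname</name>
  <value>${env.POD_IP}</value>
</property>
{code}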






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2019-06-11 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861745#comment-16861745
 ] 

Bibin A Chundatt commented on YARN-9615:


Thank you [~jhung]

In case you haven't started working on this, I could add the patch we had done 
for our setup, which exports to JMX too.
Visualization is through grafana:

 !screenshot-1.png! 
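
For reference, a minimal sketch of what per-event-type dispatcher metrics 
could look like with the Hadoop metrics2 library (class, metric, and method 
names here are illustrative, not taken from the actual patch):

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "RM dispatcher event metrics", context = "yarn")
public class DispatcherEventMetrics {

  // Bumped once for every event taken off the dispatcher queue.
  @Metric("Number of events handled") MutableCounterLong eventCount;

  // Tracks the rate and average processing time of events in ms.
  @Metric("Event processing time") MutableRate eventProcessingTime;

  public static DispatcherEventMetrics create() {
    // Registering with the default metrics system also exposes the
    // metrics via JMX, which the Grafana setup above can scrape.
    return DefaultMetricsSystem.instance().register(
        "DispatcherEventMetrics", "RM dispatcher event metrics",
        new DispatcherEventMetrics());
  }

  public void handled(long elapsedMillis) {
    eventCount.incr();
    eventProcessingTime.add(elapsedMillis);
  }
}
{code}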


> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2019-06-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9615:
---
Attachment: screenshot-1.png

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-11 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861741#comment-16861741
 ] 

Tao Yang commented on YARN-8995:


Thanks [~zhuqi] for updating the patch.
Comments about the new patch:
* For the latest event, I didn't mean that it should be controlled separately 
from the counter info; we can add a boolean flag that defaults to false, is 
set to true when printing the details is triggered (for example, when the 
queue size has reached N*5000), and is reset to false after the latest event 
has been printed.
* The configuration-reading logic should be moved to serviceStart() for better 
performance.
* The printEventQueueDetails method can be simplified via the Stream API (see 
the sketch below); moreover, the value type of counterMap should be Long 
instead of long[].
* The new configuration entry should have a clear name; 
"yarn.dispatcher.print-events-debug-info.interval-in-thousands" is just a 
rough idea, and you can give it a better name. I suppose we should take 
thousands as the unit, since the print switch already depends on another 
condition (qSize % 1000 == 0).
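
For illustration, the Stream API simplification could look roughly like this 
(a sketch only, not the actual patch; it assumes AsyncDispatcher's existing 
eventQueue field and SLF4J logger, plus java.util.Map and 
java.util.stream.Collectors imports):

{code:java}
// Count the queued events per type and log one line per type.
private void printEventQueueDetails() {
  Map<String, Long> counterMap = eventQueue.stream()
      .collect(Collectors.groupingBy(e -> e.getType().toString(),
          Collectors.counting()));
  counterMap.forEach((eventType, count) ->
      LOG.info("Event type: {}, Event count: {}", eventType, count));
}
{code}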

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations that cause some 
> event queues to block the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event type when the event queue size gets too big and to add the 
> information to the metrics, with the queue-size threshold as a parameter 
> that can be changed.






[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-11 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8995:

Attachment: (was: YARN-8995.002.patch)

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations that cause some 
> event queues to block the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event type when the event queue size gets too big and to add the 
> information to the metrics, with the queue-size threshold as a parameter 
> that can be changed.






[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-11 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8995:

Attachment: YARN-8995.002.patch

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations that cause some 
> event queues to block the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event type when the event queue size gets too big and to add the 
> information to the metrics, with the queue-size threshold as a parameter 
> that can be changed.






[jira] [Commented] (YARN-9327) ProtoUtils#convertToProtoFormat block Application Master Service and many more

2019-06-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861483#comment-16861483
 ] 

Hadoop QA commented on YARN-9327:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
49s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9327 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959792/YARN-9327.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f103033c7194 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5740eea |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24259/testReport/ |
| Max. process+thread count | 351 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24259/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Pr

[jira] [Commented] (YARN-9327) ProtoUtils#convertToProtoFormat block Application Master Service and many more

2019-06-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861400#comment-16861400
 ] 

Hadoop QA commented on YARN-9327:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9327 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959792/YARN-9327.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4cd10e82dde3 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e997f2a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24258/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24258/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS

2019-06-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861376#comment-16861376
 ] 

Hadoop QA commented on YARN-9301:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 1 
new + 3 unchanged - 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
47s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9301 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969964/YARN-9301-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 117d368dc7e4 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3c9a5e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24257/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24257/testReport/ |
| Max. process+thread count | 448 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job

[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.

2019-06-11 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861290#comment-16861290
 ] 

Bibin A Chundatt commented on YARN-9557:


Committed to trunk

[~BilwaST], could you please upload a patch for branch-3.2?



> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---
>
> Key: YARN-9557
> URL: https://issues.apache.org/jira/browse/YARN-9557
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
> Environment: Configure:
> <property>
>  <name>yarn.nodemanager.disk-validator</name>
>  <value>read-write</value>
> </property>
>Reporter: Anuruddh Nayak
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9557-001.patch, YARN-9557-002.patch, 
> YARN-9557-003.patch
>
>
> Application fails to execute successfully when ReadWriteDiskValidator is 
> configured.
> {code:xml}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> Disk Check failed!
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check 
> failed!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
>  ... 2 more
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11
>  is not a directory!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
>  






[jira] [Commented] (YARN-9557) Application fails in diskchecker when ReadWriteDiskValidator is configured.

2019-06-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861291#comment-16861291
 ] 

Hudson commented on YARN-9557:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16723 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16723/])
YARN-9557. Application fails in diskchecker when ReadWriteDiskValidator 
(bibinchundatt: rev 2263ead3657fbb7ce641dcde9b40f15113b21720)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


> Application fails in diskchecker when ReadWriteDiskValidator is configured.
> ---
>
> Key: YARN-9557
> URL: https://issues.apache.org/jira/browse/YARN-9557
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
> Environment: Configure:
> <property>
>  <name>yarn.nodemanager.disk-validator</name>
>  <value>read-write</value>
> </property>
>Reporter: Anuruddh Nayak
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9557-001.patch, YARN-9557-002.patch, 
> YARN-9557-003.patch
>
>
> Application fails to execute successfully when ReadWriteDiskValidator is 
> configured.
> {code:xml}
> <property>
>   <name>yarn.nodemanager.disk-validator</name>
>   <value>read-write</value>
> </property>
> {code}
> {noformat}
> Exception thrown while starting Container:
> java.io.IOException: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> Disk Check failed!
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1233)
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Disk Check 
> failed!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:82)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:255)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:312)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
>  ... 2 more
>  Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
> /opt/HA/AN0805/nmlocal/usercache/dsperf/appcache/application_1557736108162_0009/filecache/11
>  is not a directory!
>  at 
> org.apache.hadoop.util.ReadWriteDiskValidator.checkStatus(ReadWriteDiskValidator.java:50)
> {noformat}
>  






[jira] [Commented] (YARN-9602) Use logger format in Container Executor.

2019-06-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861276#comment-16861276
 ] 

Abhishek Modi commented on YARN-9602:
-

Thanks [~bibinchundatt] and [~elgoiri].

> Use logger format in Container Executor.
> 
>
> Key: YARN-9602
> URL: https://issues.apache.org/jira/browse/YARN-9602
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9602.001.patch, YARN-9602.002.patch, 
> YARN-9602.003.patch
>
>







[jira] [Commented] (YARN-9565) RMAppImpl#ranNodes not cleared on FinalTransition

2019-06-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861260#comment-16861260
 ] 

Hudson commented on YARN-9565:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16722 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16722/])
YARN-9565. RMAppImpl#ranNodes not cleared on FinalTransition. (bibinchundatt: 
rev 60c95e9b6a899e37ecdc8bce7bb6d9ed0dc7a6be)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


> RMAppImpl#ranNodes not cleared on FinalTransition
> -
>
> Key: YARN-9565
> URL: https://issues.apache.org/jira/browse/YARN-9565
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9565-001.patch, YARN-9565-002.patch, 
> YARN-9565-003.patch
>
>
> RMAppImpl holds the list of nodes on which containers ran, which is never 
> cleared.
> This could cause a memory leak.
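
A one-line illustration of the kind of cleanup FinalTransition needs (a 
sketch only; ranNodes is the Set of NodeIds that RMAppImpl keeps):

{code:java}
// In RMAppImpl.FinalTransition: the app is in a terminal state, so the
// node set is no longer needed and can be released to avoid the leak.
app.ranNodes.clear();
{code}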






[jira] [Commented] (YARN-9594) Fix missing break statement in ContainerScheduler#handle

2019-06-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861235#comment-16861235
 ] 

Hudson commented on YARN-9594:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16720 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16720/])
YARN-9594. Fix missing break statement in ContainerScheduler#handle. 
(bibinchundatt: rev 6d80b9bc3ff3ba8073e3faf64551b9109d2aa2ad)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java


> Fix missing break statement in ContainerScheduler#handle
> 
>
> Key: YARN-9594
> URL: https://issues.apache.org/jira/browse/YARN-9594
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-9594_1.patch
>
>
> It seems that we missed a break in the switch-case:
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>  queuedGuaranteedContainers.size());
> //break;missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
> + event.toString());
> {code}






[jira] [Updated] (YARN-9594) Fix missing break statement in ContainerScheduler#handle

2019-06-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9594:
---
Summary: Fix missing break statement in ContainerScheduler#handle  (was: 
Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED)

> Fix missing break statement in ContainerScheduler#handle
> 
>
> Key: YARN-9594
> URL: https://issues.apache.org/jira/browse/YARN-9594
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-9594_1.patch
>
>
> It seems that we missed a break in the switch-case:
> {code:java}
> case RECOVERY_COMPLETED:
>   startPendingContainers(maxOppQueueLength <= 0);
>   metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
>  queuedGuaranteedContainers.size());
> //break;missed
> default:
>   LOG.error("Unknown event arrived at ContainerScheduler: "
> + event.toString());
> {code}






[jira] [Commented] (YARN-9602) Use logger format in Container Executor.

2019-06-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861227#comment-16861227
 ] 

Hudson commented on YARN-9602:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16719 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16719/])
YARN-9602. Use logger format in Container Executor. Contributed by 
(bibinchundatt: rev f7df55f4a89ed2d75d874b32209647ef4f448875)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
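
For context, "logger format" here means SLF4J parameterized logging instead of 
string concatenation; a generic before/after sketch (not the actual diff):

{code:java}
// Before: the message string is built even when the log level is disabled.
LOG.info("Container " + containerId + " exited with code " + exitCode);
// After: SLF4J substitutes the {} placeholders only if the message is logged.
LOG.info("Container {} exited with code {}", containerId, exitCode);
{code}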


> Use logger format in Container Executor.
> 
>
> Key: YARN-9602
> URL: https://issues.apache.org/jira/browse/YARN-9602
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9602.001.patch, YARN-9602.002.patch, 
> YARN-9602.003.patch
>
>







[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-11 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861188#comment-16861188
 ] 

Jim Brennan commented on YARN-9202:
---

Thanks for the explanation [~kshukla]!   The test failure looks similar to the 
one reported in YARN-9540.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in the inactive 
> state only past the point of either registration or being in the exclude 
> list. This does not cover the case where a node is in the include list but 
> never registers, and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.






[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-11 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861171#comment-16861171
 ] 

Kuhu Shukla commented on YARN-9202:
---

I am unable to reproduce this case locally but am investigating some more. 
AFAICT, so far, it seems unrelated.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in the inactive 
> state only past the point of either registration or being in the exclude 
> list. This does not cover the case where a node is in the include list but 
> never registers, and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.






[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-11 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861170#comment-16861170
 ] 

Kuhu Shukla commented on YARN-9202:
---

[~Jim_Brennan], the nodes from the inactive list (with port = -1) are thrown 
away once the actual NM registration comes through and creates the new RMNode 
object. Since that is the case for any new node trying to register, we do not 
need the SHUTDOWN-to-RUNNING transition, since the RMNode object that is in 
the SHUTDOWN state is never really used, so to say.

 

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in the inactive 
> state only past the point of either registration or being in the exclude 
> list. This does not cover the case where a node is in the include list but 
> never registers, and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.






[jira] [Commented] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-11 Thread zhuqi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861069#comment-16861069
 ] 

zhuqi commented on YARN-8995:
-

cc [~Tao Yang]

Thanks [~Tao Yang] for your comment.

Now I have fixed my patch:
 # Count event details in real time.
 # Add the configurable record interval.
 # Add a boolean flag to control whether to print the latest event.

Thanks a lot.
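
For example, the interval could then be configured like this (hypothetical; 
the property name follows the rough idea above and is still under discussion 
in this JIRA):

{code:xml}
<property>
  <!-- Print event-queue details every 5*1000 queued events. -->
  <name>yarn.dispatcher.print-events-debug-info.interval-in-thousands</name>
  <value>5</value>
</property>
{code}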

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations that cause some 
> event queues to block the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event type when the event queue size gets too big and to add the 
> information to the metrics, with the queue-size threshold as a parameter 
> that can be changed.






[jira] [Updated] (YARN-8995) Log the event type of the too big AsyncDispatcher event queue size, and add the information to the metrics.

2019-06-11 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8995:

Attachment: YARN-8995.002.patch

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> 
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations that cause some 
> event queues to block the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event type when the event queue size gets too big and to add the 
> information to the metrics, with the queue-size threshold as a parameter 
> that can be changed.






[jira] [Updated] (YARN-9618) NodeListManager event improvement

2019-06-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9618:
---
Description: 
In the current implementation, NodesListManager events block the async 
dispatcher, which can crash the RM and slow down event processing.


# On a cluster restart with 1K running apps, each node-usable event creates 1K 
events; overall, this could be 5K*1K events for a 5K-node cluster.
# Event processing is blocked while these new events are added to the queue.

Solution:

# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher, directly call the RMApp event 
handler.



  was:
In the current implementation, NodesListManager events block the async 
dispatcher, which can crash the RM and slow down event processing.


# On a cluster restart with 1K running apps, each node-usable event creates 1K 
events; overall, this could be 1K*1K events.
# Event processing is blocked while these new events are added to the queue.

Solution:

# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher, directly call the RMApp event 
handler.




> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Priority: Critical
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall, this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked while these new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.






[jira] [Created] (YARN-9618) NodeListManager event improvement

2019-06-11 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-9618:
--

 Summary: NodeListManager event improvement
 Key: YARN-9618
 URL: https://issues.apache.org/jira/browse/YARN-9618
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bibin A Chundatt


In the current implementation, NodesListManager events block the async 
dispatcher, which can crash the RM and slow down event processing.


# On a cluster restart with 1K running apps, each node-usable event creates 1K 
events; overall, this could be 1K*1K events.
# Event processing is blocked while these new events are added to the queue.

Solution:

# Add another async event handler, similar to the scheduler's.
# Instead of adding events to the dispatcher, directly call the RMApp event 
handler (see the sketch below).
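
A rough sketch of the second idea (illustrative only, not a patch; it assumes 
the node's running apps are available via RMNode#getRunningApps and reuses the 
existing RMAppNodeUpdateEvent):

{code:java}
// Instead of queueing one RMAppNodeUpdateEvent per app on the central
// dispatcher, hand the event straight to each app's own event handler.
for (ApplicationId appId : eventNode.getRunningApps()) {
  RMApp app = rmContext.getRMApps().get(appId);
  if (app != null) {
    app.handle(new RMAppNodeUpdateEvent(appId, eventNode,
        RMAppNodeUpdateType.NODE_USABLE));
  }
}
{code}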








[jira] [Commented] (YARN-9618) NodeListManager event improvement

2019-06-11 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861022#comment-16861022
 ] 

Bibin A Chundatt commented on YARN-9618:


cc:// [~sunil.gov...@gmail.com],[~leftnoteasy]

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Priority: Critical
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall, this could be 1K*1K events.
> # Event processing is blocked while these new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.






[jira] [Created] (YARN-9617) RM UI enables viewing pages using Timeline Reader for a user who can not access the YARN config endpoint

2019-06-11 Thread JIRA
Balázs Szabó created YARN-9617:
--

 Summary: RM UI enables viewing pages using Timeline Reader for a 
user who can not access the YARN config endpoint
 Key: YARN-9617
 URL: https://issues.apache.org/jira/browse/YARN-9617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.1
Reporter: Balázs Szabó
 Attachments: 1.png, 2.png

A user who cannot access the /conf endpoint will be unable to query the 
address of the Timeline Service Reader 
(yarn.timeline-service.reader.webapp.address). Such a user receives a "403 
Unauthenticated users are not authorized to access this page" response when 
trying to view pages that request data from the Timeline Reader (i.e. the Flow 
Activity tab), and the UI falls back to the default address (localhost:8188), 
which eventually yields the 401 response (see attached screenshots).

 

!1.png!






[jira] [Comment Edited] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file

2019-06-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860683#comment-16860683
 ] 

Prabhu Joseph edited comment on YARN-9609 at 6/11/19 8:49 AM:
--

[~yeshavora] {{NMWebService}} groups the LOCAL files in one 
{{containerLogsInfo}} and the AGGREGATED files in another 
{{containerLogsInfo}}. Below is the output when both LOCAL and AGGREGATED 
files are present for a container. I think the structure below looks fine; 
showing logAggregationType for each log file would be verbose and redundant.
{code:xml}
<containerLogsInfoes>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>prelaunch.out</fileName>
      <fileSize>100</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>launch_container.sh</fileName>
      <fileSize>5403</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>LOCAL</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>AGGREGATED</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
</containerLogsInfoes>
{code}


was (Author: prabhu joseph):
[~yvora] {{NMWebService}} groups the LOCAL files in one {{containerLogsInfo}} 
and the AGGREGATED files in another {{containerLogsInfo}}. Below is the output 
when both LOCAL and AGGREGATED files are present for a container. I think the 
structure below looks fine; showing logAggregationType for each log file would 
be verbose and redundant.
{code:xml}
<containerLogsInfoes>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>prelaunch.out</fileName>
      <fileSize>100</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>launch_container.sh</fileName>
      <fileSize>5403</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>LOCAL</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>AGGREGATED</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
</containerLogsInfoes>
{code}

> Nodemanager Web Service should return logAggregationType for each file
> --
>
> Key: YARN-9609
> URL: https://issues.apache.org/jira/browse/YARN-9609
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Prabhu Joseph
>Priority: Critical
>
> Steps:
> 1) Launch sleeper yarn service
> 2) When sleeper component is in READY state, call NM web service to list the 
> container files and its log aggregation status.
> http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs
> NM web service response shows a common log aggregation type response for all 
> files.
> Instead, NM web service should return a log aggregation type for each file.






[jira] [Commented] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file

2019-06-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860683#comment-16860683
 ] 

Prabhu Joseph commented on YARN-9609:
-

[~yvora] {{NMWebService}} groups the LOCAL files in one {{containerLogsInfo}} 
and the AGGREGATED files in another {{containerLogsInfo}}. Below is the output 
when both LOCAL and AGGREGATED files are present for a container. I think the 
structure below looks fine; showing logAggregationType for each log file would 
be verbose and redundant.
{code:xml}
<containerLogsInfoes>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>prelaunch.out</fileName>
      <fileSize>100</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>launch_container.sh</fileName>
      <fileSize>5403</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>LOCAL</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
  <containerLogsInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <containerLogInfo>
      <fileName>auditlog</fileName>
      <fileSize>123</fileSize>
      <lastModifiedTime>Tue Jun 11 08:20:10 +0000 2019</lastModifiedTime>
    </containerLogInfo>
    <logAggregationType>AGGREGATED</logAggregationType>
    <containerId>container_e01_1559302665385_0004_01_02</containerId>
    <nodeId>yarn-ats-1:45454</nodeId>
  </containerLogsInfo>
</containerLogsInfoes>
{code}

> Nodemanager Web Service should return logAggregationType for each file
> --
>
> Key: YARN-9609
> URL: https://issues.apache.org/jira/browse/YARN-9609
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Prabhu Joseph
>Priority: Critical
>
> Steps:
> 1) Launch sleeper yarn service
> 2) When sleeper component is in READY state, call NM web service to list the 
> container files and its log aggregation status.
> http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs
> NM web service response shows a common log aggregation type response for all 
> files.
> Instead, NM web service should return a log aggregation type for each file.






[jira] [Commented] (YARN-9471) Cleanup in TestLogAggregationIndexFileController

2019-06-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860628#comment-16860628
 ] 

Adam Antal commented on YARN-9471:
--

Thanks for the review [~snemeth], and [~jojochuang] for the commit!

> Cleanup in TestLogAggregationIndexFileController
> 
>
> Key: YARN-9471
> URL: https://issues.apache.org/jira/browse/YARN-9471
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9471.001.patch, YARN-9471.002.patch
>
>
> {{TestLogAggregationIndexFileController}} class can be cleaned up a bit:
> - bad javadoc link
> - should be renamed to TestLogAggregationIndex *ed* FileController
> - some private class members can be removed
> - static fields from Assert can be imported
> - {{StringBuilder}} can be removed from {{logMessage}}






[jira] [Commented] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.

2019-06-11 Thread jialei weng (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860614#comment-16860614
 ] 

jialei weng commented on YARN-9608:
---

Thanks, [~abmodi]. Good to learn more.

> DecommissioningNodesWatcher should get lists of running applications on node 
> from RMNode.
> -
>
> Key: YARN-9608
> URL: https://issues.apache.org/jira/browse/YARN-9608
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9608.001.patch
>
>
> At present, DecommissioningNodesWatcher tracks the list of running 
> applications and triggers decommissioning of a node when all the 
> applications that ran on the node complete. This Jira proposes to solve the 
> following problems:
>  # DecommissioningNodesWatcher skips tracking application containers on a 
> particular node before the node is in the DECOMMISSIONING state; it only 
> tracks containers once the node is in the DECOMMISSIONING state. This can 
> lead to shuffle data loss for apps whose containers ran on this node before 
> it was moved to the decommissioning state.
>  # It keeps its own list of running apps, which we can get directly from 
> RMNode.
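
For the second point, a rough sketch of reading the running apps straight from 
the node (illustrative only; it assumes RMNode#getRunningApps and the 
watcher's existing RMContext):

{code:java}
// Ask the RMNode which apps are still running instead of maintaining a
// separate list inside DecommissioningNodesWatcher.
List<ApplicationId> runningApps = rmNode.getRunningApps();
if (runningApps.isEmpty()) {
  // Nothing that ran here is still active; the node can be decommissioned.
  context.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(rmNode.getNodeID(), RMNodeEventType.DECOMMISSION));
}
{code}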


