Re: The auxService:mapreduce_shuffle does not exist

2020-09-10 Thread bilwa st
Hi Sitakant,

Have you added the below configuration to your setup?


   <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>

You can refer to this link:
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html
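
For reference, a minimal yarn-site.xml sketch combining both properties (this
assumes a default, non-secure setup; the properties need to be present in the
yarn-site.xml of every NodeManager host, and the NodeManagers restarted
afterwards):

   <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>
   <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>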

Thanks,
Bilwa

On Thu, Sep 10, 2020, 10:08 PM Sitakant Mishra wrote:

> Hi,
>
> I have been struggling with the issue above for some time now. I have added
> all the mailing lists because of my desperation. Any help or suggestions
> would be greatly appreciated.
>
> Thanks and Regards,
> Sitakanta Mishra
>
> On Wed, Sep 9, 2020 at 4:24 PM Sitakant Mishra wrote:
>
> > Hi,
> >
> > I have set up a new Hadoop cluster with hadoop version 3.3.0. I have a 5
> > node setup where the namenode and hive run on one server, yarn and
> > secondary namenode on the second server and the last three nodes are only
> > datanodes. The cluster is up and running. However, when I run an example
> > wordcount map-reduce job, it throws the following exception.
> >
> > 2020-09-09 21:35:06,656 INFO mapreduce.Job: Task Id :
> > attempt_1599672538759_0004_m_00_2, Status : FAILED
> > Container launch failed for container_1599672538759_0004_01_04 :
> > org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The
> > auxService:mapreduce_shuffle does not exist
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> > at
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:83)
> > at
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:57)
> > at
> java.lang.reflect.Constructor.newInstance(Constructor.java:437)
> > at
> >
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
> > at
> >
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
> > at
> >
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> > at
> >
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:163)
> > at
> >
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> > at java.lang.Thread.run(Thread.java:820)
> >
> >
> > After digging around the web, I got a lot of suggestions for adding
> > properties. I did follow all the suggestions from the official Hadoop
> > documentation and other links. Right now, I am using the following property
> > in yarn-site.xml on all the nodes, and I restarted dfs/yarn.
> >
> > <property>
> >   <name>yarn.nodemanager.aux-services</name>
> >   <value>mapreduce_shuffle</value>
> > </property>
> >
> > It works once in 10 times, but most of the time the mapper fails. I have
> > no clue how to fix this and I am badly stuck on this problem. Any help
> > is greatly appreciated.
> >
> > Thanks and Regards,
> > Sitakanta Mishra
> >
> >
> >
>


[jira] [Updated] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-10 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated YARN-10393:
--
Attachment: YARN-10393.draft.patch

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
> Attachments: YARN-10393.draft.patch
>
>
> This was a bug we had seen multiple times on Hadoop 2.6.2; the following 
> analysis is based on the core dump, logs, and code from 2017 with Hadoop 
> 2.6.2. We hadn't seen it after 2.9 in our environment, but that was because 
> of the RPC retry policy change and other changes; unless I missed something, 
> it is still possible even with the current code.
> *High-level description:*
>  We had seen a starving-mapper issue several times: the MR job got stuck in 
> a live-lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resource to continue, and the application 
> master failed to preempt the reducer, leaving the job stuck. The reason the 
> application master didn't preempt the reducer was a leaked container in the 
> assigned mappers: the node manager failed to report the completed container 
> to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let’s assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:597)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at 

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-10 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193975#comment-17193975
 ] 

Yuanbo Liu commented on YARN-10393:
---

Also, we should avoid adding new containers to the same heartbeat id, as 
[~wzzdreamer] has clarified in the description. Resending old containers is 
unavoidable, and changing the protocol is not a good idea, so we could use 
pendingCompletedContainers to fix it.
I've attached a draft patch for this issue so that we can speed things up and 
converge on our ideas. [~wzzdreamer], feel free to attach a new PR if you 
have one.

[~wzzdreamer] [~Jim_Brennan] [~adam.antal]
Any comments are welcome.
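
A minimal, self-contained model of that idea (names here are hypothetical; the
real NodeStatusUpdaterImpl is more involved): completed containers stay in a
pending map until the RM acknowledges them, so a lost heartbeat response
cannot leak a completed container.

{code:java}
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the pendingCompletedContainers idea; names are illustrative.
class NodeHeartbeatModel {
  // Completed containers the RM has not yet acknowledged.
  private final Map<String, String> pendingCompleted = new LinkedHashMap<>();

  void onContainerCompleted(String containerId, String finalStatus) {
    pendingCompleted.put(containerId, finalStatus);
  }

  // Every heartbeat re-reports all completed-but-unacked containers,
  // even those already sent in an earlier (possibly lost) heartbeat.
  Map<String, String> buildHeartbeatReport() {
    return new LinkedHashMap<>(pendingCompleted);
  }

  // A container is forgotten only once the RM explicitly acks it.
  void onHeartbeatResponseAck(Collection<String> ackedIds) {
    ackedIds.forEach(pendingCompleted::remove);
  }

  public static void main(String[] args) {
    NodeHeartbeatModel nm = new NodeHeartbeatModel();
    nm.onContainerCompleted("container_000199", "EXITED_WITH_SUCCESS");
    nm.buildHeartbeatReport();                     // sent, response lost
    System.out.println(nm.buildHeartbeatReport()); // still reported: no leak
    nm.onHeartbeatResponseAck(List.of("container_000199"));
    System.out.println(nm.buildHeartbeatReport()); // {} once acked
  }
}
{code}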

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2; the following 
> analysis is based on the core dump, logs, and code from 2017 with Hadoop 
> 2.6.2. We hadn't seen it after 2.9 in our environment, but that was because 
> of the RPC retry policy change and other changes; unless I missed something, 
> it is still possible even with the current code.
> *High-level description:*
>  We had seen a starving-mapper issue several times: the MR job got stuck in 
> a live-lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resource to continue, and the application 
> master failed to preempt the reducer, leaving the job stuck. The reason the 
> application master didn't preempt the reducer was a leaked container in the 
> assigned mappers: the node manager failed to report the completed container 
> to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let’s assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> 

[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193963#comment-17193963
 ] 

Hadoop QA commented on YARN-10390:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
59s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} branch-3.1 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
41s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 31 new + 533 unchanged - 10 fixed = 564 total (was 543) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 
17s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/172/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10390 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13011367/YARN-10390-branch-3.1.002.patch
 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux b0ff87d5160a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | branch-3.1 / daff39a |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| checkstyle | 

[jira] [Updated] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10390:
--
Attachment: YARN-10390-branch-3.1.002.patch

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390-branch-3.1.002.patch, 
> YARN-10390-branch-3.2.002.patch, YARN-10390.002.patch, user limit caching 
> profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.
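
A self-contained sketch of the caching idea described above (names are
hypothetical; the real LeafQueue code differs): the computed per-user limits
move from a local variable into a field, and the cache is invalidated
whenever the queue state feeding the computation changes.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not the actual LeafQueue implementation.
class UserLimitCacheSketch {
  private final Map<String, Integer> userLimitCache = new HashMap<>();
  private long cacheVersion = -1;
  private long queueVersion = 0; // bumped on allocate/release/reconfigure

  int getUserLimit(String user) {
    if (cacheVersion != queueVersion) {
      userLimitCache.clear();    // stale: drop and recompute lazily
      cacheVersion = queueVersion;
    }
    return userLimitCache.computeIfAbsent(user, this::computeUserLimit);
  }

  private int computeUserLimit(String user) {
    return 1024;                 // stand-in for the expensive computation
  }

  void onQueueStateChanged() {
    queueVersion++;              // invalidates the cache on the next lookup
  }
}
{code}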



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193888#comment-17193888
 ] 

Hadoop QA commented on YARN-10390:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
30s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
34s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 31 new + 536 unchanged - 10 fixed = 567 total (was 546) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}153m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
|
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestZKConfigurationStore
 |
|   | 
hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestFSSchedulerConfigurationStore
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/171/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10390 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13011361/YARN-10390-branch-3.2.002.patch
 |
| 

[jira] [Updated] (YARN-8210) AMRMClient logging on every heartbeat to track updation of AM RM token causes too many log lines to be generated in AM logs

2020-09-10 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8210:

Fix Version/s: (was: 2.10.2)
   2.10.1

> AMRMClient logging on every heartbeat to track updation of AM RM token causes 
> too many log lines to be generated in AM logs
> ---
>
> Key: YARN-8210
> URL: https://issues.apache.org/jira/browse/YARN-8210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3, 2.10.1
>
> Attachments: YARN-8210.1.patch
>
>
> YARN-4682 added logs to track when the AM RM token is updated, for 
> debuggability purposes. However, this is printed on every heartbeat and can 
> flood the AM logs whenever the RM's master key is rolled over, especially 
> for a long-running AM. Hence proposing to remove this log line.
> As explained in 
> https://issues.apache.org/jira/browse/YARN-3104?focusedCommentId=14298692=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14298692
>  , the AM-RM connection is not re-established, so the updated token in the 
> client's UGI is never re-sent to the RPC server, and the RM continues to 
> send the token on each heartbeat since it cannot be sure whether the client 
> really has the new token. Hence the log lines are printed on every heartbeat.
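
The proposal is simply to remove the log line. As an illustration of the
alternative it forgoes (hypothetical names, not the actual AMRMClient code),
one could instead log only when the token bytes actually change:

{code:java}
import java.util.Arrays;

// Illustrative alternative only; YARN-8210 removes the log line instead.
class TokenUpdateLogger {
  private byte[] lastTokenPassword;

  void onAmRmTokenFromHeartbeat(byte[] tokenPassword) {
    // The RM re-sends the token on every heartbeat, so compare before
    // logging; this fires only when the token has really changed.
    if (!Arrays.equals(tokenPassword, lastTokenPassword)) {
      System.out.println("AM RM token updated from heartbeat");
      lastTokenPassword =
          tokenPassword == null ? null : tokenPassword.clone();
    }
  }
}
{code}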



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (YARN-8210) AMRMClient logging on every heartbeat to track updation of AM RM token causes too many log lines to be generated in AM logs

2020-09-10 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8210:

Fix Version/s: 2.10.2

Pushed to branch-2.10.

> AMRMClient logging on every heartbeat to track updation of AM RM token causes 
> too many log lines to be generated in AM logs
> ---
>
> Key: YARN-8210
> URL: https://issues.apache.org/jira/browse/YARN-8210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3, 2.10.2
>
> Attachments: YARN-8210.1.patch
>
>
> YARN-4682 added logs to track when the AM RM token is updated, for 
> debuggability purposes. However, this is printed on every heartbeat and can 
> flood the AM logs whenever the RM's master key is rolled over, especially 
> for a long-running AM. Hence proposing to remove this log line.
> As explained in 
> https://issues.apache.org/jira/browse/YARN-3104?focusedCommentId=14298692=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14298692
>  , the AM-RM connection is not re-established, so the updated token in the 
> client's UGI is never re-sent to the RPC server, and the RM continues to 
> send the token on each heartbeat since it cannot be sure whether the client 
> really has the new token. Hence the log lines are printed on every heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193841#comment-17193841
 ] 

Eric Payne commented on YARN-10390:
---

Uploaded branch-3.2 patch because of merge conflicts in unit tests.

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390-branch-3.2.002.patch, YARN-10390.002.patch, 
> user limit caching profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10390:
--
Attachment: YARN-10390-branch-3.2.002.patch

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390-branch-3.2.002.patch, YARN-10390.002.patch, 
> user limit caching profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193833#comment-17193833
 ] 

Eric Payne commented on YARN-10390:
---

Unit test failure for {{TestFairSchedulerPreemption}} is the same as YARN-9333.

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390.002.patch, user limit caching profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193817#comment-17193817
 ] 

Hadoop QA commented on YARN-10390:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
58s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 31 new + 533 unchanged - 10 fixed = 564 total (was 543) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 50s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} 

[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193762#comment-17193762
 ] 

Hadoop QA commented on YARN-9333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
50s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License 

[jira] [Commented] (YARN-10421) Create YarnDiagnosticsService to serve diagnostic queries

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193754#comment-17193754
 ] 

Hadoop QA commented on YARN-10421:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
31s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
49s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 58s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 
16 unchanged - 0 fixed = 18 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
14s{color} | {color:green} The patch generated 0 new + 104 unchanged - 132 
fixed = 104 total (was 236) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 15m 
41s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed with JDK 

[jira] [Commented] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193731#comment-17193731
 ] 

Eric Payne commented on YARN-10390:
---

Uploaded patch to kick pre-commit build so we can have a record in this JIRA.

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390.002.patch, user limit caching profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (YARN-10390) LeafQueue: retain user limits cache across assignContainers() calls

2020-09-10 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10390:
--
Attachment: YARN-10390.002.patch

> LeafQueue: retain user limits cache across assignContainers() calls
> ---
>
> Key: YARN-10390
> URL: https://issues.apache.org/jira/browse/YARN-10390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-10390.002.patch, user limit caching profile.pdf
>
>
> Currently, user limits are cached locally in leafQueue.assignContainers call 
> to avoid repeating some steps. This cache can be retained across the calls.
> Will put up a PR soon. Profiling was done using the proposed changes in 
> TestCapacitySchedulerPerf.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (YARN-10435) Upgrade ember 2.x to ember 3.20

2020-09-10 Thread Akhil PB (Jira)
Akhil PB created YARN-10435:
---

 Summary: Upgrade ember 2.x to ember 3.20
 Key: YARN-10435
 URL: https://issues.apache.org/jira/browse/YARN-10435
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn-ui-v2
Reporter: Akhil PB
Assignee: Akhil PB


This is a full rewrite of the current YARN UI2 framework from Ember 2.x to 
Ember 3.20.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10368) Log aggregation reset to NOT_START after RM restart.

2020-09-10 Thread Anuj (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193663#comment-17193663
 ] 

Anuj commented on YARN-10368:
-

[~Amithsha] we have modified the YARN code to ignore the log aggregation 
status while removing a completed app.
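
A toy sketch of that local change (hypothetical names, not the actual RM
code): removal of a completed app is made independent of the log-aggregation
status, since a status reset to NOT_START never reaches a terminal state.

{code:java}
import java.util.EnumSet;

// Toy model of the workaround; names are illustrative.
class CompletedAppCleanup {
  enum LogAggStatus { NOT_START, RUNNING, SUCCEEDED, FAILED, TIMED_OUT }

  private static final EnumSet<LogAggStatus> TERMINAL =
      EnumSet.of(LogAggStatus.SUCCEEDED, LogAggStatus.FAILED,
          LogAggStatus.TIMED_OUT);

  static boolean removable(boolean appCompleted, LogAggStatus status,
      boolean ignoreLogAggStatus) {
    if (!appCompleted) {
      return false;              // never remove a live app
    }
    // Stock behaviour waits for a terminal log-aggregation status; the
    // workaround skips that check so NOT_START cannot block removal.
    return ignoreLogAggStatus || TERMINAL.contains(status);
  }
}
{code}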

> Log aggregation reset to NOT_START after RM restart.
> 
>
> Key: YARN-10368
> URL: https://issues.apache.org/jira/browse/YARN-10368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Anuj
>Priority: Major
> Attachments: Screenshot 2020-07-27 at 2.35.15 PM.png
>
>
> When an attempt is recovered after an RM restart, the log aggregation status 
> is not preserved and it comes back as NOT_START.
> From NOT_START it never moves to TIMED_OUT, so the completed app is never 
> cleaned up from RM memory; eventually the max-completed-app in-memory limit 
> is hit and the RM stops accepting new apps.
> https://issues.apache.org/jira/browse/YARN-7952



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193660#comment-17193660
 ] 

Peter Bacsko commented on YARN-9333:


OK, from what I understand, {{SchedulerUpdateThread}} sometimes interferes 
with the testcase. The last runs are very telling: the test in question didn't 
fail. I can't see the root cause, because the huge number of updates by 
default makes Jenkins truncate the output. Anyway, I'll bump the update 
interval to 60 seconds; that should do the job. The tests still pass.
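
A sketch of that change, assuming the standard
yarn.scheduler.fair.update-interval-ms property (the actual patch may wire
this up differently):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only; the real YARN-9333 patch may differ. Raising the Fair
// Scheduler update interval keeps the background SchedulerUpdateThread
// from firing in the middle of the preemption test window.
class PreemptionTestSetup {
  static Configuration withSlowUpdates(Configuration conf) {
    // Controls how often the FairScheduler update thread runs;
    // 60 s is effectively "not during this test".
    conf.setLong("yarn.scheduler.fair.update-interval-ms", 60 * 1000L);
    return conf;
  }
}
{code}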

> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent
> --
>
> Key: YARN-9333
> URL: https://issues.apache.org/jira/browse/YARN-9333
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9333-001.patch, YARN-9333-002.patch, 
> YARN-9333-003.patch, YARN-9333-debug1.patch
>
>
> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent - observed in YARN-9311.
> {code}
> [ERROR] 
> testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes[MinSharePreemptionWithDRF](org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption)
>   Time elapsed: 11.056 s  <<< FAILURE!
> java.lang.AssertionError: Incorrect # of containers on the greedy app 
> expected:<6> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyPreemption(TestFairSchedulerPreemption.java:296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyRelaxLocalityPreemption(TestFairSchedulerPreemption.java:537)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes(TestFairSchedulerPreemption.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   

[jira] [Updated] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9333:
---
Attachment: YARN-9333-003.patch

> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent
> --
>
> Key: YARN-9333
> URL: https://issues.apache.org/jira/browse/YARN-9333
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9333-001.patch, YARN-9333-002.patch, 
> YARN-9333-003.patch, YARN-9333-debug1.patch
>
>
> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent - observed in YARN-9311.
> {code}
> [ERROR] 
> testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes[MinSharePreemptionWithDRF](org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption)
>   Time elapsed: 11.056 s  <<< FAILURE!
> java.lang.AssertionError: Incorrect # of containers on the greedy app 
> expected:<6> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyPreemption(TestFairSchedulerPreemption.java:296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyRelaxLocalityPreemption(TestFairSchedulerPreemption.java:537)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes(TestFairSchedulerPreemption.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> 

[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193655#comment-17193655
 ] 

Hadoop QA commented on YARN-9333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
58s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 85 unchanged - 0 fixed = 91 total (was 85) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| 

[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193646#comment-17193646
 ] 

Hadoop QA commented on YARN-9333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
54s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 85 unchanged - 0 fixed = 91 total (was 85) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 29s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| 

[jira] [Updated] (YARN-10421) Create YarnDiagnosticsService to serve diagnostic queries

2020-09-10 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10421:
-
Description: 
YarnDiagnosticsServlet should run inside the ResourceManager daemon. The servlet 
forks a separate process, which executes a shell/Python/etc. script. Based on 
the use cases listed below, the script collects information, bundles it and 
sends it to UI2. The diagnostic options are the following:
 # Application hanging: 
 ** Application logs
 ** Find the hanging container and get multiple Jstacks
 ** ResourceManager logs during job lifecycle
 ** NodeManager logs from NodeManager where the hanging containers of the jobs 
ran
 ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
History URL
 # Application failed: 
 ** Application logs
 ** ResourceManager logs during job lifecycle.
 ** NodeManager logs from NodeManager where the hanging containers of the jobs 
ran
 ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
History URL.
 ** Job related metrics like container, attempts.
 # Scheduler related issue:
 ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
 ** Multiple Jstacks of ResourceManager
 ** YARN and Scheduler Configuration
 ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API 
_/ws/v1/cluster/nodes_ response
 ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response 
(YARN-10319)
 # ResourceManager / NodeManager daemon fails to start:
 ** ResourceManager and NodeManager out and log file
 ** YARN and Scheduler Configuration

Two new endpoints should be added to the RM web service: one for listing the 
available diagnostic options (_/common-issue/list_) and one for calling a 
selected option with the user-provided parameters (_/common-issue/collect_); a 
rough sketch of the pair is included below. The service should be transparent 
to script changes to help with the (on-the-fly) extensibility of the 
diagnostic tool. To split the changes into smaller chunks, the implementation 
behind the _collect_ endpoint will be provided in YARN-10433.
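
Not the attached patch, just a minimal sketch of what the endpoint pair could 
look like as a JAX-RS resource. Only the _/common-issue/list_ and 
_/common-issue/collect_ paths come from this issue; the class names, the query 
parameters and the stub service below are illustrative placeholders:

{code:java}
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/common-issue")
public class DiagnosticsResource {

  /** Illustrative placeholders, not the classes from the patch. */
  public static class DiagnosticOption {
    public String id;          // e.g. "app-hanging"
    public String description; // shown by UI2 when listing the options
  }

  interface DiagnosticsService {
    List<DiagnosticOption> listOptions();
    byte[] collectBundle(String option, String appId) throws Exception;
  }

  private final DiagnosticsService service;

  public DiagnosticsResource(DiagnosticsService service) {
    this.service = service;
  }

  // Options are read from the pluggable script, so new diagnostic cases
  // show up here without changing the service itself.
  @GET
  @Path("/list")
  @Produces(MediaType.APPLICATION_JSON)
  public List<DiagnosticOption> list() {
    return service.listOptions();
  }

  // Runs the selected option; the actual collection behind this endpoint
  // is implemented in YARN-10433.
  @GET
  @Path("/collect")
  @Produces(MediaType.APPLICATION_OCTET_STREAM)
  public Response collect(@QueryParam("option") String option,
                          @QueryParam("appId") String appId) throws Exception {
    return Response.ok(service.collectBundle(option, appId)).build();
  }
}
{code}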

  was:
YarnDiagnosticsServlet should run inside the ResourceManager daemon. The servlet 
forks a separate process, which executes a shell/Python/etc. script. Based on 
the use cases listed below, the script collects information, bundles it and 
sends it to UI2. The diagnostic cases are the following:
 # Application hanging: 
 ** Application logs
 ** Find the hanging container and get multiple Jstacks
 ** ResourceManager logs during job lifecycle
 ** NodeManager logs from NodeManager where the hanging containers of the jobs 
ran
 ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
History URL
 # Application failed: 
 ** Application logs
 ** ResourceManager logs during job lifecycle.
 ** NodeManager logs from NodeManager where the hanging containers of the jobs 
ran
 ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
History URL.
 ** Job related metrics like container, attempts.
 # Scheduler related issue:
 ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
 ** Multiple Jstacks of ResourceManager
 ** YARN and Scheduler Configuration
 ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API 
_/ws/v1/cluster/nodes_ response
 ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response 
(YARN-10319)
 # ResourceManager / NodeManager daemon fails to start:
 ** ResourceManager and NodeManager out and log file
 ** YARN and Scheduler Configuration

To ease the load on the RM, the servlet should allow only one HTTP request at a 
time. If a new request comes in while serving another, an appropriate response 
code should be returned with the message "Diagnostics Collection in Progress". 
The servlet should list the possible diagnostic cases to the UI. The cases will 
be implemented in the script. The servlet should be transparent to script 
changes to help with the (on-the-fly) extensibility of the diagnostic tool. 

 

The diag bundle can become large in size, so a threshold functionality should 
be added. If the bundle's size exceeds the threshold, the bundle will be stored 
in a local folder on the host of the RM, and the path will be returned.


> Create YarnDiagnosticsService to serve diagnostic queries 
> --
>
> Key: YARN-10421
> URL: https://issues.apache.org/jira/browse/YARN-10421
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10421.001.patch
>
>
> YarnDiagnosticsServlet should run inside ResourceManager Daemon. The servlet 
> forks a separate process, which executes a shell/Python/etc script. Based on 
> the use-cases listed below the script collects information, bundles it and 
> 

[jira] [Updated] (YARN-10421) Create YarnDiagnosticsService to serve diagnostic queries

2020-09-10 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10421:
-
Summary: Create YarnDiagnosticsService to serve diagnostic queries   (was: 
Create YarnDiagnosticsServlet to serve diagnostic queries )

> Create YarnDiagnosticsService to serve diagnostic queries 
> --
>
> Key: YARN-10421
> URL: https://issues.apache.org/jira/browse/YARN-10421
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10421.001.patch
>
>
> YarnDiagnosticsServlet should run inside ResourceManager Daemon. The servlet 
> forks a separate process, which executes a shell/Python/etc script. Based on 
> the use-cases listed below the script collects information, bundles it and 
> sends it to UI2. The diagnostic cases are the following:
>  # Application hanging: 
>  ** Application logs
>  ** Find the hanging container and get multiple Jstacks
>  ** ResourceManager logs during job lifecycle
>  ** NodeManager logs from NodeManager where the hanging containers of the 
> jobs ran
>  ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
> History URL
>  # Application failed: 
>  ** Application logs
>  ** ResourceManager logs during job lifecycle.
>  ** NodeManager logs from NodeManager where the hanging containers of the 
> jobs ran
>  ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
> History URL.
>  ** Job related metrics like container, attempts.
>  # Scheduler related issue:
>  ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
>  ** Multiple Jstacks of ResourceManager
>  ** YARN and Scheduler Configuration
>  ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API 
> _/ws/v1/cluster/nodes response_
>  ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response 
> (YARN-10319)
>  # ResourceManager / NodeManager daemon fails to start:
>  ** ResourceManager and NodeManager out and log file
>  ** YARN and Scheduler Configuration
> To ease the load on the RM, the servlet should allow only one HTTP request at 
> a time. If a new request comes in while serving another an appropriate 
> response code should be returned, with the message "Diagnostics Collection in 
> Progress". The servlet should list the possible diagnostic cases to the UI. 
> The cases will be implemented in the script. The servlet should be 
> transparent to the script changes to help with the (on-the-fly) extensibility 
> of the diagnostic tool. 
>  
> The diag bundle can become large in size, so a threshold functionality should 
> be added. If the bundle's size exceeds the threshold the bundle will be 
> stored in a local folder on the host of the RM, and the path will be returned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10434) Create extensive unittests under RMWebServices for the diagnostic functionality

2020-09-10 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10434:


 Summary: Create extensive unittests under RMWebServices for the 
diagnostic functionality
 Key: YARN-10434
 URL: https://issues.apache.org/jira/browse/YARN-10434
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Benjamin Teke


After the functionality and form are finalized, new unit test class(es) should 
be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193633#comment-17193633
 ] 

Hadoop QA commented on YARN-9333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
49s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 85 unchanged - 0 fixed = 91 total (was 85) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 32s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| 

[jira] [Created] (YARN-10433) Extend the DiagnosticService to initiate the diagnostic bundle collection

2020-09-10 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10433:


 Summary: Extend the DiagnosticService to initiate the diagnostic 
bundle collection
 Key: YARN-10433
 URL: https://issues.apache.org/jira/browse/YARN-10433
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Benjamin Teke


YARN-10421 introduces the new DiagnosticService class, the two new endpoints 
for listing the available actions and starting the diagnostic script's collect 
method, and a basic diagnostic collector script. After the script's form is 
finalized (YARN-10422), the DiagnosticService should be extended to spawn the 
requested collection method based on the input parameters and return the 
collected bundle as a response; a sketch of the spawning step follows below.

To ease the load on the RM, the servlet should allow only one HTTP request at a 
time. If a new request comes in while serving another, an appropriate response 
code should be returned with the message "Diagnostics Collection in Progress".
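
Not from any patch, just a minimal sketch of the spawning step, assuming the 
collector is an external script that takes the option and an output path as 
flags (CollectorRunner, the flag names and the bundle naming are made up; only 
the one-request-at-a-time rule and the in-progress message come from this 
issue):

{code:java}
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class CollectorRunner {
  // Guard so that at most one collection runs at a time; when it is held,
  // the caller answers with "Diagnostics Collection in Progress".
  private final AtomicBoolean collecting = new AtomicBoolean(false);
  private final String scriptPath; // configurable, see YARN-10432
  private final String outputDir;  // configurable, see YARN-10432

  public CollectorRunner(String scriptPath, String outputDir) {
    this.scriptPath = scriptPath;
    this.outputDir = outputDir;
  }

  /** Returns the bundle path, or null if a collection is already running. */
  public Path collect(String option, List<String> args)
      throws IOException, InterruptedException {
    if (!collecting.compareAndSet(false, true)) {
      return null;
    }
    try {
      Path bundle = Paths.get(outputDir,
          "diag-" + option + "-" + System.currentTimeMillis() + ".tar.gz");
      List<String> cmd = new ArrayList<>();
      cmd.add(scriptPath);
      cmd.add("--option");
      cmd.add(option);
      cmd.add("--out");
      cmd.add(bundle.toString());
      cmd.addAll(args);
      // inheritIO keeps the script's output in the RM's own out/err instead
      // of an unread pipe that could block the child process.
      Process p = new ProcessBuilder(cmd).inheritIO().start();
      if (p.waitFor() != 0) {
        throw new IOException("Collector script failed for option " + option);
      }
      return bundle;
    } finally {
      collecting.set(false);
    }
  }
}
{code}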



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10432) Introduce the necessary configuration entries to YARN

2020-09-10 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10432:


 Summary: Introduce the necessary configuration entries to YARN
 Key: YARN-10432
 URL: https://issues.apache.org/jira/browse/YARN-10432
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Benjamin Teke


The following two requirements need configuration entries (a sketch of reading 
them is included below):
 * The location of the diagnostic script and its output folder (where the 
bundle will be saved) on the RM's filesystem should be configurable - _script 
location; output folder_
 * The bundle can become large in size, so a configurable threshold should be 
added. If the bundle's size exceeds the threshold, the bundle will be stored in 
a local folder on the host of the RM, and the path will be returned - _bundle 
forwarding size threshold_
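
A sketch of how the entries might be read, assuming hypothetical key names 
under the yarn.resourcemanager prefix (picking the actual names and defaults is 
exactly what this issue is for):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DiagnosticsConfig {
  // Hypothetical keys, for illustration only.
  static final String SCRIPT_PATH =
      YarnConfiguration.RM_PREFIX + "diagnostics.script.path";
  static final String OUTPUT_DIR =
      YarnConfiguration.RM_PREFIX + "diagnostics.output-dir";
  static final String FORWARDING_THRESHOLD =
      YarnConfiguration.RM_PREFIX + "diagnostics.bundle-forwarding-size-threshold";

  final String scriptPath;
  final String outputDir;
  final long forwardingThresholdBytes;

  DiagnosticsConfig(Configuration conf) {
    scriptPath = conf.get(SCRIPT_PATH);
    outputDir = conf.get(OUTPUT_DIR, "/tmp/yarn-diagnostics");
    // getLongBytes accepts size suffixes such as 64m or 1g.
    forwardingThresholdBytes =
        conf.getLongBytes(FORWARDING_THRESHOLD, 64L * 1024 * 1024);
  }

  /** Bundles above the threshold stay on the RM host; only the path is returned. */
  boolean forwardInResponse(long bundleSizeBytes) {
    return bundleSizeBytes <= forwardingThresholdBytes;
  }
}
{code}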



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10421) Create YarnDiagnosticsServlet to serve diagnostic queries

2020-09-10 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10421:
-
Attachment: YARN-10421.001.patch

> Create YarnDiagnosticsServlet to serve diagnostic queries 
> --
>
> Key: YARN-10421
> URL: https://issues.apache.org/jira/browse/YARN-10421
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10421.001.patch
>
>
> YarnDiagnosticsServlet should run inside ResourceManager Daemon. The servlet 
> forks a separate process, which executes a shell/Python/etc script. Based on 
> the use-cases listed below the script collects information, bundles it and 
> sends it to UI2. The diagnostic cases are the following:
>  # Application hanging: 
>  ** Application logs
>  ** Find the hanging container and get multiple Jstacks
>  ** ResourceManager logs during job lifecycle
>  ** NodeManager logs from NodeManager where the hanging containers of the 
> jobs ran
>  ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
> History URL
>  # Application failed: 
>  ** Application logs
>  ** ResourceManager logs during job lifecycle.
>  ** NodeManager logs from NodeManager where the hanging containers of the 
> jobs ran
>  ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez 
> History URL.
>  ** Job related metrics like container, attempts.
>  # Scheduler related issue:
>  ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
>  ** Multiple Jstacks of ResourceManager
>  ** YARN and Scheduler Configuration
>  ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API 
> _/ws/v1/cluster/nodes response_
>  ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response 
> (YARN-10319)
>  # ResourceManager / NodeManager daemon fails to start:
>  ** ResourceManager and NodeManager out and log file
>  ** YARN and Scheduler Configuration
> To ease the load on the RM, the servlet should allow only one HTTP request at 
> a time. If a new request comes in while serving another an appropriate 
> response code should be returned, with the message "Diagnostics Collection in 
> Progress". The servlet should list the possible diagnostic cases to the UI. 
> The cases will be implemented in the script. The servlet should be 
> transparent to the script changes to help with the (on-the-fly) extensibility 
> of the diagnostic tool. 
>  
> The diag bundle can become large in size, so a threshold functionality should 
> be added. If the bundle's size exceeds the threshold the bundle will be 
> stored in a local folder on the host of the RM, and the path will be returned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193545#comment-17193545
 ] 

Hadoop QA commented on YARN-9333:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
47s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 85 unchanged - 0 fixed = 91 total (was 85) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 
21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. 

[jira] [Commented] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193456#comment-17193456
 ] 

Peter Bacsko commented on YARN-9333:


Ehhh, guess it didn't work :( I'm uploading a debug patch to see what happens 
(too much output is coming from SchedulerUpdateThread).
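
One way to keep that noise down while digging, a sketch using the log4j 1.x 
API that Hadoop shipped with at the time (the logger name below targets the 
fair scheduler package and is a guess, not what the debug patch does):

{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietSchedulerLogs {
  /** Call from a test's @Before so the scheduler's update thread does not
   *  flood the output while DEBUG logging stays on elsewhere. */
  public static void quietFairSchedulerUpdates() {
    Logger.getLogger(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair")
        .setLevel(Level.INFO);
  }
}
{code}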

> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent
> --
>
> Key: YARN-9333
> URL: https://issues.apache.org/jira/browse/YARN-9333
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9333-001.patch, YARN-9333-002.patch, 
> YARN-9333-debug1.patch
>
>
> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent - observed in YARN-9311.
> {code}
> [ERROR] 
> testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes[MinSharePreemptionWithDRF](org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption)
>   Time elapsed: 11.056 s  <<< FAILURE!
> java.lang.AssertionError: Incorrect # of containers on the greedy app 
> expected:<6> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyPreemption(TestFairSchedulerPreemption.java:296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyRelaxLocalityPreemption(TestFairSchedulerPreemption.java:537)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes(TestFairSchedulerPreemption.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> 

[jira] [Updated] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittent

2020-09-10 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9333:
---
Attachment: YARN-9333-debug1.patch

> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent
> --
>
> Key: YARN-9333
> URL: https://issues.apache.org/jira/browse/YARN-9333
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9333-001.patch, YARN-9333-002.patch, 
> YARN-9333-debug1.patch
>
>
> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittent - observed in YARN-9311.
> {code}
> [ERROR] 
> testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes[MinSharePreemptionWithDRF](org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption)
>   Time elapsed: 11.056 s  <<< FAILURE!
> java.lang.AssertionError: Incorrect # of containers on the greedy app 
> expected:<6> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyPreemption(TestFairSchedulerPreemption.java:296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyRelaxLocalityPreemption(TestFairSchedulerPreemption.java:537)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes(TestFairSchedulerPreemption.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> [ERROR] 
> 

[jira] [Created] (YARN-10431) [Umbrella] Job group management

2020-09-10 Thread jialei weng (Jira)
jialei weng created YARN-10431:
--

 Summary: [Umbrella] Job group management
 Key: YARN-10431
 URL: https://issues.apache.org/jira/browse/YARN-10431
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.9.2
Reporter: jialei weng


In the current YARN job management, we don't have an efficient mechanism to 
manage several jobs together. For example, one batch job may trigger several 
sub-jobs that run at the same time, such as one job to process the data and 
another to monitor job metrics. When we want to cancel these jobs, we have to 
kill them one by one in the current design. I propose a job group concept to 
handle such parent-child jobs as one unit.
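
The group concept does not exist yet, but the "cancel as one unit" part can be 
approximated today with application tags, assuming every member of a group is 
submitted with a shared tag (the tag format below is made up; this is a 
client-side sketch, not the proposed feature):

{code:java}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class KillJobGroup {
  public static void main(String[] args) throws IOException, YarnException {
    String groupTag = args[0]; // e.g. "job-group:nightly-etl" (illustrative)
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      EnumSet<YarnApplicationState> live = EnumSet.of(
          YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
          YarnApplicationState.SUBMITTED, YarnApplicationState.ACCEPTED,
          YarnApplicationState.RUNNING);
      // Kill every still-live application carrying the group's tag.
      for (ApplicationReport report : client.getApplications()) {
        if (live.contains(report.getYarnApplicationState())
            && report.getApplicationTags().contains(groupTag)) {
          client.killApplication(report.getApplicationId());
        }
      }
    } finally {
      client.stop();
    }
  }
}
{code}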



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org