[jira] [Created] (YARN-10745) Change log level from info to debug for a few logs and remove unnecessary debug-log checks

2021-04-19 Thread D M Murali Krishna Reddy (Jira)
D M Murali Krishna Reddy created YARN-10745:
---

 Summary: Change log level from info to debug for a few logs and 
remove unnecessary debug-log checks
 Key: YARN-10745
 URL: https://issues.apache.org/jira/browse/YARN-10745
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: D M Murali Krishna Reddy
Assignee: D M Murali Krishna Reddy


Change the log level from info to debug for a few logs, so that the load on the 
logger decreases on large clusters and performance improves.

Remove the unnecessary isDebugEnabled() checks around debug statements that 
print plain strings without any string concatenation.
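For illustration, a minimal sketch of both patterns with made-up log statements 
(not actual YARN code):
{code:java}
// Before: an INFO message on a hot code path, plus a redundant
// isDebugEnabled() guard around a constant-string DEBUG message.
LOG.info("Processed node heartbeat");
if (LOG.isDebugEnabled()) {
  LOG.debug("Scheduling pass started");
}

// After: the hot-path message is demoted to DEBUG, and the guard is dropped
// because SLF4J checks the level internally and no string concatenation
// happens for a plain constant message.
LOG.debug("Processed node heartbeat");
LOG.debug("Scheduling pass started");
{code}
A guard (or a {} placeholder) remains worthwhile only when building the message 
itself is expensive.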






[jira] [Commented] (YARN-10715) Remove hardcoded resource values (e.g. GPU/FPGA) in code.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325442#comment-17325442
 ] 

Qi Zhu commented on YARN-10715:
---

Thanks [~ebadger] for the reply.

It makes sense to me. I will close it now. :)

> Remove hardcoded resource values (e.g. GPU/FPGA) in code.
> -
>
> Key: YARN-10715
> URL: https://issues.apache.org/jira/browse/YARN-10715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10715.001.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10503?focusedCommentId=17307772=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17307772
> As the above comment suggests, we should remove hardcoded resource values 
> (e.g. GPU/FPGA) in code.






[jira] [Commented] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325438#comment-17325438
 ] 

Qi Zhu commented on YARN-10723:
---

[~ebadger] Sure, I uploaded the 005 patch to trigger Jenkins.

> Change CS nodes page in UI to support custom resource.
> --
>
> Key: YARN-10723
> URL: https://issues.apache.org/jira/browse/YARN-10723
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10723.001.patch, YARN-10723.002.patch, 
> YARN-10723.003.patch, YARN-10723.004.patch, YARN-10723.005.patch, 
> image-2021-04-06-17-22-32-733.png
>
>
> The nodes page currently only supports GPU as a custom resource.
> We should make this work for all custom resources.






[jira] [Updated] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-19 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10723:
--
Attachment: YARN-10723.005.patch

> Change CS nodes page in UI to support custom resource.
> --
>
> Key: YARN-10723
> URL: https://issues.apache.org/jira/browse/YARN-10723
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10723.001.patch, YARN-10723.002.patch, 
> YARN-10723.003.patch, YARN-10723.004.patch, YARN-10723.005.patch, 
> image-2021-04-06-17-22-32-733.png
>
>
> The nodes page currently only supports GPU as a custom resource.
> We should make this work for all custom resources.






[jira] [Comment Edited] (YARN-10743) Add a policy for not aggregating logs of containers which are killed for exceeding the container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325432#comment-17325432
 ] 

Qi Zhu edited comment on YARN-10743 at 4/20/21, 3:04 AM:
-

Thanks [~ebadger] for the reply.

One case in our cluster:

If a container in a long-running Flink job is killed for an oversized log, the 
Flink side can actually find the problem, and another container will be launched 
to take over. If the problem happens again, the new container's logs will also 
be aggregated (actually more than 100 GB of logs, always printed by the user); 
in the long-running case the abnormal log volume may exceed 1 TB, which puts 
heavy pressure on HDFS.

I also agree with you that we want to check the logs in the normal case, but if 
we can add this as an option, I think it is useful for long-running jobs when 
abnormal user printing happens.

Thanks.


was (Author: zhuqi):
Thanks [~ebadger] for the reply.

One case in our cluster:

If a container in a long-running Flink job is killed for an oversized log, the 
Flink side can actually find the problem, and another container will be launched 
to take over. If the problem happens again, the new container's logs will also 
be aggregated (actually more than 100 GB of logs, always printed by the user); 
in the long-running case the abnormal log volume may exceed 1 TB, which puts 
heavy pressure on HDFS.

I also agree with you that we want to check the logs in the normal case, but if 
we can add this as an option, I think it is useful for long-running jobs when 
abnormal user printing happens.

 

!image-2021-04-20-10-41-01-057.png|width=781,height=85!

{color:#de350b}Here the exit code of the log context is always 0{color}, so the 
policy actually does not take effect, for example:
{code:java}
public class FailedContainerLogAggregationPolicy extends
    AbstractContainerLogAggregationPolicy {
  public boolean shouldDoLogAggregation(ContainerLogContext logContext) {
    int exitCode = logContext.getExitCode();
    return exitCode != 0 && exitCode != ExitCode.FORCE_KILLED.getExitCode()
        && exitCode != ExitCode.TERMINATED.getExitCode();
  }
}
{code}
Or have I missed some code where this policy would take effect?

cc [~ebadger] [~Jim_Brennan] [~epayne] 

What are your opinions about this code in 
AppLogAggregatorImpl#uploadLogsForContainers?
{code:java}
if (shouldUploadLogs(new ContainerLogContext(
    container.getContainerId(), containerType, 0))) {
  pendingContainerInThisCycle.add(container.getContainerId());
}
{code}
Thanks.

> Add a policy for not aggregating logs of containers which are killed for 
> exceeding the container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch, image-2021-04-20-10-41-01-057.png
>
>
> Since YARN-10471 added support for killing containers that exceed the log 
> size limit, we'd better add a policy that skips log aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]






[jira] [Comment Edited] (YARN-10743) Add a policy for not aggregating logs of containers which are killed for exceeding the container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325432#comment-17325432
 ] 

Qi Zhu edited comment on YARN-10743 at 4/20/21, 3:02 AM:
-

Thanks [~ebadger] for the reply.

One case in our cluster:

If a container in a long-running Flink job is killed for an oversized log, the 
Flink side can actually find the problem, and another container will be launched 
to take over. If the problem happens again, the new container's logs will also 
be aggregated (actually more than 100 GB of logs, always printed by the user); 
in the long-running case the abnormal log volume may exceed 1 TB, which puts 
heavy pressure on HDFS.

I also agree with you that we want to check the logs in the normal case, but if 
we can add this as an option, I think it is useful for long-running jobs when 
abnormal user printing happens.

 

!image-2021-04-20-10-41-01-057.png|width=781,height=85!

{color:#de350b}Here the exit code of the log context is always 0{color}, so the 
policy actually does not take effect, for example:
{code:java}
public class FailedContainerLogAggregationPolicy extends
    AbstractContainerLogAggregationPolicy {
  public boolean shouldDoLogAggregation(ContainerLogContext logContext) {
    int exitCode = logContext.getExitCode();
    return exitCode != 0 && exitCode != ExitCode.FORCE_KILLED.getExitCode()
        && exitCode != ExitCode.TERMINATED.getExitCode();
  }
}
{code}
Or have I missed some code where this policy would take effect?

cc [~ebadger] [~Jim_Brennan] [~epayne] 

What are your opinions about this code in 
AppLogAggregatorImpl#uploadLogsForContainers?
{code:java}
if (shouldUploadLogs(new ContainerLogContext(
    container.getContainerId(), containerType, 0))) {
  pendingContainerInThisCycle.add(container.getContainerId());
}
{code}
Thanks.


was (Author: zhuqi):
Thanks [~ebadger] for the reply.

One case in our cluster:

If a container in a long-running Flink job is killed for an oversized log, the 
Flink side can actually find the problem, and another container will be launched 
to take over. If the problem happens again, the new container's logs will also 
be aggregated (actually more than 100 GB of logs, always printed by the user); 
in the long-running case the abnormal log volume may exceed 1 TB, which puts 
heavy pressure on HDFS.

I also agree with you that we want to check the logs in the normal case, but if 
we can add this as an option, I think it is useful for long-running jobs when 
abnormal user printing happens.

 

Another problem with the code:

!image-2021-04-20-10-41-01-057.png|width=781,height=85!

{color:#de350b}Here the exit code of the log context is always 0{color}, so the 
policy actually does not take effect, for example:
{code:java}
public class FailedContainerLogAggregationPolicy extends
    AbstractContainerLogAggregationPolicy {
  public boolean shouldDoLogAggregation(ContainerLogContext logContext) {
    int exitCode = logContext.getExitCode();
    return exitCode != 0 && exitCode != ExitCode.FORCE_KILLED.getExitCode()
        && exitCode != ExitCode.TERMINATED.getExitCode();
  }
}
{code}
I think it's wrong.

cc [~ebadger] [~Jim_Brennan] [~epayne] 

What are your opinions about this code in 
AppLogAggregatorImpl#uploadLogsForContainers?
{code:java}
if (shouldUploadLogs(new ContainerLogContext(
    container.getContainerId(), containerType, 0))) {
  pendingContainerInThisCycle.add(container.getContainerId());
}
{code}

> Add a policy for not aggregating logs of containers which are killed for 
> exceeding the container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch, image-2021-04-20-10-41-01-057.png
>
>
> Since YARN-10471 added support for killing containers that exceed the log 
> size limit, we'd better add a policy that skips log aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]






[jira] [Commented] (YARN-10743) Add a policy for not aggregating logs of containers which are killed for exceeding the container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325432#comment-17325432
 ] 

Qi Zhu commented on YARN-10743:
---

Thanks [~ebadger] for the reply.

One case in our cluster:

If a container in a long-running Flink job is killed for an oversized log, the 
Flink side can actually find the problem, and another container will be launched 
to take over. If the problem happens again, the new container's logs will also 
be aggregated (actually more than 100 GB of logs, always printed by the user); 
in the long-running case the abnormal log volume may exceed 1 TB, which puts 
heavy pressure on HDFS.

I also agree with you that we want to check the logs in the normal case, but if 
we can add this as an option, I think it is useful for long-running jobs when 
abnormal user printing happens.

 

Another problem with the code:

!image-2021-04-20-10-41-01-057.png|width=781,height=85!

{color:#de350b}Here the exit code of the log context is always 0{color}, so the 
policy actually does not take effect, for example:
{code:java}
public class FailedContainerLogAggregationPolicy extends
    AbstractContainerLogAggregationPolicy {
  public boolean shouldDoLogAggregation(ContainerLogContext logContext) {
    int exitCode = logContext.getExitCode();
    return exitCode != 0 && exitCode != ExitCode.FORCE_KILLED.getExitCode()
        && exitCode != ExitCode.TERMINATED.getExitCode();
  }
}
{code}
I think it's wrong.

cc [~ebadger] [~Jim_Brennan] [~epayne] 

What are your opinions about this code in 
AppLogAggregatorImpl#uploadLogsForContainers?
{code:java}
if (shouldUploadLogs(new ContainerLogContext(
    container.getContainerId(), containerType, 0))) {
  pendingContainerInThisCycle.add(container.getContainerId());
}
{code}
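For illustration only, a minimal sketch of the direction this question points 
at, assuming the container's real exit status is available at this point (the 
accessor below is an assumption, not necessarily the actual NM API):
{code:java}
// Hypothetical sketch: pass the container's actual exit status instead of a
// hardcoded 0, so that exit-code-based policies such as
// FailedContainerLogAggregationPolicy can actually take effect.
int exitCode = container.cloneAndGetContainerStatus().getExitStatus(); // assumed
if (shouldUploadLogs(new ContainerLogContext(
    container.getContainerId(), containerType, exitCode))) {
  pendingContainerInThisCycle.add(container.getContainerId());
}
{code}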

> Add a policy for not aggregating logs of containers which are killed for 
> exceeding the container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch, image-2021-04-20-10-41-01-057.png
>
>
> Since YARN-10471 added support for killing containers that exceed the log 
> size limit, we'd better add a policy that skips log aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]






[jira] [Updated] (YARN-10743) Add a policy for not aggregating logs of containers which are killed for exceeding the container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10743:
--
Attachment: image-2021-04-20-10-41-01-057.png

> Add a policy for not aggregating logs of containers which are killed for 
> exceeding the container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch, image-2021-04-20-10-41-01-057.png
>
>
> Since YARN-10471 added support for killing containers that exceed the log 
> size limit, we'd better add a policy that skips log aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]






[jira] [Updated] (YARN-10744) add more doc for yarn.federation.policy-manager-params

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10744:
--
Labels: pull-request-available  (was: )

> add more doc for yarn.federation.policy-manager-params
> --
>
> Key: YARN-10744
> URL: https://issues.apache.org/jira/browse/YARN-10744
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: chaosju
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> add example of yarn.federation.policy-manager-params






[jira] [Created] (YARN-10744) add more doc for yarn.federation.policy-manager-params

2021-04-19 Thread chaosju (Jira)
chaosju created YARN-10744:
--

 Summary: add more doc for yarn.federation.policy-manager-params
 Key: YARN-10744
 URL: https://issues.apache.org/jira/browse/YARN-10744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: federation
Reporter: chaosju


add example of yarn.federation.policy-manager-params






[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10460:
---
Fix Version/s: 2.10.2
   3.1.5

Thanks for the review, [~Jim_Brennan]. The spotbugs warning is unrelated to this 
patch (it is in a different file, Server.java). I've committed the 2.10 patch to 
branch-2.10.

This jira has now been committed to trunk (3.4), branch-3.3, branch-3.2, 
branch-3.1, and branch-2.10.

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-2.10.002.patch, 
> YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>   at 
> 

[jira] [Commented] (YARN-7769) FS QueueManager should not create default queue at init

2021-04-19 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325396#comment-17325396
 ] 

Wilfred Spiegelenburg commented on YARN-7769:
-

Code change looks good. This does however also impact the 
[documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]:
 * {{By default, all users share a single queue, named “default”.}}
 * {{user-as-default-queue}} says it falls back to the default queue
 * {{allow-undeclared-pools}} is another point that mentions the default queue

The last impact is on the default placement rule. Not sure if that is just a 
documentation change or needs more.

> FS QueueManager should not create default queue at init
> ---
>
> Key: YARN-7769
> URL: https://issues.apache.org/jira/browse/YARN-7769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-7769.001.patch, YARN-7769.002.patch, 
> YARN-7769.003.patch
>
>
> Currently the FairScheduler QueueManager automatically creates the default 
> queue. However the default queue does not need to exist. We have two possible 
> cases which we should handle:
> * Based on the placement rule "Default" the name for the default queue might 
> not be default and it should be created with a different name
> * There might not be a "Default" placement rule at all which removes the need 
> to create the queue.
> We should leave the creation of the default queue to the point in time that 
> we can assess if it is needed or not.






[jira] [Commented] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used

2021-04-19 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325394#comment-17325394
 ] 

chaosju commented on YARN-9586:
---

cc [~qiuliang988]

> [QA] Need more doc for yarn.federation.policy-manager-params when 
> LoadBasedRouterPolicy is used
> ---
>
> Key: YARN-9586
> URL: https://issues.apache.org/jira/browse/YARN-9586
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: federation
>Reporter: Shen Yinjie
>Priority: Major
>
> We picked LoadBasedRouterPolicy for YARN federation, but had no idea what to 
> set for yarn.federation.policy-manager-params. Is there a demo config or a 
> more detailed description for this?






[jira] [Commented] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used

2021-04-19 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325393#comment-17325393
 ] 

chaosju commented on YARN-9586:
---

{noformat}
{"routerPolicyWeights":{"entry":[
  {"key":{"id":"sc1"},"value":"0.07472054"},
  {"key":{"id":"sc16"},"value":"0.14857134"},
  {"key":{"id":"sc0"},"value":"0.048437383"},
  {"key":{"id":"sc15"},"value":"0.16541433"},
  {"key":{"id":"sc3"},"value":"0.22241566"},
  {"key":{"id":"sc18"},"value":"0.18826425"},
  {"key":{"id":"sc2"},"value":"0.052433133"},
  {"key":{"id":"sc17"},"value":"0.04532957"},
  {"key":{"id":"sc5"},"value":"0.13232505"},
  {"key":{"id":"sc12"},"value":"0.082701534"},
  {"key":{"id":"sc4"},"value":"0.21456465"},
  {"key":{"id":"sc11"},"value":"0.04693638"},
  {"key":{"id":"sc7"},"value":"0.20451826"},
  {"key":{"id":"sc14"},"value":"0.11614501"},
  {"key":{"id":"sc6"},"value":"0.059735205"},
  {"key":{"id":"sc13"},"value":"0.18182552"},
  {"key":{"id":"sc9"},"value":"0.19460526"},
  {"key":{"id":"sc8"},"value":"0.04246365"},
  {"key":{"id":"sc19"},"value":"0.192648"},
  {"key":{"id":"sc10"},"value":"0.1864759"}]},
 "amrmPolicyWeights":{"entry":[
  {"key":{"id":"sc1"},"value":"0.07472054"},
  {"key":{"id":"sc16"},"value":"0.14857134"},
  {"key":{"id":"sc0"},"value":"0.048437383"},
  {"key":{"id":"sc15"},"value":"0.16541433"},
  {"key":{"id":"sc3"},"value":"0.22241566"},
  {"key":{"id":"sc18"},"value":"0.18826425"},
  {"key":{"id":"sc2"},"value":"0.052433133"},
  {"key":{"id":"sc17"},"value":"0.04532957"},
  {"key":{"id":"sc5"},"value":"0.13232505"},
  {"key":{"id":"sc12"},"value":"0.082701534"},
  {"key":{"id":"sc4"},"value":"0.21456465"},
  {"key":{"id":"sc11"},"value":"0.04693638"},
  {"key":{"id":"sc7"},"value":"0.20451826"},
  {"key":{"id":"sc14"},"value":"0.11614501"},
  {"key":{"id":"sc6"},"value":"0.059735205"},
  {"key":{"id":"sc13"},"value":"0.18182552"},
  {"key":{"id":"sc9"},"value":"0.19460526"},
  {"key":{"id":"sc8"},"value":"0.04246365"},
  {"key":{"id":"sc19"},"value":"0.192648"},
  {"key":{"id":"sc10"},"value":"0.1864759"}]},
 "headroomAlpha":"0.0"}
{noformat}

> [QA] Need more doc for yarn.federation.policy-manager-params when 
> LoadBasedRouterPolicy is used
> ---
>
> Key: YARN-9586
> URL: https://issues.apache.org/jira/browse/YARN-9586
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: federation
>Reporter: Shen Yinjie
>Priority: Major
>
> We picked LoadBasedRouterPolicy for YARN federation, but had no idea what to 
> set for yarn.federation.policy-manager-params. Is there a demo config or a 
> more detailed description for this?






[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325392#comment-17325392
 ] 

Hadoop QA commented on YARN-10460:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 34s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 20s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 40s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 54s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 17s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 55s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 56s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 45s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 28s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 11m 19s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:red}-1{color} | {color:red} spotbugs {color} | {color:red}  2m  0s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/917/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.10 has 2 extant spotbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 16s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 13s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  5s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m  5s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 44s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} |
| 

[jira] [Commented] (YARN-10715) Remove hardcoded resource values (e.g. GPU/FPGA) in code.

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325387#comment-17325387
 ] 

Eric Badger commented on YARN-10715:


Finally getting around to looking at this and I don't think removing the 
hardcoded values from ResourceUtils is necessary. It's not necessary for the 
resource translation to be there for GPUs and FPGAs, but it also doesn't hurt 
anything. I'm not quite sure why {{yarn.io/gpu}} was chosen anyway, since that 
seems like a pretty complex name for something as simple as a gpu. But I'm sure 
there was a good reason.

Anyway, we should definitely strive to remove any code that is hardcoding 
calculations for GPUs/FPGAs and generalize them to any extended resource type. 
But in this case, this is just a simple translation from {{gpu}} to 
{{yarn.io/gpu}} and similarly for FPGAs. I'm inclined to close this as won't do.
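For readers skimming the thread, a rough sketch of the kind of translation being 
discussed (hypothetical names, not the actual ResourceUtils code):
{code:java}
// Hypothetical sketch: the shorthand "gpu"/"fpga" is canonicalized to the
// namespaced resource names "yarn.io/gpu"/"yarn.io/fpga"; any other resource
// name passes through unchanged.
private static final Map<String, String> SHORTHAND = new HashMap<>();
static {
  SHORTHAND.put("gpu", "yarn.io/gpu");
  SHORTHAND.put("fpga", "yarn.io/fpga");
}

static String canonicalizeResourceName(String name) {
  return SHORTHAND.getOrDefault(name, name);
}
{code}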

> Remove hardcoded resource values (e.g. GPU/FPGA) in code.
> -
>
> Key: YARN-10715
> URL: https://issues.apache.org/jira/browse/YARN-10715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10715.001.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10503?focusedCommentId=17307772=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17307772
> As the above comment suggests, we should remove hardcoded resource values 
> (e.g. GPU/FPGA) in code.






[jira] [Commented] (YARN-10743) Add a policy for not aggregating logs of containers which are killed for exceeding the container log size limit.

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325343#comment-17325343
 ] 

Eric Badger commented on YARN-10743:


I have the same concern as [~Jim_Brennan]. If the flink logs are large enough 
that the container is getting killed, don't you want to check the logs to see 
what happened? I'm trying to understand the scenario where you wouldn't want 
the logs even though your container failed due to large log size. Is there a 
reason that you don't care about the logs in this instance?

> Add a policy for not aggregating logs of containers which are killed for 
> exceeding the container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers that exceed the log 
> size limit, we'd better add a policy that skips log aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]






[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325332#comment-17325332
 ] 

Jim Brennan commented on YARN-10460:


+1 on the branch-2.10 patch.


> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-2.10.002.patch, 
> YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)
> {noformat}
> Both the 

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325329#comment-17325329
 ] 

Eric Badger commented on YARN-10460:


Posting a branch-2.10 patch that doesn't use a lambda expression (not supported 
in Java 7).

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-2.10.002.patch, 
> YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
>   at 
> 

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10460:
---
Attachment: YARN-10460-branch-2.10.002.patch

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-2.10.002.patch, 
> YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)
> {noformat}
> Both the {{clientExecutor}} in 

[jira] [Comment Edited] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325291#comment-17325291
 ] 

Eric Badger edited comment on YARN-10460 at 4/19/21, 8:26 PM:
--

Thanks for the review, [~Jim_Brennan]! I've committed the 3.2 patch to 
branch-3.2 and cherry-picked it to branch-3.1. So now this has been committed 
to trunk (3.4), branch-3.3, branch-3.2, and branch-3.1


was (Author: ebadger):
Thanks for the review, [~Jim_Brennan]! I've committed the 3.2 patch to 
branch-3.2. So now this has been committed to trunk (3.4), branch-3.3, and 
branch-3.2

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> 

[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10460:
---
Fix Version/s: 3.2.3

Thanks for the review, [~Jim_Brennan]! I've committed the 3.2 patch to 
branch-3.2. So now this has been committed to trunk (3.4), branch-3.3, and 
branch-3.2

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-3.2.002.patch
>
>
> In our downstream build environment, we're using JUnit 4.13. Recently, we 
> discovered a truly weird test failure in TestNodeStatusUpdater.
> The problem is that timeout handling has changed in JUnit 4.13. See the 
> difference between these two snippets:
> 4.12
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> }
> {noformat}
>  
>  4.13
> {noformat}
> @Override
> public void evaluate() throws Throwable {
> CallableStatement callable = new CallableStatement();
> FutureTask task = new FutureTask(callable);
> ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
> Thread thread = new Thread(threadGroup, task, "Time-limited test");
> try {
> thread.setDaemon(true);
> thread.start();
> callable.awaitStarted();
> Throwable throwable = getResult(task, thread);
> if (throwable != null) {
> throw throwable;
> }
> } finally {
> try {
> thread.join(1);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> try {
> threadGroup.destroy();  <--- This
> } catch (IllegalThreadStateException e) {
> // If a thread from the group is still alive, the ThreadGroup 
> cannot be destroyed.
> // Swallow the exception to keep the same behavior prior to 
> this change.
> }
> }
> }
> {noformat}
> The change comes from [https://github.com/junit-team/junit4/pull/1517].
> Unfortunately, destroying the thread group causes an issue because there are 
> all sorts of object caching in the IPC layer. The exception is:
> {noformat}
> java.lang.IllegalThreadStateException
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
>   at java.lang.Thread.init(Thread.java:402)
>   at java.lang.Thread.init(Thread.java:349)
>   at java.lang.Thread.<init>(Thread.java:675)
>   at 
> java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
>   at 
> com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1458)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)
> {noformat}
> Both the {{clientExecutor}} in 
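
To see why a cached IPC executor breaks once JUnit 4.13 destroys the 
FailOnTimeoutGroup, consider the minimal standalone sketch below (a 
hypothetical class, not from the patch; JDK 8 assumed, where 
ThreadGroup.destroy() is still functional). The key detail is that 
Executors.defaultThreadFactory() captures the ThreadGroup of the thread that 
creates it, so a factory built inside one test's time-limited thread keeps 
pointing at that group after JUnit destroys it.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal reproduction sketch (hypothetical class, not part of the patch).
public class DestroyedGroupDemo {
  public static void main(String[] args) throws Exception {
    ThreadGroup group = new ThreadGroup("FailOnTimeoutGroup");
    final ExecutorService[] cached = new ExecutorService[1];
    Thread testThread = new Thread(group, () -> {
      // defaultThreadFactory() captures the creating thread's group,
      // i.e. the time-limited test's FailOnTimeoutGroup.
      cached[0] = Executors.newCachedThreadPool(Executors.defaultThreadFactory());
    }, "Time-limited test");
    testThread.start();
    testThread.join();

    group.destroy(); // what JUnit 4.13 now does in its finally block

    // A later "test" reusing the cached executor forces a new worker thread
    // into the destroyed group -> java.lang.IllegalThreadStateException.
    cached[0].submit(() -> { });
  }
}
{code}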

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325278#comment-17325278
 ] 

Jim Brennan commented on YARN-10460:


+1 on the branch-3.2 patch.  Looks good to me.


> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-3.2.002.patch
>
>

[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325231#comment-17325231
 ] 

Eric Badger commented on YARN-10460:


The unit tests seem unrelated and don't fail for me locally. [~pbacsko], 
[~aajisaka], [~Jim_Brennan], could one of you review the branch-3.2 patch?

> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> -
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10460-001.patch, YARN-10460-002.patch, 
> YARN-10460-POC.patch, YARN-10460-branch-3.2.002.patch
>
>

[jira] [Commented] (YARN-10723) Change CS nodes page in UI to support custom resource.

2021-04-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325210#comment-17325210
 ] 

Eric Badger commented on YARN-10723:


Looks like it still never ran. [~zhuqi], can you re-upload the patch? 

> Change CS nodes page in UI to support custom resource.
> --
>
> Key: YARN-10723
> URL: https://issues.apache.org/jira/browse/YARN-10723
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10723.001.patch, YARN-10723.002.patch, 
> YARN-10723.003.patch, YARN-10723.004.patch, image-2021-04-06-17-22-32-733.png
>
>
> The nodes page currently supports only GPU as a custom resource.
> We should make this work for all custom resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325109#comment-17325109
 ] 

Qi Zhu edited comment on YARN-10743 at 4/19/21, 3:22 PM:
-

Thanks [~Jim_Brennan] for the reply.

But in our cluster, some Flink logs are huge; I think it's helpful to add 
support for this policy as an option.

 


was (Author: zhuqi):
Thanks [~Jim_Brennan] for the reply.

But in our cluster, some Flink logs are huge, more than 500G; I think it's 
helpful to add support for this policy as an option.

 

> Add a policy for not aggregating for containers which are killed because 
> exceeding container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers whose logs exceed a 
> size limit, we'd better add a policy that skips aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325109#comment-17325109
 ] 

Qi Zhu edited comment on YARN-10743 at 4/19/21, 3:21 PM:
-

Thanks [~Jim_Brennan] for the reply.

But in our cluster, some Flink logs are huge, more than 500G; I think it's 
helpful to add support for this policy as an option.

 


was (Author: zhuqi):
Thanks [~Jim_Brennan] for the reply.

But in our cluster, some Flink logs are huge; I think it's helpful to add 
support for this policy as an option.

 

> Add a policy for not aggregating for containers which are killed because 
> exceeding container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers whose logs exceed a 
> size limit, we'd better add a policy that skips aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325112#comment-17325112
 ] 

Hadoop QA commented on YARN-10743:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
37s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
57s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 24s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 
24s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
35s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
35s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
24s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
24s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/916/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 2 new + 143 unchanged - 0 fixed = 145 total (was 143) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  2s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325109#comment-17325109
 ] 

Qi Zhu commented on YARN-10743:
---

Thanks [~Jim_Brennan] for the reply.

But in our cluster, some Flink logs are huge; I think it's helpful to add 
support for this policy as an option.

 

> Add a policy for not aggregating for containers which are killed because 
> exceeding container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers whose logs exceed a 
> size limit, we'd better add a policy that skips aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325102#comment-17325102
 ] 

Jim Brennan commented on YARN-10743:


I don't think this is necessary.  The logs may actually be useful in debugging 
why the job is logging so much.

> Add a policy for not aggregating for containers which are killed because 
> exceeding container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers whose logs exceed a 
> size limit, we'd better add a policy that skips aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10743) Add a policy for not aggregating for containers which are killed because exceeding container log size limit.

2021-04-19 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10743:
--
Summary: Add a policy for not aggregating for containers which are killed 
because exceeding container log size limit.  (was: Add a policy for not 
aggregating for container log size limit killed container.)

> Add a policy for not aggregating for containers which are killed because 
> exceeding container log size limit.
> 
>
> Key: YARN-10743
> URL: https://issues.apache.org/jira/browse/YARN-10743
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10743.001.patch
>
>
> Since YARN-10471 added support for killing containers whose logs exceed a 
> size limit, we'd better add a policy that skips aggregation for those 
> containers, to reduce the pressure on HDFS etc.
> cc [~epayne] [~Jim_Brennan] [~ebadger]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10743) Add a policy for not aggregating for container log size limit killed container.

2021-04-19 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10743:
-

 Summary: Add a policy for not aggregating for container log size 
limit killed container.
 Key: YARN-10743
 URL: https://issues.apache.org/jira/browse/YARN-10743
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Qi Zhu
Assignee: Qi Zhu


Since YARN-10471 added support for killing containers whose logs exceed a size 
limit,

we'd better add a policy that skips aggregation for those containers, to 
reduce the pressure on HDFS etc.

cc [~epayne] [~Jim_Brennan] [~ebadger]
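
To make the idea concrete, here is a rough, hypothetical sketch of such an 
opt-in policy, assuming the existing ContainerLogAggregationPolicy extension 
point in org.apache.hadoop.yarn.server.api. The class name and the exit-code 
constant are invented for illustration; YARN-10471's actual kill path may 
signal the condition differently (e.g. via diagnostics).

{code:java}
import org.apache.hadoop.yarn.server.api.ContainerLogAggregationPolicy;
import org.apache.hadoop.yarn.server.api.ContainerLogContext;

// Hypothetical sketch only; the class name and exit code are assumptions.
public class SkipLogLimitKilledPolicy implements ContainerLogAggregationPolicy {
  // Assumed marker exit code for "killed: container log size limit exceeded".
  private static final int EXIT_LOG_LIMIT_EXCEEDED = -200;

  @Override
  public void parseParameters(String parameters) {
    // No parameters needed for this sketch.
  }

  @Override
  public boolean shouldDoLogAggregation(ContainerLogContext logContext) {
    // Aggregate everything except containers killed for oversized logs.
    return logContext.getExitCode() != EXIT_LOG_LIMIT_EXCEEDED;
  }
}
{code}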



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9906) When setting multi volumes through the "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS" setting is not valid

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-9906:
-
Labels: pull-request-available  (was: )

> When setting multi volumes through the "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS" 
> setting is not valid
> ---
>
> Key: YARN-9906
> URL: https://issues.apache.org/jira/browse/YARN-9906
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: lynn
>Priority: Major
>  Labels: pull-request-available
> Attachments: docker_volume_mounts.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As 
> [https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html#Application_Submission]
>  described, when I set {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} to multiple 
> volume mounts, the value is a comma-separated list of mounts.
>  
> {quote}vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker,
>  
> YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro;/etc/hadoop/conf:/etc/hadoop/conf"
>  hadoop jar hadoop-examples.jar pi -Dyarn.app.mapreduce.am.env=$vars \
>  -Dmapreduce.map.env=$vars -Dmapreduce.reduce.env=$vars 10 100{quote}
> I found that the Docker container mounts only the first volume, so the job 
> cannot run successfully, yet no error is reported.
> The code of 
> [DockerLinuxContainerRuntime.java|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java]
>  as follows:
> {quote}if (environment.containsKey(ENV_DOCKER_CONTAINER_MOUNTS)) {
>   Matcher parsedMounts = USER_MOUNT_PATTERN.matcher(
>   environment.get(ENV_DOCKER_CONTAINER_MOUNTS));
>   if (!parsedMounts.find()) {
> throw new ContainerExecutionException(
> "Unable to parse user supplied mount list: "
> + environment.get(ENV_DOCKER_CONTAINER_MOUNTS));
>   }{quote}
> The regex pattern is in 
> [OCIContainerRuntime|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/OCIContainerRuntime.java]
>  as follows
> {quote}static final Pattern USER_MOUNT_PATTERN = Pattern.compile(
>   "(?<=^|,)([^:\\x00]+):([^:\\x00]+)" +
>   "(:(r[ow]|(r[ow][+])?(r?shared|r?slave|r?private)))?(?:,|$)");{quote}
> It is indeed separated by commas, but when I read the code that submits the 
> jar to YARN, I found the following code in 
> [Apps.java|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java]
> {quote}private static final Pattern VARVAL_SPLITTER = Pattern.compile(
> "(?<=^|,)"// preceded by ',' or line begin
>   + '(' + Shell.ENV_NAME_REGEX + ')'  // var group
>   + '='
>   + "([^,]*)" // val group
>   );
> {quote}
> It is likewise split on commas.
> So I simply changed the separator for 
> "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS" from comma to semicolon (";").



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-19 Thread Zhanqi Cai (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324779#comment-17324779
 ] 

Zhanqi Cai commented on YARN-10739:
---

LGTM. 

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch
>
>
> YARN-8995 and YARN-10642 added GenericEventHandler.printEventQueueDetails to 
> AsyncDispatcher; if the event queue is very large, printEventQueueDetails 
> costs too much time and the RM takes a long time to process events.
> For example:
>  If we have 4K nodes in the cluster and 4K apps running, then on an RM 
> switchover the NodeManagers re-register with the RM, and the RM calls 
> NodesListManager to generate RMAppNodeUpdateEvents, with code like below:
> {code:java}
> for(RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
> this.rmContext
> .getDispatcher()
> .getEventHandler()
> .handle(
> new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
> appNodeUpdateType));
>   }
> }{code}
> So the total is 4K * 4K = 16 million events. During this window, 
> GenericEventHandler.printEventQueueDetails prints the event queue details and 
> is called frequently; once the event queue size reaches 1 million+, iterating 
> the queue in printEventQueueDetails becomes very slow, as shown below: 
> {code:java}
> private void printEventQueueDetails() {
>   Iterator<Event> iterator = eventQueue.iterator();
>   Map<Enum, Long> counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
> Enum eventType = iterator.next().getType();
> {code}
> Then RM recovery will cost too much time.
>  Refer to our log:
> {code:java}
> 2021-04-14 20:35:34,432 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(306)) - Size of event-queue is 1200
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: KILL, Event 
> record counter: 310836
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: NODE_UPDATE, 
> Event record counter: 1103
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: 
> NODE_REMOVED, Event record counter: 1
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: APP_REMOVED, 
> Event record counter: 1
> {code}
> Between AsyncDispatcher.handle and printEventQueueDetails, more than 1s is 
> spent iterating the queue.
> I uploaded a patch to ensure that printEventQueueDetails is called at most 
> once per 30s; a sketch of the approach follows below.
>  
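
For reference, a minimal sketch of such a rate limiter: a cheap guard in front 
of the expensive queue iteration. This is only an illustration of the approach, 
not the attached patch; the class and member names below are invented.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a once-per-30s guard in front of the O(queue size) iteration.
class EventQueueDetailsRateLimiter {
  private static final long MIN_INTERVAL_MS = 30_000L;
  private final AtomicLong lastLogTime = new AtomicLong(0L);

  // Returns true at most once per 30s window; callers skip the expensive
  // printEventQueueDetails() otherwise.
  boolean shouldPrintDetails() {
    long now = System.currentTimeMillis(); // a monotonic clock is preferable
    long last = lastLogTime.get();
    // Only the caller that wins the CAS pays the iteration cost; all other
    // concurrent callers inside the window return immediately.
    return now - last >= MIN_INTERVAL_MS && lastLogTime.compareAndSet(last, now);
  }
}
{code}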



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org