[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644849#comment-16644849
 ] 

Weiwei Yang commented on YARN-8468:
-----------------------------------

Ran Jenkins again; it still fails as follows:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork
{noformat}
According to the mvn [shutdown 
doc|https://maven.apache.org/surefire/maven-failsafe-plugin/examples/shutdown.html]:

{color:#205081}*After the test-set has completed*, the process executes 
{{java.lang.System.exit(0)}} which starts shutdown hooks. At this point the 
process may run next 30 seconds until all non daemon Threads die. After the 
period of time has elapsed, the process kills itself by 
{{java.lang.Runtime.halt(0)}}. The timeout of 30 seconds can be customized by 
configuration parameter {{forkedProcessExitTimeoutInSeconds}}.{color}

It looks like the shutdown somehow did not finish within 900 seconds:

{color:#205081}Using parameter *forkedProcessTimeoutInSeconds* forked JVMs are 
killed separately after every individual process elapsed certain amount of time 
and the whole plugin fails with the error message:{color}

*{color:#205081}There was a timeout or other error in the fork{color}*

So it seems:
 # All UTs (RM module) ran, with no failures
 # Shutdown failed while waiting for the forked JVM to terminate within 900 
seconds (the timeout defined in hadoop-project/pom.xml)
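
For reference, the two timeouts the Surefire doc describes are plugin 
configuration parameters; a sketch of the relevant configuration (values 
illustrative; Hadoop's actual values are set in hadoop-project/pom.xml):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- kill a forked JVM if an individual test-set runs longer than this -->
    <forkedProcessTimeoutInSeconds>900</forkedProcessTimeoutInSeconds>
    <!-- after the test-set completes, wait this long for shutdown hooks and
         non-daemon threads before the process kills itself via Runtime.halt(0) -->
    <forkedProcessExitTimeoutInSeconds>30</forkedProcessExitTimeoutInSeconds>
  </configuration>
</plugin>
```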

It's unclear why #2 happens ... I think we need to revert this from branch-3.1 
and start over. I'll do the revert in a few hours.

Sorry for the inconvenience. 

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8468
>                 URL: https://issues.apache.org/jira/browse/YARN-8468
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler, scheduler
>    Affects Versions: 3.1.0
>            Reporter: Antal Bálint Steinbach
>            Assignee: Antal Bálint Steinbach
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468-branch-3.1.020.patch, 
> YARN-8468.000.patch, YARN-8468.001.patch, YARN-8468.002.patch, 
> YARN-8468.003.patch, YARN-8468.004.patch, YARN-8468.005.patch, 
> YARN-8468.006.patch, YARN-8468.007.patch, YARN-8468.008.patch, 
> YARN-8468.009.patch, YARN-8468.010.patch, YARN-8468.011.patch, 
> YARN-8468.012.patch, YARN-8468.013.patch, YARN-8468.014.patch, 
> YARN-8468.015.patch, YARN-8468.016.patch, YARN-8468.017.patch, 
> YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps. The user wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb would set the default maximum container 
> size for all queues, and the per-queue maximum would be set with a 
> "maxContainerResources" queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * setting it on the root queue would override the scheduler setting, so 
> this should not be allowed.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the queue-based maximum allocation limit if it is available; 
> otherwise use the general scheduler-level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
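
The queue-level fallback described in the last bullets can be sketched as 
follows (hypothetical class and field names, not the actual FairScheduler 
code; a memory-only simplification of the proposed 
getMaximumResourceCapability(String queueName)):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: per-queue maximum allocation with fallback to the scheduler-wide
// limit, capped so a queue can never exceed the scheduler maximum.
class MaxAllocationSketch {
    // Scheduler-wide cap, analogous to yarn.scheduler.maximum-allocation-mb.
    static final long SCHEDULER_MAX_MB = 8192;

    // Per-queue overrides, analogous to a "maxContainerResources" queue setting.
    static final Map<String, Long> queueMaxMb = new HashMap<>();

    // Use the queue-level limit if configured, otherwise fall back to the
    // scheduler-level setting; never exceed the scheduler cap.
    static long getMaximumResourceCapability(String queueName) {
        Long queueMax = queueMaxMb.get(queueName);
        if (queueMax == null) {
            return SCHEDULER_MAX_MB;
        }
        return Math.min(queueMax, SCHEDULER_MAX_MB);
    }

    public static void main(String[] args) {
        queueMaxMb.put("root.adhoc", 2048L); // small containers for ad hoc jobs
        System.out.println(getMaximumResourceCapability("root.adhoc"));      // 2048
        System.out.println(getMaximumResourceCapability("root.enterprise")); // 8192
    }
}
```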



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
