[jira] [Created] (YARN-7585) NodeManager should go unhealthy when state store throws DBException

2017-11-29 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-7585:
---

 Summary: NodeManager should go unhealthy when state store throws 
DBException 
 Key: YARN-7585
 URL: https://issues.apache.org/jira/browse/YARN-7585
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


If work-preserving recovery is enabled, the NM will not start up if the state 
store does not initialise. However, if the state store becomes unavailable for 
any reason after that, the NM will not go unhealthy. 
Since the state store is not available, new containers cannot be started any 
more and the NM should become unhealthy:
{code}
AMLauncher: Error launching appattempt_1508806289867_268617_01. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
...
Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
{code}
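
A minimal, self-contained sketch of one possible shape for the fix follows. This is illustrative only: the HealthReporter hook, the key layout, and the class shape are assumptions standing in for the NM health-check machinery and NMLeveldbStateStoreService internals, not the actual patch. The point is that a DBException from a state-store write should be surfaced to the node health status rather than only failing the current container start:
{code}
import java.io.IOException;

/**
 * Hypothetical sketch (not the actual patch): surface a LevelDB failure
 * as a node health problem. DBException and HealthReporter below are
 * simplified stand-ins for org.iq80.leveldb.DBException and the NM
 * health-check machinery.
 */
public class StateStoreHealthSketch {

  /** Stand-in for org.iq80.leveldb.DBException. */
  static class DBException extends RuntimeException {
    DBException(String msg) { super(msg); }
  }

  /** Assumed health-reporting hook, for illustration only. */
  interface HealthReporter {
    void reportUnhealthy(String reason);
  }

  private final HealthReporter health;

  StateStoreHealthSketch(HealthReporter health) {
    this.health = health;
  }

  /** Wraps a state-store write so a DBException marks the node unhealthy. */
  void storeApplication(String appId, byte[] proto) throws IOException {
    try {
      put("ContainerManager/applications/" + appId, proto);
    } catch (DBException e) {
      // Without this, the NM keeps accepting containers it can never
      // recover; going unhealthy lets the RM route work elsewhere.
      health.reportUnhealthy("State store failure: " + e.getMessage());
      throw new IOException(e);
    }
  }

  private void put(String key, byte[] value) {
    // A real implementation writes to LevelDB; on a read-only filesystem
    // that write throws, as in the log above.
    throw new DBException("IO error: Read-only file system");
  }

  public static void main(String[] args) {
    StateStoreHealthSketch store = new StateStoreHealthSketch(
        reason -> System.out.println("NM marked unhealthy: " + reason));
    try {
      store.storeApplication("application_1508806289867_268617", new byte[0]);
    } catch (IOException expected) {
      System.out.println("startContainers failed: " + expected.getMessage());
    }
  }
}
{code}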





Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Carlo Aldo Curino
I haven't tested this, but I support the merge as the patch is very much
needed for MS use cases as well... Can this be cherry-picked to 2.9 easily?

Thanks for this contribution!

Cheers,
Carlo

On Nov 29, 2017 6:34 PM, "Weiwei Yang"  wrote:

> Hi Sunil
>
> +1 from my side.
> Actually we have applied some of these patches to our production cluster
> since Sep this year, on 2000+ nodes, and it works nicely. +1 for the
> merge. I am pretty sure this feature will help a lot of users, especially
> those on cloud. Thanks for getting this done, great job!
>
> --
> Weiwei
>
> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <
> rohithsharm...@apache.org>, wrote:
> +1, thanks Sunil for working on this feature!
>
> -Rohith Sharma K S
>
> On 24 November 2017 at 23:19, Sunil G  wrote:
>
> Hi All,
>
> We would like to bring up the discussion of merging “absolute min/max
> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
> in a few weeks. The goal is to get it in for Hadoop 3.1.
>
> *Major work happened in this branch*
>
> - YARN-6471. Support to add min/max resource configuration for a queue
> - YARN-7332. Compute effectiveCapacity per each resource vector
> - YARN-7411. Inter-Queue preemption's computeFixpointAllocation needs to
> handle absolute resources.
>
> *Regarding design details*
>
> Please refer to [1] for the detailed design document.
>
> *Regarding testing:*
>
> We did extensive tests for the feature in the last couple of months,
> comparing against the latest trunk.
>
> - For the SLS benchmark: we didn't see an observable performance gap in
> simulated tests based on 8K-node SLS traces (1 PB memory). We got 3k+
> containers allocated per second.
>
> - For the microbenchmark: we used the performance test cases added by
> YARN-6775; they did not show much performance regression compared to trunk.
>
> *YARN-5881* 
> #ResourceTypes = 2. Avg of fastest 20: 55294.52
> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>
> *trunk*
> #ResourceTypes = 2. Avg of fastest 20: 55865.92
> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>
> *Regarding API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> The documentation jira [3] provides detailed configuration guidance. This
> feature works end-to-end; we have been running it in our development
> cluster for the last couple of months and it has undergone a good amount
> of testing. Branch code is run against trunk and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to the team of folks who worked hard and contributed towards
> this effort, including design discussions / patches / reviews, etc.: Wangda
> Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>
> [1] :
> https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-5881
>
> [3] : https://issues.apache.org/jira/browse/YARN-7533
>
> [4] : https://issues.apache.org/jira/browse/YARN-7510
>
> Thanks,
>
> Sunil G and Wangda Tan
>
>


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Weiwei Yang
Hi Sunil

+1 from my side.
Actually we have applied some of these patches to our production cluster since 
Sep this year, on 2000+ nodes, and it works nicely. +1 for the merge. I am 
pretty sure this feature will help a lot of users, especially those on cloud. 
Thanks for getting this done, great job!

--
Weiwei


[VOTE] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Sunil G
Hi All,


Based on the discussion at [1], I'd like to start a vote to merge feature
branch YARN-5881 to trunk. The vote will run for 7 days, ending Wednesday
Dec 6 at 6:00PM PDT.


This branch adds support to configure queue capacity as absolute resources
in the capacity scheduler. This will help admins who want fine-grained
control over queue resources.
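
As a rough illustration of the configuration shape (the queue name and values
below are made up for the example; the bracketed vector form is the
absolute-resource syntax this branch introduces in place of percentages):

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>[memory=10240,vcores=12]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>[memory=20480,vcores=24]</value>
  </property>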


Feature development is done at YARN-5881 [2]; the jenkins build is here
(YARN-7510 [3]).

All required tasks for this feature are committed. This feature changes the
RM's Capacity Scheduler only, and we did extensive tests for the feature in
the last couple of months, including performance tests.


Key points:

- The feature is turned off by default; absolute resources have to be
configured to enable it.

- Detailed documentation about how to use this feature is done as part of
[4].

- No major performance degradation is observed with this branch work. SLS
and UT performance tests are done.


There were 11 subtasks completed for this feature.


Huge thanks to everyone who helped with reviews, commits, guidance, and
technical discussion/design, including Wangda Tan, Vinod Vavilapalli,
Rohith Sharma K S, Eric Payne.


[1] :
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhKhF1JCtR7ZFuZSEKQ4sBvN_n_tV5GHsbJ3YeyJP%2BP4Q%40mail.gmail.com%3E

[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7510

[4] : https://issues.apache.org/jira/browse/YARN-7533


Regards

Sunil and Wangda


[jira] [Resolved] (YARN-7509) AsyncScheduleThread and ResourceCommitterService are still running after RM is transitioned to standby

2017-11-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-7509.
--
   Resolution: Fixed
Fix Version/s: (was: 3.0.1)
   3.0.0

> AsyncScheduleThread and ResourceCommitterService are still running after RM 
> is transitioned to standby
> --
>
> Key: YARN-7509
> URL: https://issues.apache.org/jira/browse/YARN-7509
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.0.0, 3.1.0, 2.9.1
>
> Attachments: YARN-7509.001.patch
>
>
> After RM is transitioned to standby, AsyncScheduleThread and 
> ResourceCommitterService will receive an interrupt signal. When the thread 
> is sleeping, it will ignore the interrupt signal since InterruptedException 
> is caught internally and the interrupt flag is cleared.
> For AsyncScheduleThread, InterruptedException was caught and ignored in 
> CapacityScheduler#schedule.
> For ResourceCommitterService, InterruptedException was caught and ignored 
> in ResourceCommitterService#run. 
> We should let the interrupt signal propagate and make these threads exit.
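
For illustration, here is a minimal self-contained sketch of the pattern described above (not the attached patch; the thread body is simplified). Restoring the interrupt flag inside the catch block is what lets the loop condition observe the standby transition and exit:
{code}
/**
 * Minimal sketch of the interrupt handling described above (not the
 * actual YARN-7509 patch). Swallowing InterruptedException inside the
 * loop clears the interrupt flag, so the thread never observes the
 * standby transition; restoring the flag lets the loop exit.
 */
public class InterruptSketch {
  public static void main(String[] args) throws Exception {
    Thread scheduler = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Thread.sleep(100); // stand-in for the scheduling pause
        } catch (InterruptedException e) {
          // Catch-and-ignore here would keep the loop alive forever.
          // Restoring the flag (or breaking) lets the thread exit.
          Thread.currentThread().interrupt();
        }
      }
      System.out.println("scheduler thread exited cleanly");
    });
    scheduler.start();
    scheduler.interrupt(); // what transitioning to standby effectively does
    scheduler.join();
  }
}
{code}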





[jira] [Created] (YARN-7584) Support resource profiles in native services

2017-11-29 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-7584:
---

 Summary: Support resource profiles in native services
 Key: YARN-7584
 URL: https://issues.apache.org/jira/browse/YARN-7584
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Currently resource profiles do not appear to be supported:
{noformat}
// Currently resource profile is not supported yet, so we will raise
// validation error if only resource profile is specified
if (StringUtils.isNotEmpty(resource.getProfile())) {
  throw new IllegalArgumentException(
      RestApiErrorMessages.ERROR_RESOURCE_PROFILE_NOT_SUPPORTED_YET);
}
{noformat}

Also, attempting to specify profiles in the service spec throws an exception 
since the cpu default value is 1:
{noformat}
Exception in thread "main" java.lang.IllegalArgumentException: Cannot specify cpus/memory along with profile for component ps
at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateServiceResource(ServiceApiUtil.java:278)
at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateComponent(ServiceApiUtil.java:201)
at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateAndResolveService(ServiceApiUtil.java:174)
at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:214)
at org.apache.hadoop.yarn.service.client.ServiceClient.actionLaunch(ServiceClient.java:205)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:447)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111)
{noformat}
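
One possible direction, sketched purely for illustration (this is not the committed fix; the method shape and message are assumptions): only reject the cpus/memory-plus-profile combination when cpus/memory were explicitly set, so a profile-only spec is not tripped up by the defaulted cpu value.
{noformat}
/**
 * Hypothetical validation sketch (not the actual ServiceApiUtil change):
 * treat cpus/memory as "explicitly set" only when non-null, so a
 * profile-only resource is not rejected because of a defaulted cpu value.
 */
public class ProfileValidationSketch {
  static void validate(String profile, Integer cpus, String memory) {
    boolean hasProfile = profile != null && !profile.isEmpty();
    boolean hasExplicitResources = cpus != null || memory != null;
    if (hasProfile && hasExplicitResources) {
      throw new IllegalArgumentException(
          "Cannot specify cpus/memory along with profile");
    }
  }

  public static void main(String[] args) {
    validate("small", null, null); // profile only: accepted
    validate(null, 1, "256");      // explicit resources only: accepted
    try {
      validate("small", 1, null);  // both: rejected, as today
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
{noformat}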





[jira] [Created] (YARN-7583) Reduce overhead of container reacquisition

2017-11-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-7583:


 Summary: Reduce overhead of container reacquisition
 Key: YARN-7583
 URL: https://issues.apache.org/jira/browse/YARN-7583
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Jason Lowe


When reacquiring containers after a nodemanager restart, the Linux container 
executor invokes the container-executor binary to essentially kill -0 the 
process to check if it is alive.  It would be a lot cheaper on Linux to stat 
the /proc/<pid> directory, which the nodemanager can do directly, rather than 
pay for the fork-and-exec through the container executor and the potential 
signal permission issues.
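
A self-contained sketch of the cheaper check (illustrative only and Linux-specific; the class name and pids are made up for the example):
{code}
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Illustrative sketch of the cheaper liveness check suggested above:
 * stat /proc/<pid> directly instead of fork-and-exec'ing the
 * container-executor to send signal 0. Linux-only; not the actual patch.
 */
public class ProcLivenessSketch {
  static boolean isAlive(int pid) {
    // On Linux, /proc/<pid> exists exactly while the process exists,
    // and stat'ing it requires no signal permissions.
    return Files.isDirectory(Paths.get("/proc", Integer.toString(pid)));
  }

  public static void main(String[] args) {
    System.out.println(isAlive(1));       // pid 1 (init): expected true
    System.out.println(isAlive(999999));  // unlikely pid: expected false
  }
}
{code}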






[jira] [Created] (YARN-7582) Yarn Services - restore descriptive exception types

2017-11-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created YARN-7582:
--

 Summary: Yarn Services - restore descriptive exception types
 Key: YARN-7582
 URL: https://issues.apache.org/jira/browse/YARN-7582
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sergey Shelukhin


Slider used to throw descriptive exceptions like UnknownApp, etc., from various 
commands (e.g. destroy). It looks like YARN Services throws generic exceptions 
from these (see the review in HIVE-18037). 
It would be good to restore the descriptive exceptions.





[jira] [Created] (YARN-7581) ATSv2 does not construct HBase filters correctly in HBase 2.0

2017-11-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7581:


 Summary: ATSv2 does not construct HBase filters correctly in HBase 
2.0
 Key: YARN-7581
 URL: https://issues.apache.org/jira/browse/YARN-7581
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen
Assignee: Haibo Chen








Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Rohith Sharma K S
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S



Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-11-29 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/54/

[Nov 27, 2017 10:52:18 PM] (kihwal) HDFS-12754. Lease renewal can hit a 
deadlock. Contributed by Kuhu
[Nov 27, 2017 10:54:27 PM] (yufei) YARN-7363. ContainerLocalizer don't have a 
valid log4j config in case of
[Nov 28, 2017 5:42:41 AM] (yqlin) HDFS-12858. RBF: Add router admin commands 
usage in HDFS commands
[Nov 28, 2017 11:57:51 AM] (stevel) HADOOP-15042. Azure 
PageBlobInputStream.skip() can return negative value




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:12 
   bkjournal:5 
   hadoop-yarn-server-nodemanager:1 
   hadoop-yarn-server-timelineservice:1 
   hadoop-yarn-client:8 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-mapreduce-client-app:1 
   hadoop-mapreduce-client-jobclient:15 
   hadoop-distcp:4 
   hadoop-extras:1 
   hadoop-sls:1 

Failed junit tests :

   hadoop.crypto.key.kms.server.TestKMS 
   hadoop.yarn.server.nodemanager.webapp.TestNMWebServer 
   
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
   hadoop.yarn.server.TestDiskFailures 
   hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken 
   hadoop.mapred.TestReduceFetch 
   hadoop.fs.slive.TestSlive 
   hadoop.mapred.TestLazyOutput 
   hadoop.fs.TestFileSystem 
   hadoop.conf.TestNoDefaultsJobConf 
   hadoop.fs.TestDFSIO 
   hadoop.mapred.TestJobSysDirWithDFS 
   hadoop.tools.TestDistCpSystem 
   hadoop.tools.TestIntegration 
   hadoop.tools.TestDistCpViewFs 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.sls.TestReservationSystemInvariants 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.resourceestimator.service.TestResourceEstimatorService 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestLeaseRecovery2 
   org.apache.hadoop.hdfs.TestRead 
   org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream 
   org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade 
   org.apache.hadoop.hdfs.TestReadWhileWriting 
   org.apache.hadoop.hdfs.TestDFSMkdirs 
   org.apache.hadoop.hdfs.TestDFSOutputStream 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs 
   org.apache.hadoop.hdfs.TestDistributedFileSystem 
   org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication 
   org.apache.hadoop.hdfs.TestDFSShell 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   
org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices
 
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.cli.TestYarnCLI 
   org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClient 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
   org.apache.hadoop.yarn.client.api.impl.TestNMClient 
   
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell 
   org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   org.apache.hadoop.mapred.TestClusterMRNotification 
   org.apache.hadoop.mapred.TestMiniMRClasspath 
   org.apache.hadoop.mapred.TestMRCJCFileInputFormat 
   org.apache.hadoop.mapred.TestClusterMapReduceTestCase 
   org.apache.hadoop.mapred.TestMRIntermediateDataEncryption 
   org.apache.hadoop.mapred.TestMRTimelineEventHandling 
   org.apache.hadoop.mapred.join.TestDatamerge 
   org.apache.hadoop.mapred.TestJobName 
   org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers 
   org.apache.hadoop.mapred.TestNetworkedJob 
   org.apache.hadoop.mapred.TestReduceFetchFromPartialMem 
   org.apache.hadoop.mapred.TestMROpportunisticMaps 
   org.apache.hadoop.mapred.TestMerge 
   

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Sunil G
Thanks Eric. Appreciate the support in verifying the feature.
YARN-7575 is closed now.

- Sunil


On Tue, Nov 28, 2017 at 11:15 PM Eric Payne
 wrote:

> Thanks Sunil for the great work on this feature.
> I looked through the design document, reviewed the code, and tested out
> branch YARN-5881. The design makes sense and the code looks like it is
> implementing the design in a sensible way. However, I have encountered a
> couple of bugs. I opened https://issues.apache.org/jira/browse/YARN-7575
> to track my findings. Basically, here's a summary:
>
> The design document from YARN-5881 says that for max-capacity:
> 3)  For each queue, we require: a) if max-resource is not set, it is
> automatically set to parent.max-resource
>
> When I try not setting
> any yarn.scheduler.capacity.<queue-path>.maximum-capacity, the RM UI
> scheduler page refuses to render. It looks like it's in
> CapacitySchedulerPage$LeafQueueInfoBlock.
>
> Also... A job will run in the leaf queue with no max capacity set and it
> will grow to the max capacity of the cluster, but if I add resources to the
> node, the job won't grow any more even though it has pending resources.
>
> Thanks,
> Eric