
Daniel Templeton created YARN-7135:
--

 Summary: Clean up lock-try order in common scheduler code
 Key: YARN-7135
 URL: https://issues.apache.org/jira/browse/YARN-7135
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are many places that follow the pattern:{code}try {
  lock.lock();
  ...
} finally {
  lock.unlock();
}{code}

There are a couple of reasons that's a bad idea.  The correct pattern 
is:{code}lock.lock();
try {
  ...
} finally {
  lock.unlock();
}{code}
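
One of the reasons can be sketched as follows. This is a minimal illustration, not code from the YARN tree: the class {{LockTryOrderDemo}} and the always-failing {{FailingLock}} are hypothetical names invented for the example. If {{lock()}} throws while inside the {{try}}, the {{finally}} block still calls {{unlock()}} on a lock the thread does not hold, and with a {{ReentrantLock}} the resulting {{IllegalMonitorStateException}} replaces the original failure.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from the YARN code base.
final class LockTryOrderDemo {

  /** A lock whose lock() always fails, standing in for any acquisition failure. */
  static final class FailingLock extends ReentrantLock {
    @Override
    public void lock() {
      throw new IllegalStateException("could not acquire lock");
    }
  }

  /** Anti-pattern: lock() inside try, so finally runs even when lock() failed. */
  static String lockInsideTry(Lock lock) {
    try {
      try {
        lock.lock();
        return "ran critical section";
      } finally {
        lock.unlock(); // the lock is not held here if lock() threw
      }
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  /** Recommended: lock() before try, so unlock() runs only if the lock is held. */
  static String lockBeforeTry(Lock lock) {
    try {
      lock.lock();
      try {
        return "ran critical section";
      } finally {
        lock.unlock();
      }
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // With the anti-pattern the original IllegalStateException is masked.
    System.out.println(lockInsideTry(new FailingLock()));
    // With the recommended order the original exception surfaces.
    System.out.println(lockBeforeTry(new FailingLock()));
  }
}
```

With an ordinary {{ReentrantLock}} both methods behave the same; the difference only shows up on the failure path.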




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

[jira] [Created] (YARN-7134) AppSchedulingInfo has a dependency on capacity scheduler

Daniel Templeton created YARN-7134:
--

 Summary: AppSchedulingInfo has a dependency on capacity scheduler
 Key: YARN-7134
 URL: https://issues.apache.org/jira/browse/YARN-7134
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Blocker


The common scheduling code should be independent of all scheduler 
implementations.  YARN-6040 introduced capacity scheduler's {{SchedulingMode}} 
into {{AppSchedulingInfo}}.




[jira] [Created] (YARN-7133) Clean up lock-try order fair scheduler

Daniel Templeton created YARN-7133:
--

 Summary: Clean up lock-try order fair scheduler
 Key: YARN-7133
 URL: https://issues.apache.org/jira/browse/YARN-7133
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are many places that follow the pattern:{code}try {
  lock.lock();
  ...
} finally {
  lock.unlock();
}{code}

There are a couple of reasons that's a bad idea.  The correct pattern 
is:{code}lock.lock();
try {
  ...
} finally {
  lock.unlock();
}{code}





[jira] [Created] (YARN-7132) FairScheduler.initScheduler() contains a surprising unary plus

Daniel Templeton created YARN-7132:
--

 Summary: FairScheduler.initScheduler() contains a surprising unary 
plus
 Key: YARN-7132
 URL: https://issues.apache.org/jira/browse/YARN-7132
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


The method contains the following code:{code}
LOG.warn(FairSchedulerConfiguration.UPDATE_INTERVAL_MS
+ " is invalid, so using default value "
+ +FairSchedulerConfiguration.DEFAULT_UPDATE_INTERVAL_MS
+ " ms instead");{code}

Note the beginning of the third line.  One of those plusses should be deleted 
so that no one else spends cycles trying to understand why it even compiles.
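
For anyone wondering why it compiles: the second plus is parsed as a unary plus applied to the constant, which leaves a numeric value unchanged. A small sketch, using hypothetical stand-ins for the {{FairSchedulerConfiguration}} constants (the names are copied from the snippet above, the values are assumptions for the example only):

```java
// Hypothetical demo class; the constant values below are placeholders, not
// the real FairSchedulerConfiguration values.
final class UnaryPlusDemo {
  static final String UPDATE_INTERVAL_MS = "example.update-interval-ms";
  static final long DEFAULT_UPDATE_INTERVAL_MS = 500;

  /** Mirrors the message as written, with the stray unary plus. */
  static String withUnaryPlus() {
    return UPDATE_INTERVAL_MS
        + " is invalid, so using default value "
        + +DEFAULT_UPDATE_INTERVAL_MS
        + " ms instead";
  }

  /** The same message with the extra plus removed. */
  static String withoutUnaryPlus() {
    return UPDATE_INTERVAL_MS
        + " is invalid, so using default value "
        + DEFAULT_UPDATE_INTERVAL_MS
        + " ms instead";
  }

  public static void main(String[] args) {
    System.out.println(withUnaryPlus());
    System.out.println(withUnaryPlus().equals(withoutUnaryPlus()));
  }
}
```

Both methods build the same string, so dropping the extra plus is purely a readability fix.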




[jira] [Created] (YARN-7123) FairScheduler.getResourceCalculator() returns an instance of DefaultResourceCalculator regardless of the configuration

2017-08-29 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7123:
--

 Summary: FairScheduler.getResourceCalculator() returns an instance 
of DefaultResourceCalculator regardless of the configuration
 Key: YARN-7123
 URL: https://issues.apache.org/jira/browse/YARN-7123
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are several places where this creates the wrong behavior:

* 298:RMServerUtils.java
* 1081:AbstractYarnScheduler.java
* 1197:FSAppAttempt.java




[jira] [Created] (YARN-7121) FSAppAttempt's delayed scheduling should be factored out into the common scheduling code

2017-08-29 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7121:
--

 Summary: FSAppAttempt's delayed scheduling should be factored out 
into the common scheduling code
 Key: YARN-7121
 URL: https://issues.apache.org/jira/browse/YARN-7121
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


Per [~leftnoteasy]'s comment:{code}// TODO (wandga): All logics in this method should be added to
// SchedulerPlacement#canDelayTo which is independent from scheduler.
// Scheduler can choose to use various/pluggable delay-scheduling
// implementation.{code}




[jira] [Created] (YARN-7119) yarn rmadmin -updateNodeResource should be updated for resource types

2017-08-29 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7119:
--

 Summary: yarn rmadmin -updateNodeResource should be updated for 
resource types
 Key: YARN-7119
 URL: https://issues.apache.org/jira/browse/YARN-7119
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-3926
Reporter: Daniel Templeton







Re: [VOTE] Merge YARN-3926 (resource profile) to trunk

2017-08-26 Thread Daniel Templeton

Quick question, Wangda.  When you say that the feature can be turned 
off, do you mean resource types or resource profiles?  I know there's an 
off-by-default property that governs resource profiles, but I didn't see 
any way to turn off resource types.  Even if only CPU and memory are 
configured, i.e. no additional resource types, the code path is 
different than it was.  Specifically, where CPU and memory were 
primitives before, they're now entries in an array whose indexes have to 
be looked up through the ResourceUtils class.  Did I miss something?


For those who haven't followed the feature closely, there are really two 
features here.  Resource types allows for declarative extension of the 
resource system in YARN.  Resource profiles builds on top of resource 
types to allow a user to request a group of resources as a profile, much 
like EC2 instance types, e.g. "fast-compute" might mean 32GB RAM, 8 
vcores, and 2 GPUs.
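
To make the relationship concrete, here is a rough sketch of the idea, not the branch's actual API: a profile is just a named bundle of per-type resource values. The class name, the map layout, and the "gpu" resource name are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of resource profiles on top of resource types.
// This is not how the YARN-3926 branch represents them.
final class ResourceProfileDemo {
  // profile name -> (resource type name -> requested amount)
  static final Map<String, Map<String, Long>> PROFILES = new LinkedHashMap<>();

  static {
    Map<String, Long> fastCompute = new LinkedHashMap<>();
    fastCompute.put("memory-mb", 32L * 1024); // 32GB RAM
    fastCompute.put("vcores", 8L);
    fastCompute.put("gpu", 2L);               // an extra, declaratively added type
    PROFILES.put("fast-compute", fastCompute);
  }

  /** Expands a profile name into the individual resource values it stands for. */
  static Map<String, Long> resolve(String profile) {
    Map<String, Long> resources = PROFILES.get(profile);
    if (resources == null) {
      throw new IllegalArgumentException("Unknown profile: " + profile);
    }
    return resources;
  }

  public static void main(String[] args) {
    System.out.println(resolve("fast-compute"));
  }
}
```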


Daniel

On 8/23/17 11:49 AM, Wangda Tan wrote:

  Hi folks,

Per earlier discussion [1], I'd like to start a formal vote to merge
feature branch YARN-3926 (Resource profile) to trunk. The vote will run for
7 days and will end August 30 10:00 AM PDT.

Briefly, YARN-3926 extends the resource model of YARN to support resource
types other than CPU and memory, so it will be a cornerstone of features
like GPU support (YARN-6223), disk scheduling/isolation (YARN-2139), FPGA
support (YARN-5983), and network IO scheduling/isolation (YARN-2140). In
addition, YARN-3926 allows admins to preconfigure resource profiles
in the cluster, for example, m3.large means <2 vcores, 8 GB memory, 64 GB
disk>, so applications can request the "m3.large" profile instead of specifying
every resource type's value.

There are 32 subtasks that were completed as part of this effort.

This feature needs to be explicitly turned on before use. We paid close
attention to the compatibility, performance, and scalability of this feature.
As mentioned in [1], we didn't see any observable performance regression in
large-scale SLS (scheduler load simulator) executions, and saw less than 5%
performance regression using the micro benchmark added by YARN-6775.

This feature works end-to-end (including UI/CLI/application/server). We have
set up a cluster with this feature turned on that has run for several weeks,
and we haven't seen any issues so far.

Merge JIRA: YARN-7013 (Jenkins gave +1 already).
Documentation: YARN-7056

Special thanks to a team of folks who worked hard and contributed towards
this effort including design discussion/development/reviews, etc.: Varun
Vasudev, Sunil Govind, Daniel Templeton, Vinod Vavilapalli, Yufei Gu,
Karthik Kambatla, Jason Lowe, Arun Suresh.

Regards,
Wangda Tan

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3CCAD%2B%2BeCnjEHU%3D-M33QdjnND0ZL73eKwxRua4%3DBbp4G8inQZmaMg%40mail.gmail.com%3E





[jira] [Created] (YARN-7085) Application.schedule() and Application.assign() appear to only be used in test code

2017-08-23 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7085:
--

 Summary: Application.schedule() and Application.assign() appear to 
only be used in test code
 Key: YARN-7085
 URL: https://issues.apache.org/jira/browse/YARN-7085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


That's a pretty big chunk of code to be purely for tests.  I haven't looked at 
it closely enough yet to tell if the code is there to support the tests, or if 
the tests are just testing dead code.  Either way, we should remove the code 
from {{Application}}.




[jira] [Created] (YARN-7042) Clean up unit tests after YARN-6610

2017-08-17 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7042:
--

 Summary: Clean up unit tests after YARN-6610
 Key: YARN-7042
 URL: https://issues.apache.org/jira/browse/YARN-7042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: test
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton


Some of the unit tests in YARN-6610 weren't quite testing what they were 
supposed to be testing.  This patch fixes that.




[jira] [Created] (YARN-7026) Fair scheduler docs should explain what happens when no placement rules are specified

2017-08-16 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-7026:
--

 Summary: Fair scheduler docs should explain what happens when no 
placement rules are specified
 Key: YARN-7026
 URL: https://issues.apache.org/jira/browse/YARN-7026
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton







[jira] [Resolved] (YARN-7002) branch-2 build is broken by AllocationFileLoaderService.java

2017-08-11 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7002.

  Resolution: Fixed
Hadoop Flags: Reviewed

I reverted the offending commit.

> branch-2 build is broken by AllocationFileLoaderService.java
> 
>
> Key: YARN-7002
> URL: https://issues.apache.org/jira/browse/YARN-7002
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.9.0
> Reporter: John Zhuge
> Assignee: Daniel Templeton
>
> branch-2 build is broken:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hadoop-yarn-server-resourcemanager: Compilation failure
> [ERROR] 
> /Users/jzhuge/hadoop-commit/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java:[270,39]
>  incompatible types: java.util.HashSet cannot be converted 
> to java.util.Set
> {noformat}




[jira] [Created] (YARN-6995) Improve use of ResourceNotFoundException in resource types code

2017-08-11 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6995:
--

 Summary: Improve use of ResourceNotFoundException in resource 
types code
 Key: YARN-6995
 URL: https://issues.apache.org/jira/browse/YARN-6995
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


Now that all the YarnExceptions have been replaced with 
ResourceNotFoundExceptions, we should make the ResourceNotFoundExceptions as 
useful as possible.




[jira] [Created] (YARN-6994) Remove last uses of Long from resource types code

2017-08-11 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6994:
--

 Summary: Remove last uses of Long from resource types code
 Key: YARN-6994
 URL: https://issues.apache.org/jira/browse/YARN-6994
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


Most of the uses have been removed over the last few patches.  There's only one 
left that I see.




[jira] [Created] (YARN-6986) Fair scheduler should add a SchedulerMetrics class

2017-08-10 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6986:
--

 Summary: Fair scheduler should add a SchedulerMetrics class
 Key: YARN-6986
 URL: https://issues.apache.org/jira/browse/YARN-6986
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


Especially now that ATSv2 is almost here, it would be very helpful if the fair 
scheduler offered scheduler metrics, like # pending requests, # running 
containers, # preemptions, size of the event queue, etc.

Currently I see cluster metrics, queue metrics, and scheduler operation 
duration metrics, but no top-level scheduler metrics.




[jira] [Created] (YARN-6985) The wrapper methods in Resources aren't useful

2017-08-10 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6985:
--

 Summary: The wrapper methods in Resources aren't useful
 Key: YARN-6985
 URL: https://issues.apache.org/jira/browse/YARN-6985
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


The code would be shorter, easier to read, and a tiny smidgeon faster if we 
just called the {{ResourceCalculator}} methods directly.  I don't see where the 
wrappers improve the code in any way.

For example, with wrappers:{code}Resource normalized = Resources.normalize(
resourceCalculator, ask, minimumResource,
maximumResource, incrementResource);
{code} and without wrappers:{code}Resource normalized = 
resourceCalculator.normalize(ask, minimumResource,
maximumResource, incrementResource);{code}

The difference isn't huge, but I find the latter much more readable.  With the 
former I always have to figure out which parameters are which, because passing 
in the {{ResourceCalculator}} adds in an unrelated additional parameter at the 
head of the list.

There may be some cases where the wrapper methods are mixed in with calls to 
legitimate {{Resources}} methods, so that using the wrappers keeps the code 
consistent. In those cases, that may be a reason to keep and use the wrapper 
method.




[jira] [Created] (YARN-6984) DominantResourceCalculator.isAnyMajorResourceZero() should test all resources

2017-08-10 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6984:
--

 Summary: DominantResourceCalculator.isAnyMajorResourceZero() 
should test all resources
 Key: YARN-6984
 URL: https://issues.apache.org/jira/browse/YARN-6984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: YARN-3926
Reporter: Daniel Templeton


The method currently tests only memory and CPU.  It looks to me like it should 
test all resources, i.e. it should do what {{isInvalidDivisor()}} does and 
should, in fact, replace that method.  [~sunilg], since you wrote the method 
originally, can you comment on what its intended semantics are?
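
The difference I have in mind can be sketched like this. It is only an illustration: resources are modeled as a plain {{long[]}} instead of the real {{Resource}} class, the index constants are assumptions, and {{isAnyResourceZero()}} is a made-up name rather than an existing method.

```java
// Hypothetical sketch; resources are a plain long[] rather than YARN's
// Resource class, and the index positions are assumptions.
final class ZeroResourceCheckDemo {
  static final int MEMORY = 0;
  static final int VCORES = 1;

  /** Looks only at memory and CPU, as the method is described above. */
  static boolean isAnyMajorResourceZero(long[] values) {
    return values[MEMORY] == 0 || values[VCORES] == 0;
  }

  /** Looks at every configured resource type. */
  static boolean isAnyResourceZero(long[] values) {
    for (long value : values) {
      if (value == 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long[] withZeroThirdType = {4096, 4, 0}; // memory, vcores, one extra type
    System.out.println(isAnyMajorResourceZero(withZeroThirdType));
    System.out.println(isAnyResourceZero(withZeroThirdType));
  }
}
```

With a zero in a third resource type, the two checks disagree, which is the case the question above is about.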




[jira] [Created] (YARN-6964) Fair scheduler misuses Resources operations

2017-08-07 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6964:
--

 Summary: Fair scheduler misuses Resources operations
 Key: YARN-6964
 URL: https://issues.apache.org/jira/browse/YARN-6964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Assignee: Daniel Templeton


There are several places where YARN uses the {{Resources}} class to do 
comparisons of {{Resource}} instances incorrectly.  This patch corrects those 
mistakes.




[jira] [Created] (YARN-6953) Clean up ResourceUtils.setMinimumAllocationForMandatoryResources() and setMaximumAllocationForMandatoryResources()

2017-08-04 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6953:
--

 Summary: Clean up 
ResourceUtils.setMinimumAllocationForMandatoryResources() and 
setMaximumAllocationForMandatoryResources()
 Key: YARN-6953
 URL: https://issues.apache.org/jira/browse/YARN-6953
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Minor


The {{setMinimumAllocationForMandatoryResources()}} and 
{{setMaximumAllocationForMandatoryResources()}} methods are quite convoluted.  
They'd be much simpler if they just handled CPU and memory manually instead of 
trying to be clever about doing it in a loop.  There are also issues, such as 
the log warning always talking about memory or the last element of the inner 
array being a copy of the first element.




[jira] [Resolved] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory

2017-08-04 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-6934.

Resolution: Invalid

> ResourceUtils.checkMandatoryResources() should also ensure that no min or max 
> is set for vcores or memory
> -
>
> Key: YARN-6934
> URL: https://issues.apache.org/jira/browse/YARN-6934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: YARN-3926
> Reporter: Daniel Templeton
> Labels: newbie++
> Attachments: YARN-6934.001.patch
>
>





[jira] [Created] (YARN-6935) ResourceProfilesManagerImpl.parseResource() has no need of the key parameter

2017-08-02 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6935:
--

 Summary: ResourceProfilesManagerImpl.parseResource() has no need 
of the key parameter
 Key: YARN-6935
 URL: https://issues.apache.org/jira/browse/YARN-6935
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


The {{key}} parameter is the name of the resource profile being parsed, which 
is irrelevant to parsing the {{value}} as a {{Resource}} and hence is unused.  
It should be removed, and {{value}} should be renamed to something more 
descriptive.




[jira] [Created] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory

2017-08-02 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6934:
--

 Summary: ResourceUtils.checkMandatoryResources() should also 
ensure that no min or max is set for vcores or memory
 Key: YARN-6934
 URL: https://issues.apache.org/jira/browse/YARN-6934
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton







[jira] [Created] (YARN-6933) ResourceUtils.DISALLOWED_NAMES and ResourceUtils.checkMandatoryResources() are duplicating work

2017-08-02 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6933:
--

 Summary: ResourceUtils.DISALLOWED_NAMES and 
ResourceUtils.checkMandatoryResources() are duplicating work
 Key: YARN-6933
 URL: https://issues.apache.org/jira/browse/YARN-6933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


Both are used to check that the mandatory resources were not redefined.  Only 
one check is needed.  I would recommend dropping {{DISALLOWED_NAMES}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

[jira] [Created] (YARN-6927) Add support for individual resource types requests in MapReduce

2017-08-01 Thread Daniel Templeton (JIRA)

Daniel Templeton created YARN-6927:
--

 Summary: Add support for individual resource types requests in 
MapReduce
 Key: YARN-6927
 URL: https://issues.apache.org/jira/browse/YARN-6927
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


YARN-6504 adds support for resource profiles in MapReduce jobs, but resource 
profiles don't give users much flexibility in their resource requests.  To 
satisfy users' needs, MapReduce should also allow users to specify arbitrary 
resource requests.




Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Daniel Templeton


Thanks, Subru!  Carry on. :)

Daniel

On 8/1/17 1:42 PM, Subru Krishnan wrote:

Hi Daniel,

You were just on time; Carlo and I were just talking about moving
forward with the merge :).

To answer your questions:

1. The expectation about the store is that the user will have a database set
up (we only link to the install instructions page), but we do have the scripts
for the schema and stored procedures. This is in fact called out in the doc
in the *State Store* section (just before *Running a Sample Job*).
Additionally, we are working on a ZK-based implementation of the store. Inigo
has a patch in YARN-6900 [1].
2. We rely on existing YARN/Hadoop security mechanisms for running
applications on Federation as-is, so you should not need any additional
Kerberos configuration. Disclaimer: we don't use Kerberos for securing
Hadoop but rely on our production infrastructure.

Thanks,
Subru

[1] https://issues.apache.org/jira/browse/YARN-6900

On Tue, Aug 1, 2017 at 1:25 PM, Daniel Templeton 
wrote:


Subru, sorry for the last minute contribution... :)  I've been looking at
the branch, and I have two questions.

First, what's the out-of-box experience regarding the data store? Is the
expectation that the user will have a database set up and ready to go?
Will the state store set up the schema automatically, or is that on the
user?  I don't see that in the docs.

Second, how well does federation play with Kerberos?  Anything special
that needs to be configured to make it work?

Daniel


Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Daniel Templeton

Subru, sorry for the last minute contribution... :) I've been looking
at the branch, and I have two questions.

First, what's the out-of-box experience regarding the data store? Is the
expectation that the user will have a database set up and ready to go?
Will the state store set up the schema automatically, or is that on the
user? I don't see that in the docs.

Second, how well does federation play with Kerberos? Anything special
that needs to be configured to make it work?

Daniel

On 7/25/17 8:24 PM, Subru Krishnan wrote:

Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).

To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to
datacenter-sized clusters by partitioning nodes across multiple sub-clusters
(each running a YARN cluster of a few thousand nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.

This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.

Status:

- The version we would like to merge to trunk is termed "MVP" (minimal
viable product). The feature will have a complete end-to-end application
execution flow with the ability to span a single application across
multiple YARN (sub) clusters.
- There were 50+ sub-tasks that were completed as part of this
effort. Every patch has been reviewed and +1ed by a committer. Thanks to
Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
- Federation is designed to be built around YARN and consequently has
minimal code changes to core YARN. The relevant JIRAs that modify existing
YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
attention to ensure that if federation is disabled there is zero impact to
existing functionality (disabled by default).
- We found a few bugs as we went along which we fixed directly upstream
in trunk and/or branch-2.
- We have been continuously rebasing the feature branch [2] so the merge
should be a straightforward cherry-pick.
- The current version has been rather thoroughly tested and is currently
deployed in a *10,000+ node federated YARN cluster that's running
upwards of 50k jobs daily with a reliability of 99.9%*.
- We have a few ideas for follow-up extensions/improvements which are
tracked in the umbrella JIRA YARN-5597[3].

Documentation:

- Quick start guide (maven site) - YARN-6484[4].
- Overall design doc[5] and the slide-deck [6] we used for our talk at
Hadoop Summit 2016 are available in the umbrella jira - YARN-2915.

Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks, and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also big
thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
[9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E

Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Daniel Templeton

Subru, sorry for the last minute contribution... :) I've been looking
at the branch, and I have two questions.

Second, how well does federation play with Kerberos? Anything special
that needs to be configured to make it work?

Thanks!
Daniel

On 7/25/17 8:24 PM, Subru Krishnan wrote:

Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

Status:

Documentation:

- Quick start guide (maven site) - YARN-6484[4].
- Overall design doc[5] and the slide-deck [6] we used for our talk at
Hadoop Summit 2016 are available in the umbrella jira - YARN-2915.

Credits:

Cheers,
Subru & Carlo

[jira] [Created] (YARN-6912) Cluster Metrics API should report resource types information

Daniel Templeton created YARN-6912:
--

 Summary: Cluster Metrics API should report resource types 
information
 Key: YARN-6912
 URL: https://issues.apache.org/jira/browse/YARN-6912
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton







[jira] [Created] (YARN-6909) The performance advantages of YARN-6679 are lost when resource types are used

Daniel Templeton created YARN-6909:
--

 Summary: The performance advantages of YARN-6679 are lost when 
resource types are used
 Key: YARN-6909
 URL: https://issues.apache.org/jira/browse/YARN-6909
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Critical


YARN-6679 added the {{SimpleResource}} as a lightweight replacement for 
{{ResourcePBImpl}} when a protobuf isn't needed.  With resource types enabled 
and anything other than memory and CPU defined, {{ResourcePBImpl}} will always 
be used.




[jira] [Created] (YARN-6908) ResourceProfilesManagerImpl is missing @Overrides on methods

Daniel Templeton created YARN-6908:
--

 Summary: ResourceProfilesManagerImpl is missing @Overrides on 
methods
 Key: YARN-6908
 URL: https://issues.apache.org/jira/browse/YARN-6908
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Minor







[jira] [Created] (YARN-6907) Node information page in the old web UI should report resource types

Daniel Templeton created YARN-6907:
--

 Summary: Node information page in the old web UI should report 
resource types
 Key: YARN-6907
 URL: https://issues.apache.org/jira/browse/YARN-6907
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton







[jira] [Created] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types usage

Daniel Templeton created YARN-6906:
--

 Summary: Cluster Node API and Cluster Nodes API should report 
resource types usage
 Key: YARN-6906
 URL: https://issues.apache.org/jira/browse/YARN-6906
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


These endpoints currently report:

{noformat}

/default-rack
RUNNING
localhost:51877
localhost
localhost:8042
1501534150336
3.0.0-beta1-SNAPSHOT

4
5120
3072
4
0
0
0
0
0

0
0
0.0
0
0
0.0


{noformat}





[jira] [Created] (YARN-6886) AllocationFileLoaderService.loadQueue() should validate that setting do not conflict with parent

Daniel Templeton created YARN-6886:
--

 Summary: AllocationFileLoaderService.loadQueue() should validate 
that setting do not conflict with parent
 Key: YARN-6886
 URL: https://issues.apache.org/jira/browse/YARN-6886
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


Some settings, like policy, are limited by the queue's parent queue's 
configuration.  We should check those settings when we load the file.
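
As a rough illustration of the kind of load-time check meant here, the sketch below rejects a child policy that its parent's policy does not permit. The class, the policy names, and the conflict rule itself are assumptions made for the example; they are not taken from the fair scheduler code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the policy names and the conflict rule are
// assumptions for illustration, not the fair scheduler's actual logic.
class ParentPolicyCheckSketch {
  enum Policy { FIFO, FAIR, DRF }

  // Assumed rule: a FIFO queue takes no child queues, and a FAIR parent
  // cannot host a DRF child. Any other combination is accepted.
  static boolean isChildAllowed(Policy parent, Policy child) {
    switch (parent) {
      case FIFO:
        return false;
      case FAIR:
        return child != Policy.DRF;
      default:
        return true;
    }
  }

  // Returns one error message if the child's policy conflicts with the
  // parent's policy, or an empty list if the setting is acceptable.
  static List<String> validate(String queueName, Policy parent, Policy child) {
    List<String> errors = new ArrayList<>();
    if (!isChildAllowed(parent, child)) {
      errors.add("Queue " + queueName + ": policy " + child
          + " is not allowed under parent policy " + parent);
    }
    return errors;
  }
}
```

Collecting errors while the file is loaded would let a bad allocation file be rejected as a whole, rather than failing later when the queue is created.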




[jira] [Created] (YARN-6885) AllocationFileLoaderService.loadQueue() should use a switch statement in the main tag parsing loop instead of the if/else-if/...

Daniel Templeton created YARN-6885:
--

 Summary: AllocationFileLoaderService.loadQueue() should use a 
switch statement in the main tag parsing loop instead of the if/else-if/...
 Key: YARN-6885
 URL: https://issues.apache.org/jira/browse/YARN-6885
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


{code}  if ("minResources".equals(field.getTagName())) {
    String text = ((Text)field.getFirstChild()).getData().trim();
    Resource val =
        FairSchedulerConfiguration.parseResourceConfigValue(text);
    minQueueResources.put(queueName, val);
  } else if ("maxResources".equals(field.getTagName())) {
  ...{code}
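
A minimal sketch of what the switch-based loop could look like, using only the JDK's DOM classes. The class name is hypothetical, and the values are stored as raw trimmed strings rather than being parsed into Resource objects, since the Hadoop parsing helper is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch, not the Hadoop source: values are kept as raw
// strings instead of being parsed into Resource objects.
class QueueFieldSwitchSketch {
  final Map<String, String> minQueueResources = new HashMap<>();
  final Map<String, String> maxQueueResources = new HashMap<>();

  void loadQueue(String queueName, String queueXml) {
    Element queue;
    try {
      queue = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(
              queueXml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Cannot parse queue XML", e);
    }
    NodeList fields = queue.getChildNodes();
    for (int i = 0; i < fields.getLength(); i++) {
      Node fieldNode = fields.item(i);
      if (!(fieldNode instanceof Element)) {
        continue; // skip whitespace text nodes and comments
      }
      Element field = (Element) fieldNode;
      String text = field.getTextContent().trim();
      // One switch on the tag name replaces the if/else-if chain.
      switch (field.getTagName()) {
        case "minResources":
          minQueueResources.put(queueName, text);
          break;
        case "maxResources":
          maxQueueResources.put(queueName, text);
          break;
        default:
          // Other tags are ignored in this sketch.
          break;
      }
    }
  }
}
```

Switching on a String requires Java 7 or later, and getTagName() is evaluated once per field instead of once per branch.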




[jira] [Created] (YARN-6884) AllocationFileLoaderService.loadQueue() has an if without braces

Daniel Templeton created YARN-6884:
--

 Summary: AllocationFileLoaderService.loadQueue() has an if without 
braces
 Key: YARN-6884
 URL: https://issues.apache.org/jira/browse/YARN-6884
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Trivial


{code}  if (!(fieldNode instanceof Element))
    continue;{code}




[jira] [Created] (YARN-6883) AllocationFileLoaderService.reloadAllocations() should use a switch statement in the main tag parsing loop instead of the if/else-if/...

Daniel Templeton created YARN-6883:
--

 Summary: AllocationFileLoaderService.reloadAllocations() should 
use a switch statement in the main tag parsing loop instead of the 
if/else-if/...
 Key: YARN-6883
 URL: https://issues.apache.org/jira/browse/YARN-6883
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


{code}if ("queue".equals(element.getTagName()) ||
  "pool".equals(element.getTagName())) {
  queueElements.add(element);
} else if ("user".equals(element.getTagName())) {
...{code}
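
For illustration, the "queue" || "pool" condition maps onto stacked case labels. The class and enum below are hypothetical stand-ins that only classify a tag name; they are not the loader's real data structures.

```java
// Hypothetical sketch: classifies a top-level tag name with a switch.
class TopLevelTagSketch {
  enum Kind { QUEUE, USER, OTHER }

  static Kind classify(String tagName) {
    switch (tagName) {
      case "queue":
      case "pool": // stacked labels replace the || in the if condition
        return Kind.QUEUE;
      case "user":
        return Kind.USER;
      default:
        return Kind.OTHER;
    }
  }
}
```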




[jira] [Created] (YARN-6882) AllocationFileLoaderService.reloadAllocations() should use the diamond operator

Daniel Templeton created YARN-6882:
--

 Summary: AllocationFileLoaderService.reloadAllocations() should 
use the diamond operator
 Key: YARN-6882
 URL: https://issues.apache.org/jira/browse/YARN-6882
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Trivial


Here:{code}for (FSQueueType queueType : FSQueueType.values()) {
  configuredQueues.put(queueType, new HashSet<String>());
}{code} and here:{code}List<Element> queueElements = new ArrayList<Element>();{code}
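
A small self-contained sketch of the same two spots written with the diamond operator. The enum and class names are stand-ins for the scheduler's own types, and the element type of the list is simplified to String.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: QueueType stands in for the scheduler's enum.
class DiamondSketch {
  enum QueueType { LEAF, PARENT }

  static Map<QueueType, Set<String>> emptyConfiguredQueues() {
    Map<QueueType, Set<String>> configuredQueues =
        new EnumMap<>(QueueType.class);
    for (QueueType queueType : QueueType.values()) {
      // The compiler infers HashSet<String> from the parameter type of put().
      configuredQueues.put(queueType, new HashSet<>());
    }
    return configuredQueues;
  }

  static List<String> emptyQueueElements() {
    // The type argument is inferred from the declared return type.
    List<String> queueElements = new ArrayList<>();
    return queueElements;
  }
}
```

Note that the diamond inside the put() call relies on Java 8 target typing. Under Java 7 that argument would be inferred as HashSet<Object> and fail to compile, so the change assumes a Java 8 or later source level.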




[jira] [Created] (YARN-6881) LOG is unused in AllocationConfiguration

Daniel Templeton created YARN-6881:
--

 Summary: LOG is unused in AllocationConfiguration
 Key: YARN-6881
 URL: https://issues.apache.org/jira/browse/YARN-6881
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


The variable can be removed.




[jira] [Created] (YARN-6880) FSQueue.reservedResource can be final

Daniel Templeton created YARN-6880:
--

 Summary: FSQueue.reservedResource can be final
 Key: YARN-6880
 URL: https://issues.apache.org/jira/browse/YARN-6880
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor







[jira] [Created] (YARN-6879) TestLeafQueue.testDRFUserLimits() has commented out code