Re: June Hadoop Community Meetup

2019-06-04 Thread Daniel Templeton

The meetup page is now live:

   https://www.meetup.com/Hadoop-Contributors/events/262055924

I'll fill in the agenda details after we get them nailed down.  The 
meetup will be an all-day event on June 26th with lunch provided and a 
reception after.  Let me know if there are any questions.


Hope to see you there!
Daniel

On 5/23/19 10:57 AM, Daniel Templeton wrote:
Hi, all!  I want to let you know that Cloudera is planning to host a 
contributors meetup on June 26 at our Palo Alto headquarters. We're 
still working out the details, but we're hoping to follow the format 
that Oath and LinkedIn followed during the last two. Please feel free 
to reach out to me if you have a topic you'd like to propose for the 
meetup.  I will also be reaching out to key folks from the community 
to solicit ideas.  I will send out an update with more details when I 
have more to share.


Thanks!
Daniel



-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



June Hadoop Community Meetup

2019-05-23 Thread Daniel Templeton
Hi, all!  I want to let you know that Cloudera is planning to host a 
contributors meetup on June 26 at our Palo Alto headquarters.  We're 
still working out the details, but we're hoping to follow the format 
that Oath and LinkedIn followed during the last two.  Please feel free 
to reach out to me if you have a topic you'd like to propose for the 
meetup.  I will also be reaching out to key folks from the community to 
solicit ideas.  I will send out an update with more details when I have 
more to share.


Thanks!
Daniel




[jira] [Created] (YARN-8681) Wrong error message in RM placement constraints check

2018-08-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-8681:
--

 Summary: Wrong error message in RM placement constraints check
 Key: YARN-8681
 URL: https://issues.apache.org/jira/browse/YARN-8681
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.1.1, 3.2.0
Reporter: Daniel Templeton


In {{SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest()}} 
I see the following:

{code}
  if (singleConstraint.getMinCardinality() != 0
      || singleConstraint.getMaxCardinality() != 0) {
    throwExceptionWithMetaInfo(
        "Only support anti-affinity, which is: minCardinality=0, "
            + "maxCardinality=1");
  }
{code}

I think the error message should say {{"maxCardinality=0"}}
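
To make the mismatch concrete, here is a minimal standalone sketch of the check, assuming the condition is the intended behavior and only the message text is wrong. The class name, method name, and exception type are hypothetical stand-ins chosen for illustration; this is not the actual {{SingleConstraintAppPlacementAllocator}} code.

```java
/** Hypothetical standalone sketch; not the actual YARN allocator class. */
final class AntiAffinityCheckSketch {
  // Message text aligned with the condition below (maxCardinality=0).
  static final String MESSAGE =
      "Only support anti-affinity, which is: minCardinality=0, "
          + "maxCardinality=0";

  /** Rejects any constraint whose min or max cardinality is nonzero. */
  static void validate(int minCardinality, int maxCardinality) {
    if (minCardinality != 0 || maxCardinality != 0) {
      throw new IllegalArgumentException(MESSAGE);
    }
  }

  public static void main(String[] args) {
    validate(0, 0); // accepted: both cardinalities are zero
    try {
      // Rejected by the condition, even though the original message
      // text would suggest maxCardinality=1 is the supported value.
      validate(0, 1);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the message reading {{maxCardinality=0}}, a caller who follows the error text ends up with a request that the condition actually accepts.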



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




Re: Rookie here

2018-08-06 Thread Daniel Templeton

On 8/3/18 4:39 PM, Aakash Sharma wrote:

Hello All,

I am trying to make changes to YARN as part of my research project.
Regarding this, I have a few queries from this group:-

1) I wanted to have a feature in YARN such that users can run reducers at
specific nodes. Is such a feature already present in YARN?


YARN supports running any task on a specific node.  The issue is that 
the MapReduce application master does not currently offer a way to ask 
for it.



2) In order to contribute to the hadoop code base, I forked out a local
github repository from the apache repository. This gave me version 3.2.0,
which I think is the current development version.
If I want to do my changes on the current stable version, i.e. 3.0.3, how
do I do that? I realize this should be a simple git command, but I wanted
to double check before I start digging in to the code.


All changes should be made against trunk.  If your changes are 
committed, the committer will cherry-pick them into the relevant 
release branches.


Daniel




[jira] [Created] (YARN-8480) Add boolean option for resources

2018-06-28 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-8480:
--

 Summary: Add boolean option for resources
 Key: YARN-8480
 URL: https://issues.apache.org/jira/browse/YARN-8480
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Daniel Templeton
Assignee: Szilard Nemeth


Make it possible to define a resource with a boolean value.






[jira] [Resolved] (YARN-4353) Provide short circuit user group mapping for NM/AM

2018-05-21 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-4353.

  Resolution: Won't Fix
Hadoop Flags: Reviewed

I'm fine with closing this out.  I added {{NullGroupsMapping}} in order to 
resolve this JIRA, but I never felt confident enough to pull the trigger.

> Provide short circuit user group mapping for NM/AM
> --
>
> Key: YARN-4353
> URL: https://issues.apache.org/jira/browse/YARN-4353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-4353.prelim.patch
>
>
> When the NM launches an AM, the {{ContainerLocalizer}} gets the current user 
> from {{UserGroupInformation}}, which triggers user group mapping, even though 
> the user groups are never accessed.  If secure LDAP is configured for group 
> mapping, then there are some additional complications created by the 
> unnecessary group resolution.  Additionally, it adds unnecessary latency to 
> the container launch time.
> To address the issue, before getting the current user, the 
> {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null 
> group mapping service that quickly and quietly returns an empty group list 
> for all users.
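
For readers unfamiliar with the idea, a null group mapping can be sketched in a few lines. This is a minimal illustration under stated assumptions: the interface and class names below are hypothetical and deliberately simplified, and they are not the Hadoop group mapping interface or the {{NullGroupsMapping}} class mentioned above.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for a group mapping service. */
interface GroupLookupSketch {
  List<String> getGroups(String user);
}

/**
 * Returns an empty group list for every user without consulting any
 * external service such as LDAP, so the lookup is fast and cannot fail.
 */
final class NullGroupLookupSketch implements GroupLookupSketch {
  @Override
  public List<String> getGroups(String user) {
    return Collections.emptyList();
  }
}
```

The design choice is that code which never reads the groups pays nothing for resolving them.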






Re: Hadoop 3.1.0 release discussion

2018-01-31 Thread Daniel Templeton
I added my comments on that JIRA.  Looks like YARN-7292 is marked as a 
blocker for 3.1, and I would tend to agree with that.  Let's see what we 
can do to get profiles nailed down so that 3.1 can go forward.


Daniel

On 1/18/18 10:25 AM, Wangda Tan wrote:

Thanks Daniel,

We need to make a decision about this: 
https://issues.apache.org/jira/browse/YARN-7292. I believe this is the 
JIRA you mentioned, correct? Please let me know if there's anything 
else. And let's move the discussion to JIRA.


The good news is that resource profile is merged to trunk already so 
we can finish that before code freeze date (Feb 08).


+ Sunil as well.

Thanks,
Wangda



On Wed, Jan 17, 2018 at 4:31 PM, Daniel Templeton <dan...@cloudera.com> wrote:


What's the status on resource profiles?  I believe there are still
a couple of open JIRAs to rethink some of the design choices.

Daniel


On 1/17/18 11:33 AM, Wangda Tan wrote:

Hi All,

We're fast approaching the previously proposed feature freeze date (Jan
30, about 13 days from today). If you have any features which live in a
branch and are targeted to 3.1.0, please reply to this email thread.
Ideally, we should finish branch merging before the feature freeze date.

Here's an updated 3.1.0 feature status:

1. Merged & Completed features:
* (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
* (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
end-to-end.
* (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
* (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename commits.
* (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
Queues While Doing Queue Mapping.
* (Chris Douglas) HDFS-9806: HDFS Tiered Storage.

2. Features close to finish:
* (Zhankun) YARN-5983: FPGA support. Majority implementations completed and
merged to trunk. Except for UI/documentation.
* (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
discussions going on about implementation.
* (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
anti-affinity support. Close to finish, on track to be merged before Jan 30.

3. Tentative features:
* (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
containers. Only one pending patch. Plan to finish before Jan 7th.
* (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to be
done before Jan 2018.
* (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
challenging to be done before Jan 2018.
* (Varun V) YARN-5673: container-executor write. Given security refactoring
of c-e (YARN-6623) is already landed, IMHO other stuff may be moved to 3.2.

Thanks,
Wangda




On Fri, Dec 15, 2017 at 1:20 PM, Wangda Tan <wheele...@gmail.com> wrote:

Hi all,

Congratulations on the 3.0.0-GA release!

As we discussed in the previous email thread [1], I'd like to restart
3.1.0 release plans.

a) Quick summary:
a.1 Release status
We started 3.1 release discussion on Sep 6, 2017 [1]. As of today,
there’re 232 patches loaded on 3.1.0 alone [2], besides 6 open blockers and
22 open critical issues.

a.2 Release date update
Considering delays of 3.0-GA release by month-and-a-half, I propose to
move the dates as follows
  - feature freeze date from Dec 15, 2017, to Jan 30, 2018 - last date for
any branches to get merged too;
  - code freeze (blockers & critical only) date to Feb 08, 2018;
  - release voting start by Feb 18, 2018, leaving time for at least two RCx
  - release date from Jan 15, 2018, to Feb 28, 2018;

Unlike before, I added an additional milestone for release-vote-start so
that we can account for voting time-period also.

This overall is still 5 1/2 months of release-timeline unlike the faster
cadence we hoped for, but this, in my opinion, is the best-updated timeline
given the delays of the final release of 3.0-GA.

b) Individual feature status:
I spoke to several feature owners and checked the status of un-finished
features, following are status of features planned to 3.1.0:

b.1 Merged & Completed f

Re: Hadoop 3.1.0 release discussion

2018-01-17 Thread Daniel Templeton
What's the status on resource profiles?  I believe there are still a 
couple of open JIRAs to rethink some of the design choices.


Daniel

On 1/17/18 11:33 AM, Wangda Tan wrote:

Hi All,

We're fast approaching the previously proposed feature freeze date (Jan
30, about 13 days from today). If you have any features which live in a
branch and are targeted to 3.1.0, please reply to this email thread.
Ideally, we should finish branch merging before the feature freeze date.

Here's an updated 3.1.0 feature status:

1. Merged & Completed features:
* (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
* (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
end-to-end.
* (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
* (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename commits.
* (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
Queues While Doing Queue Mapping.
* (Chris Douglas) HDFS-9806: HDFS Tiered Storage.

2. Features close to finish:
* (Zhankun) YARN-5983: FPGA support. Majority implementations completed and
merged to trunk. Except for UI/documentation.
* (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
discussions going on about implementation.
* (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
anti-affinity support. Close to finish, on track to be merged before Jan 30.

3. Tentative features:
* (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
containers. Only one pending patch. Plan to finish before Jan 7th.
* (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to be
done before Jan 2018.
* (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
challenging to be done before Jan 2018.
* (Varun V) YARN-5673: container-executor write. Given security refactoring
of c-e (YARN-6623) is already landed, IMHO other stuff may be moved to 3.2.

Thanks,
Wangda




On Fri, Dec 15, 2017 at 1:20 PM, Wangda Tan  wrote:


Hi all,

Congratulations on the 3.0.0-GA release!

As we discussed in the previous email thread [1], I'd like to restart
3.1.0 release plans.

a) Quick summary:
a.1 Release status
We started 3.1 release discussion on Sep 6, 2017 [1]. As of today,
there’re 232 patches loaded on 3.1.0 alone [2], besides 6 open blockers and
22 open critical issues.

a.2 Release date update
Considering delays of 3.0-GA release by month-and-a-half, I propose to
move the dates as follows
  - feature freeze date from Dec 15, 2017, to Jan 30, 2018 - last date for
any branches to get merged too;
  - code freeze (blockers & critical only) date to Feb 08, 2018;
  - release voting start by Feb 18, 2018, leaving time for at least two RCx
  - release date from Jan 15, 2018, to Feb 28, 2018;

Unlike before, I added an additional milestone for release-vote-start so
that we can account for voting time-period also.

This overall is still 5 1/2 months of release-timeline unlike the faster
cadence we hoped for, but this, in my opinion, is the best-updated timeline
given the delays of the final release of 3.0-GA.

b) Individual feature status:
I spoke to several feature owners and checked the status of un-finished
features, following are status of features planned to 3.1.0:

b.1 Merged & Completed features:
* (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
* (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
end-to-end.
* (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
* (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
commits.
* (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
Queues While Doing Queue Mapping.

b.2 Features close to finish:
* (Chris Douglas) HDFS-9806: HDFS Tiered Storage. Being voted on now.
* (Zhankun) YARN-5983: FPGA support. Majority implementations completed
and merged to trunk. Except for UI/documentation.
* (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
discussions going on about implementation.

b.3 Tentative features:
* (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
containers. Only one pending patch. Plan to finish before Jan 7th.
* (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to
be done before Jan 2018.
* (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
anti-affinity support. Tentative; will figure out by Jan 1st.
* (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
challenging to be done before Jan 2018.
* (Varun V) YARN-5673: container-executor write. Given security
refactoring of c-e (YARN-6623) is already landed, IMHO other stuff may be
moved to 3.2.

b.4 Additional release drivers
* More exhaustive upgrade testing from 2.x to 3.x.

c) Regarding branch cut:

We will keep pointing trunk to 3.1 and cut branch-3.1 until: A. some
feature planned to 3.2 has to be landed on trunk or B. After feature freeze
date, whichever comes first.

I've also talked offline with Vinod to get 

Re: Apache YARN Committers & Contributors Meetup #5

2017-12-01 Thread Daniel Templeton

And thanks to Vinod we now have an official Meetup online:

https://www.meetup.com/Hadoop-Contributors/events/245569075/

Daniel

On 11/22/17 12:20 PM, Daniel Templeton wrote:
We're long past due for another contributors meetup.  Cloudera is 
excited to host this event at our new Palo Alto Galactic Headquarters 
(https://www.google.com/maps/place/Cloudera+Galactic+HQ/@37.4254615,-122.1413431,17z) 
on December 6th from 2pm-5pm PST.  We will have a Google Hangout set 
up so that remote participants can dial in.  We will also provide 
drinks and snacks.


Our agenda will be roughly:

* Celebrate Hadoop 3.0.0, identify any loose ends, and talk about 3.0.1
* Planning around Hadoop 3.1.0
* Planning around the branch-2 releases, Hadoop 2.10.0, etc.
* Key signing
* Plan for the next bug bash

Hope to see you there!
Daniel

PS: I wasn't able to figure out how to create an event in our meetup 
group for this.  Anyone want to give me a pointer or set it up?






Re: Jenkins is failing

2017-11-25 Thread Daniel Templeton
Looking at the output, it's hunting for the working directory and not 
finding it.


Daniel

On 11/25/17 12:10 AM, Yufei Gu wrote:

Yeah, I found the same issue for YARN-7541 and several others. Don't know
how to fix this though.

Best,

Yufei

On Fri, Nov 24, 2017 at 9:18 AM, Sunil G  wrote:


Hello

I am seeing continuous jenkins errors like below.

Modes:  MultiJDK  Sentinel  Jenkins  Robot  Docker  ResetRepo  UnitTests
Processing: YARN-7510
ERROR: *Unsure how to process YARN-7510*.


Has anyone seen the same issue before, and does anyone know how to resolve it?

- Sunil







[jira] [Created] (YARN-7557) It should be possible to specify resource types in the fair scheduler increment value

2017-11-22 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7557:
--

 Summary: It should be possible to specify resource types in the 
fair scheduler increment value
 Key: YARN-7557
 URL: https://issues.apache.org/jira/browse/YARN-7557
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Priority: Critical









[jira] [Created] (YARN-7556) Fair scheduler configuration should allow resource types in the minResources and maxResources properties

2017-11-22 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7556:
--

 Summary: Fair scheduler configuration should allow resource types 
in the minResources and maxResources properties
 Key: YARN-7556
 URL: https://issues.apache.org/jira/browse/YARN-7556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical









Apache YARN Committers & Contributors Meetup #5

2017-11-22 Thread Daniel Templeton
We're long past due for another contributors meetup.  Cloudera is 
excited to host this event at our new Palo Alto Galactic Headquarters 
(https://www.google.com/maps/place/Cloudera+Galactic+HQ/@37.4254615,-122.1413431,17z) 
on December 6th from 2pm-5pm PST.  We will have a Google Hangout set up 
so that remote participants can dial in.  We will also provide drinks 
and snacks.


Our agenda will be roughly:

* Celebrate Hadoop 3.0.0, identify any loose ends, and talk about 3.0.1
* Planning around Hadoop 3.1.0
* Planning around the branch-2 releases, Hadoop 2.10.0, etc.
* Key signing
* Plan for the next bug bash

Hope to see you there!
Daniel

PS: I wasn't able to figure out how to create an event in our meetup 
group for this.  Anyone want to give me a pointer or set it up?





[jira] [Resolved] (YARN-7551) yarn.resourcemanager.reservation-system.max-periodicity is not in yarn-default.xml

2017-11-21 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7551.

Resolution: Not A Problem

OK.

> yarn.resourcemanager.reservation-system.max-periodicity is not in 
> yarn-default.xml
> --
>
> Key: YARN-7551
> URL: https://issues.apache.org/jira/browse/YARN-7551
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.0.0
>    Reporter: Daniel Templeton
>Priority: Minor
>







[jira] [Created] (YARN-7552) RM REST containers endpoints are not documented

2017-11-21 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7552:
--

 Summary: RM REST containers endpoints are not documented
 Key: YARN-7552
 URL: https://issues.apache.org/jira/browse/YARN-7552
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Daniel Templeton
Priority: Critical









[jira] [Created] (YARN-7551) yarn.resourcemanager.reservation-system.max-periodicity is not in yarn-default.xml

2017-11-21 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7551:
--

 Summary: yarn.resourcemanager.reservation-system.max-periodicity 
is not in yarn-default.xml
 Key: YARN-7551
 URL: https://issues.apache.org/jira/browse/YARN-7551
 Project: Hadoop YARN
  Issue Type: Bug
  Components: reservation system
Affects Versions: 3.0.0
Reporter: Daniel Templeton
Priority: Minor









Re: Apache Hadoop 2.8.3 Release Plan

2017-11-21 Thread Daniel Templeton
Doh.  Mailer dropped some of the lists.  Replying again to avoid 
fragmenting the discussion...


Still +1 to Andrew's comments.

Daniel

On 11/21/17 7:53 AM, Daniel Templeton wrote:

+1

Daniel

On 11/20/17 10:22 PM, Andrew Wang wrote:
I'm against including new features in maintenance releases, since they're
meant to be bug-fix only.

If we're struggling with being able to deliver new features in a safe and
timely fashion, let's try to address that, not overload the meaning of
"maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai  wrote:


Hi Junping,

Thank you for making 2.8.2 happen and now planning the 2.8.3 release.

I have an ask, is it convenient to include the back port work for OSS
connector module? We have some Hadoop users that wish to have it by default
for convenience, though in the past they used it by back porting
themselves. I have raised this and got thoughts from Chris and Steve. Looks
like this is more wanted for 2.9 but I wanted to ask again here for broad
feedback and thoughts by this chance. The back port patch is available for
2.8 and the one for branch-2 was already in. IMO, 2.8.x is promising as we
can see some shift from 2.7.x, hence it's worth more important features and
efforts. How would you think? Thanks!

https://issues.apache.org/jira/browse/HADOOP-14964

Regards,
Kai

-Original Message-
From: Junping Du [mailto:j...@hortonworks.com]
Sent: Tuesday, November 14, 2017 9:02 AM
To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Apache Hadoop 2.8.3 Release Plan

Hi,
We have several important fixes get landed on branch-2.8 and I would
like to cut off branch-2.8.3 now to start 2.8.3 release work.
So far, I don't see any pending blockers on 2.8.3, so my current plan
is to cut off 1st RC of 2.8.3 in next several days:
  -  For all coming commits to land on branch-2.8, please mark the
     fix version as 2.8.4.
  -  If there is a really important fix for 2.8.3 and getting
     closed, please notify me ahead before landing it on branch-2.8.3.
Please let me know if you have any thoughts or comments on the plan.


Thanks,

Junping

From: dujunp...@gmail.com on behalf of 俊平堵 <junping...@apache.org>
Sent: Friday, October 27, 2017 3:33 PM
To: gene...@hadoop.apache.org
Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.

Hi all,

It gives me great pleasure to announce that the Apache Hadoop
community has voted to release Apache Hadoop 2.8.2, which is now available
for download from Apache mirrors[1]. For download instructions please refer
to the Apache Hadoop Release page [2].

Apache Hadoop 2.8.2 is the first GA release of Apache Hadoop 2.8 line and
our newest stable release for entire Apache Hadoop project. For major
changes included in Hadoop 2.8 line, please refer Hadoop 2.8.2 main page[3].

This release has 315 resolved issues since previous 2.8.1 release with
following breakdown:
    - 91 in Hadoop Common
    - 99 in HDFS
    - 105 in YARN
    - 20 in MapReduce
Please read the log of CHANGES[4] and RELEASENOTES[5] for more details.

The release news is posted on the Hadoop website too, you can go to the
downloads section directly [6].

Thank you all for contributing to the Apache Hadoop release!


Cheers,

Junping


[1] http://www.apache.org/dyn/closer.cgi/hadoop/common

[2] http://hadoop.apache.org/releases.html

[3] http://hadoop.apache.org/docs/r2.8.2/index.html

[4]
http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html

[5]
http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html

[6] http://hadoop.apache.org/releases.html#Download


-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org










Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Daniel Templeton

+1 (binding)

Built with JDK 1.8 and set up a single-node cluster.  Messed around with 
resource types and found everything to be working as expected.


Daniel

On 11/14/17 1:34 PM, Andrew Wang wrote:

Hi folks,

Thanks as always to the many, many contributors who helped with this
release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
available here:

http://people.apache.org/~wang/3.0.0-RC0/

This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.

3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
additions include the merge of YARN resource types, API-based configuration
of the CapacityScheduler, and HDFS router-based federation.

I've done my traditional testing with a pseudo cluster and a Pi job. My +1
to start.

Best,
Andrew







[jira] [Created] (YARN-7541) Node updates don't update the maximum cluster capability for resources other than CPU and memory

2017-11-20 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7541:
--

 Summary: Node updates don't update the maximum cluster capability 
for resources other than CPU and memory
 Key: YARN-7541
 URL: https://issues.apache.org/jira/browse/YARN-7541
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0-beta1, 3.1.0
Reporter: Daniel Templeton
Priority: Critical









Re: [DISCUSS] Apache YARN committers/contributors meetup #5

2017-11-17 Thread Daniel Templeton

Dec 6th remains the overwhelming preference, so let's go with it.

I agree that we need to do a bug bash, but since it's been a while since 
we last had a meetup, I would say let's use this one to get the ball 
rolling again, and we can add an agenda item to plan for a bug bash.  
Does that work?


I will yield to the majority opinion.

Daniel

On 11/17/17 3:57 PM, Vinod Kumar Vavilapalli wrote:

Thanks for volunteering to host, Daniel!

The agenda looks good. But we could also go a completely orthogonal way and do 
a bug / review bash too. Up for either of these directions. Or a mix.

Thanks
+Vinod


On Nov 17, 2017, at 9:22 AM, Daniel Templeton  wrote:

I will close the poll this afternoon.  Right now the overwhelming preference is 
for Dec 6th.  Remote folks note that we will have a dial-in option, most likely 
via Hangout.

Daniel

On 11/14/17 10:39 AM, Daniel Templeton wrote:

I was hoping for some signs of life before posting a Doodle poll, but I guess 
we can head off the flurry of votes on this thread with a poll now:

https://doodle.com/poll/mmzmsyvdk9n5pudc

Everyone, please fill out the poll, even if you can't attend either of the 
dates.  Let's try to make a call on a date this week.

Daniel

On 11/13/17 5:43 PM, Karthik Kambatla wrote:

Thanks for hosting this.

Should people respond to this email for date preferences or do you want to use 
something like doodle?



On Sat, Nov 11, 2017 at 8:59 AM Daniel Templeton <dan...@cloudera.com> wrote:

Sounds like there are no other volunteers, so we're happy to host here
at Cloudera.  How about November 29th or December 6th in the afternoon?
Suggested topics would include:

3.0 status/update/celebration
3.1 status, timing, and plans
What we're doing with the 2.x branch going forward
Docker status/plans
Other?

Thanks!
    Daniel

On 10/20/17 1:54 PM, Daniel Templeton wrote:
> Seems to me like we're due for a YARN contributors meetup.  Anyone
> want to volunteer to host?  I'd be happy to handle the logistics and
> host here at Cloudera, but I don't want to take the opportunity away
> from another company. :)
>
> Daniel
>
> On 10/28/16 3:27 PM, Vinod Kumar Vavilapalli wrote:
>> Thanks to everyone who joined this meetup!
>>
>> We had quite a blast both in the western hemisphere and from what I
>> hear in the IST timezone too.
>>
>> Overall, stats
>>
>>   - PST
>>     - Started at 269 patch-available tickets and got it down to 170:
>>       a mix of commits, reviews + updates, closing invalid /
>>       not-applicable JIRAs
>>     - Working notes:
>>       https://docs.google.com/spreadsheets/d/1kPKsm3VSnkLU107t-CL05RQ9xQxc6kN-h6o9bDrheaY/edit#gid=2076540402
>>
>>   - IST: (Notes from Sunil)
>>     - 19 patch commits, 11 added / rebased patches, and more commits
>>       on the way waiting for Jenkins
>>     - Working notes:
>>       https://docs.google.com/spreadsheets/d/1EVga79x-sxrfxWoe3o_ZkyLgHbrzHJqAU4hq6qabW_k/edit?ts=5811eb51#gid=0
>>
>> Special thanks
>>   - To Subru for sponsoring the event, logistics and a great lunch!
>>   - To Sunil for taking the initiative, organizing and running the
>>     contributors’ meetup in India!
>>
>> We are thinking of doing this at a regular cadence but for a smaller
>> duration than a full-day.
>>
>> Thanks
>> +Vinod
>>
>>> On Oct 19, 2016, at 11:08 AM, Subru Krishnan <su...@apache.org> wrote:
>>>
>>> Folks,
>>>
>>> Hope everyone is doing great.
>>>
>>> We are putting in one full day (5-6 hours) for a YARN review / commit
>>> marathon on *next Thursday, 27th Oct*.
>>>
>>> Expected Audience: *regular contributor / committer in YARN*.
>>>
>>> Non-audience: While the meetups are generally open to the general
>>> public, this is not a 'meetup to learn about YARN'.
>>>
>>> Specific Agenda: YARN bug bash
>>>
>>> Location: Microsoft Moffett Towers, 1020 Enterprise Way, Sunnyvale, CA.
>>>
>>> Webex/Skype details for those who are remote: TBD
>>>
>>> Meetup URL:
>>> http://www.meetup.com/Hadoop-Contributors/events/234971372/
>>>
>>> IMPORTANT NOTES:
>>> - Food will be provided for the

Re: Changing the JSON Serializer

2017-11-17 Thread Daniel Templeton

Oh, yeah, I agree.  Adding the new version would be a 3.1 thing.

Daniel

On 11/17/17 4:48 PM, Sean Busbey wrote:

I personally wouldn't be comfortable adding an API version in a maintenance
release; it's essentially adding a feature. But I'm not set to be the RM
for 3.0.1. :)

On Nov 17, 2017 17:56, "Eric Yang"  wrote:


This means YARN-7505 can have /ws/v2/* running in parallel with /ws/v1/* for
the 3.1 or 3.0.1 release, and deprecate /ws/v1/*.  In version 4, we drop
/ws/v1/*, right?

I think this plan can work.



Regards,

Eric





*From: *Sean Busbey 
*Date: *Friday, November 17, 2017 at 3:08 PM
*To: *Eric Yang 
*Cc: *"yarn-dev@hadoop.apache.org" 
*Subject: *Re: Changing the JSON Serializer



3.0.0 RCs are in progress already. Bit late to make a breaking change.



the REST APIs are versioned for a reason. So long as we're outputting
these changes on a new version, this change should be fine on whatever
branch we like. When we open up for changes to go in the next major release
we can drop the v1 APIs.



On Fri, Nov 17, 2017 at 11:41 AM, Eric Yang  wrote:

+1 on changing the JSON serializer.  Hadoop was an early adopter of
Jersey, but a proper JSON deserializer for Jackson didn’t appear until mid
2016, after the Jackson 2.5 release.  Hence, some early versions of the
Hadoop REST API were not JSON compliant.  Hadoop more or less complies with
semantic versioning; therefore, it would be best to make this change in the
3.0 release.  This will reduce some baggage carried forward from Hadoop 2.x.
I think the community will respond positively to this change.  Thank you
for bringing this up.

regards,
Eric


On 11/16/17, 10:02 PM, "Sean Busbey"  wrote:

 The REST APIs are covered under the compatibility guidelines[1]. Presuming
 these are under a new API version number, it's not clear to me from the
 existing guidelines if adding one is okay in a maintenance release. It
 sounds surprising to me.

 [1]:
 https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs

 On Wed, Nov 15, 2017 at 9:23 PM, Daniel Templeton wrote:

 > Looks like our REST endpoints return malformed JSON for any DAO that
 > includes a Map.  That includes:
 >
 > * the resourceSecondsMap and preemptedResourceSecondsMap entries in all
 > the GET /apps/* endpoints,
 > * the operationsInfo entry in the GET /scheduler endpoint for capacity
 > scheduler,
 > * the local_resources, environment, and acls entries in the POST /apps
 > endpoint, and
 > * the labelsToNodes entry in the GET /label-mappings endpoint.
 >
 > The issue is that each entry in the map is included with a duplicate key
 > ("entry").  Some JSON parsers will choke on the error, and some will
 > quietly drop the duplicates.  I've filed YARN-7505 to address the issue.
 >
 > The solution is to replace the Jersey JSON serializer with the Jackson
 > JSON serializer.  This change fixes the issue, but it changes the structure
 > of the resulting JSON.  For example, without YARN-7505, hitting /apps might
 > yield JSON that contains something like:
 >
 > "resourceSecondsMap":{
 >   "entry":{"key":"memory-mb","value":"11225"},
 >   "entry":{"key":"vcores","value":"5"},
 >   "entry":{"key":"test","value":"0"},
 >   "entry":{"key":"test2","value":"0"}
 > }
 >
 > With YARN-7505, we get:
 >
 > "resourceSecondsMap": {
 >   "test2":0,
 >   "test":0,
 >   "memory-mb":11225,
 >   "vcores":5
 > }
 >
 > The first example is obviously broken, so the second one is clearly
 > better, but it's structurally different.
 >
 > For the GET /label-mappings endpoint, the keys of the map also have to be
 > changed to simple strings because JSON doesn't allow for complex map keys.
 > So this:
 >
 > "labelsToNodes":{
 >   "entry":{
 > "key":{"name":"label1","exclusivity":"true"},
 > "value":{"nodes":"localhost:63261"}
 >   }
 > }
 >
 > becomes this:
 >
 > "labelsToNodes":{
 >   "label1":{
 > "nodes":["dhcp-10-16-0-181.pa.cloudera.com:63261"]
 >   }
 > }
 >
 > The first one sucks and is invalid, but changing to the second one will
 > break clients that are parsing the first one, especially if

Re: Changing the JSON Serializer

2017-11-17 Thread Daniel Templeton
Yeah, we're not going to be able to change the REST APIs without 
updating the version number and leaving the old version around for a 
while.  We should make sure that the fix makes future WS version revs 
easy (or at least easier than this one).


Daniel

On 11/17/17 3:56 PM, Eric Yang wrote:

This means YARN-7505 can have /ws/v2/* running in parallel with /ws/v1/* for the
3.1 or 3.0.1 release, and deprecate /ws/v1/*.  In version 4, we drop /ws/v1/*,
right?
I think this plan can work.

Regards,
Eric


From: Sean Busbey 
Date: Friday, November 17, 2017 at 3:08 PM
To: Eric Yang 
Cc: "yarn-dev@hadoop.apache.org" 
Subject: Re: Changing the JSON Serializer

3.0.0 RCs are in progress already. Bit late to make a breaking change.

the REST APIs are versioned for a reason. So long as we're outputting these 
changes on a new version, this change should be fine on whatever branch we 
like. When we open up for changes to go in the next major release we can drop 
the v1 APIs.

On Fri, Nov 17, 2017 at 11:41 AM, Eric Yang <ey...@hortonworks.com> wrote:
+1 on changing the JSON serializer.  Hadoop was an early adopter of Jersey,
but a proper JSON deserializer for Jackson didn’t appear until mid 2016, after
the Jackson 2.5 release.  Hence, some early versions of the Hadoop REST API
were not JSON compliant.  Hadoop more or less complies with semantic
versioning; therefore, it would be best to make this change in the 3.0
release.  This will reduce some baggage carried forward from Hadoop 2.x.
I think the community will respond positively to this change.  Thank you for
bringing this up.

regards,
Eric

On 11/16/17, 10:02 PM, "Sean Busbey" <bus...@cloudera.com> wrote:

 The REST APIs are covered under the compatibility guidelines[1]. Presuming
 these are under a new API version number, it's not clear to me from the
 existing guidelines if adding one is okay in a maintenance release. It
 sounds surprising to me.

 [1]:
 https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs

 On Wed, Nov 15, 2017 at 9:23 PM, Daniel Templeton <dan...@cloudera.com>
 wrote:

 > Looks like our REST endpoints return malformed JSON for any DAO that
 > includes a Map.  That includes:
 >
 > * the resourceSecondsMap and preemptedResourceSecondsMap entries in all
 > the GET /apps/* endpoints,
 > * the operationsInfo entry in the GET /scheduler endpoint for capacity
 > scheduler,
 > * the local_resources, environment, and acls entries in the POST /apps
 > endpoint, and
 > * the labelsToNodes entry in the GET /label-mappings endpoint.
 >
 > The issue is that each entry in the map is included with a duplicate key
 > ("entry").  Some JSON parsers will choke on the error, and some will
 > quietly drop the duplicates.  I've filed YARN-7505 to address the issue.
 >
 > The solution is to replace the Jersey JSON serializer with the Jackson
 > JSON serializer.  This change fixes the issue, but it changes the structure
 > of the resulting JSON.  For example, without YARN-7505, hitting /apps might
 > yield JSON that contains something like:
 >
 > "resourceSecondsMap":{
 >   "entry":{"key":"memory-mb","value":"11225"},
 >   "entry":{"key":"vcores","value":"5"},
 >   "entry":{"key":"test","value":"0"},
 >   "entry":{"key":"test2","value":"0"}
 > }
 >
 > With YARN-7505, we get:
 >
 > "resourceSecondsMap": {
 >   "test2":0,
 >   "test":0,
 >   "memory-mb":11225,
 >   "vcores":5
 > }
 >
 > The first example is obviously broken, so the second one is clearly
 > better, but it's structurally different.
 >
 > For the GET /label-mappings endpoint, the keys of the map also have to be
 > changed to simple strings because JSON doesn't allow for complex map keys.
 > So this:
 >
 > "labelsToNodes":{
 >   "entry":{
 > "key":{"name":"label1","exclusivity":"true"},
 > "value":{"nodes":"localhost:63261"}
 >   }
 > }
 >
 > becomes this:
 >
 > "labelsToNodes":{
 >   "label1":{
 > "nodes":["dhcp-10-16-0-181.pa.cloudera.com:63261"]
 >   }
 > }
 >
 > The fi

Re: [DISCUSS] Apache YARN committers/contributors meetup #5

2017-11-17 Thread Daniel Templeton
I will close the poll this afternoon.  Right now the overwhelming 
preference is for Dec 6th.  Remote folks note that we will have a 
dial-in option, most likely via Hangout.


Daniel

On 11/14/17 10:39 AM, Daniel Templeton wrote:
I was hoping for some signs of life before posting a Doodle poll, but 
I guess we can head off the flurry of votes on this thread with a poll 
now:


https://doodle.com/poll/mmzmsyvdk9n5pudc

Everyone, please fill out the poll, even if you can't attend either of 
the dates.  Let's try to make a call on a date this week.


Daniel

On 11/13/17 5:43 PM, Karthik Kambatla wrote:

Thanks for hosting this.

Should people respond to this email for date preferences or do you 
want to use something like doodle?




On Sat, Nov 11, 2017 at 8:59 AM Daniel Templeton <dan...@cloudera.com> wrote:


Sounds like there are no other volunteers, so we're happy to host here
at Cloudera.  How about November 29th or December 6th in the afternoon?
Suggested topics would include:

3.0 status/update/celebration
3.1 status, timing, and plans
What we're doing with the 2.x branch going forward
Docker status/plans
Other?

Thanks!
Daniel

    On 10/20/17 1:54 PM, Daniel Templeton wrote:
> Seems to me like we're due for a YARN contributors meetup.  Anyone
> want to volunteer to host?  I'd be happy to handle the logistics and
> host here at Cloudera, but I don't want to take the opportunity away
> from another company. :)
>
> Daniel
>
> On 10/28/16 3:27 PM, Vinod Kumar Vavilapalli wrote:
>> Thanks to everyone who joined this meetup!
>>
>> We had quite a blast both in the western hemisphere and from what I
>> hear in the IST timezone too.
>>
>> Overall, stats
>>
>>   - PST
>>      - Started at 269 patch-available tickets and got it down to 170:
>>        a mix of commits, reviews + updates, closing invalid /
>>        not-applicable JIRAs
>>      - Working notes:
>>        https://docs.google.com/spreadsheets/d/1kPKsm3VSnkLU107t-CL05RQ9xQxc6kN-h6o9bDrheaY/edit#gid=2076540402
>>
>>   - IST: (Notes from Sunil)
>>      - 19 patch commits, 11 added / rebased patches, and more commits
>>        on the way waiting for Jenkins
>>      - Working notes:
>>        https://docs.google.com/spreadsheets/d/1EVga79x-sxrfxWoe3o_ZkyLgHbrzHJqAU4hq6qabW_k/edit?ts=5811eb51#gid=0
>>
>> Special thanks
>>   - To Subru for sponsoring the event, logistics and a great lunch!
>>   - To Sunil for taking the initiative, organizing and running the
>> contributors’ meetup in India!
>>
>> We are thinking of doing this at a regular cadence but for a smaller
>> duration than a full-day.
>>
>> Thanks
>> +Vinod
>>
>>> On Oct 19, 2016, at 11:08 AM, Subru Krishnan <su...@apache.org> wrote:
>>>
>>> Folks,
>>>
>>> Hope everyone is doing great.
>>>
>>> We are putting in one full day (5-6 hours) for a YARN review / commit
>>> marathon on *next Thursday, 27th Oct*.
>>>
>>> Expected Audience: *regular contributor / committer in YARN*.
>>>
>>> Non-audience: While the meetups are generally open to the general
>>> public, this is not a 'meetup to learn about YARN'.
>>>
>>> Specific Agenda: YARN bug bash
>>>
>>> Location: Microsoft Moffett Towers, 1020 Enterprise Way, Sunnyvale,
>>> CA.
>>>
>>>     Webex/Skype details for those who are remote: TBD
>>>
>>>     Meetup URL:
>>> http://www.meetup.com/Hadoop-Contributors/events/234971372/
>>>
>>> IMPORTANT NOTES:
>>> - Food will be provided for the reviewers / committers to make sure
>>> they stay up :)
>>> - We have capacity for only *25 people*, so this isn't a walk-in,
>>> please *RSVP* and reach out to us if you want to join this meetup.
>>>
>>> Thanks.







[jira] [Created] (YARN-7518) Node manager should allow resource units to be lower cased

2017-11-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7518:
--

 Summary: Node manager should allow resource units to be lower cased
 Key: YARN-7518
 URL: https://issues.apache.org/jira/browse/YARN-7518
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.0.0-beta1, 3.1.0
Reporter: Daniel Templeton


When we do units checks, we should ignore case.
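
A minimal sketch of a case-insensitive units check, in Python rather than the NodeManager's Java. The unit list and the `normalize_unit` helper are assumptions for illustration, not taken from the YARN code. An exact match is tried first so that units differing only by case (for example "m" and "M") stay distinct:

```python
# Assumed list of known units, for illustration only.
KNOWN_UNITS = ["p", "n", "u", "m", "", "k", "M", "G", "T", "P",
               "Ki", "Mi", "Gi", "Ti", "Pi"]


def normalize_unit(unit):
    """Return the canonical spelling of `unit`, ignoring case if needed."""
    if unit in KNOWN_UNITS:
        # Exact match wins, so "m" and "M" are never confused.
        return unit
    matches = [u for u in KNOWN_UNITS if u.lower() == unit.lower()]
    if len(matches) == 1:
        return matches[0]
    raise ValueError("unknown unit: %r" % unit)


if __name__ == "__main__":
    for raw in ["gi", "MI", "K", "m", "M"]:
        print(repr(raw), "->", repr(normalize_unit(raw)))
```

With this approach "gi" and "GI" both resolve to "Gi", while "m" and "M" keep their separate meanings.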



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




Changing the JSON Serializer

2017-11-15 Thread Daniel Templeton
Looks like our REST endpoints return malformed JSON for any DAO that 
includes a Map.  That includes:


* the resourceSecondsMap and preemptedResourceSecondsMap entries in all 
the GET /apps/* endpoints,
* the operationsInfo entry in the GET /scheduler endpoint for capacity 
scheduler,
* the local_resources, environment, and acls entries in the POST /apps 
endpoint, and

* the labelsToNodes entry in the GET /label-mappings endpoint.

The issue is that each entry in the map is included with a duplicate key 
("entry").  Some JSON parsers will choke on the error, and some will 
quietly drop the duplicates.  I've filed YARN-7505 to address the issue.
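
To illustrate the "quietly drop the duplicates" case, here is a small Python sketch (not part of YARN-7505). Python's standard `json` module keeps only the last value for a repeated key; the `object_pairs_hook` parameter lets a client detect the repeats instead. The payload is a trimmed stand-in for the REST output, and `reject_duplicates` is a hypothetical helper name:

```python
import json

# Trimmed stand-in for a response with the repeated "entry" key.
payload = '''{"resourceSecondsMap":{
  "entry":{"key":"memory-mb","value":"11225"},
  "entry":{"key":"vcores","value":"5"}
}}'''


def reject_duplicates(pairs):
    """object_pairs_hook that raises when a key repeats within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: " + key)
        seen[key] = value
    return seen


# Default behaviour: the last "entry" wins and the earlier ones vanish.
lenient = json.loads(payload)

# Strict behaviour: the same payload is rejected.
try:
    json.loads(payload, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as err:
    strict_error = str(err)

if __name__ == "__main__":
    print(lenient)
    print(strict_error)
```

Other parsers may behave differently, which is exactly why the duplicate key is a problem for clients.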


The solution is to replace the Jersey JSON serializer with the Jackson 
JSON serializer.  This change fixes the issue, but it changes the 
structure of the resulting JSON.  For example, without YARN-7505, 
hitting /apps might yield JSON that contains something like:


"resourceSecondsMap":{
  "entry":{"key":"memory-mb","value":"11225"},
  "entry":{"key":"vcores","value":"5"},
  "entry":{"key":"test","value":"0"},
  "entry":{"key":"test2","value":"0"}
}

With YARN-7505, we get:

"resourceSecondsMap": {
  "test2":0,
  "test":0,
  "memory-mb":11225,
  "vcores":5
}

The first example is obviously broken, so the second one is clearly 
better, but it's structurally different.
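
A client that has to cope with both shapes during a transition could normalize them on read. The following Python sketch is one way to do that under stated assumptions: it only handles `resourceSecondsMap`, it assumes the values are integers, and `collect_entries` and `parse_resource_seconds` are hypothetical helper names:

```python
import json


def collect_entries(pairs):
    """Fold repeated "entry" pairs into one dict; pass other objects through."""
    is_entry_list = bool(pairs) and all(
        key == "entry" and isinstance(val, dict)
        and isinstance(val.get("key"), str) and "value" in val
        for key, val in pairs
    )
    if is_entry_list:
        return {val["key"]: val["value"] for _, val in pairs}
    return dict(pairs)


def parse_resource_seconds(text):
    """Return resourceSecondsMap as {name: int} for either JSON shape."""
    doc = json.loads(text, object_pairs_hook=collect_entries)
    # The old shape carries numbers as strings; the new shape as JSON numbers.
    return {name: int(value) for name, value in doc["resourceSecondsMap"].items()}


if __name__ == "__main__":
    old = ('{"resourceSecondsMap":{'
           '"entry":{"key":"memory-mb","value":"11225"},'
           '"entry":{"key":"vcores","value":"5"}}}')
    new = '{"resourceSecondsMap":{"memory-mb":11225,"vcores":5}}'
    print(parse_resource_seconds(old))
    print(parse_resource_seconds(new))
```

The hook sees every key/value pair before duplicates are collapsed, so no entries are lost when reading the old shape.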


For the GET /label-mappings endpoint, the keys of the map also have to 
be changed to simple strings because JSON doesn't allow for complex map 
keys.  So this:


"labelsToNodes":{
  "entry":{
    "key":{"name":"label1","exclusivity":"true"},
    "value":{"nodes":"localhost:63261"}
  }
}

becomes this:

"labelsToNodes":{
  "label1":{
    "nodes":["dhcp-10-16-0-181.pa.cloudera.com:63261"]
  }
}

The first one sucks and is invalid, but changing to the second one will 
break clients that are parsing the first one, especially if they're 
expecting to get the label exclusivity from this endpoint.


Before I try to get YARN-7505 committed, I want to give the community a 
chance to voice any concerns about the change.  It's too late to get 
into 3.0.0, so we'd be looking at 3.0.1 and 3.1.0.


Feel free to comment here or on the JIRA directly.

Thanks,
Daniel




[jira] [Created] (YARN-7505) RM REST endpoints generate malformed JSON

2017-11-15 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7505:
--

 Summary: RM REST endpoints generate malformed JSON
 Key: YARN-7505
 URL: https://issues.apache.org/jira/browse/YARN-7505
 Project: Hadoop YARN
  Issue Type: Bug
  Components: restapi
Affects Versions: 3.0.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical


For all endpoints that return DAOs that contain maps, the generated JSON is 
malformed.  For example:

% curl 'http://localhost:8088/ws/v1/cluster/apps'
{"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","execution
TypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}}






[jira] [Resolved] (YARN-7414) FairScheduler#getAppWeight() should be moved into FSAppAttempt#getWeight()

2017-11-15 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7414.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   3.1.0

Thanks for the patch, [~soumabrata].  Committed to trunk and branch-3.0.

> FairScheduler#getAppWeight() should be moved into FSAppAttempt#getWeight()
> --
>
> Key: YARN-7414
> URL: https://issues.apache.org/jira/browse/YARN-7414
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Daniel Templeton
>Assignee: Soumabrata Chakraborty
>Priority: Minor
>  Labels: newbie
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7414.001.patch, YARN-7414.002.patch, 
> YARN-7414.003.patch
>
>
> It's illogical that {{FSAppAttempt}} defers to {{FairScheduler}} for its own 
> weight, especially when {{FairScheduler}} has to call back to 
> {{FSAppAttempt}} to get the details to return a value. Instead, 
> {{FSAppAttempt}} should do the work and call out to {{FairScheduler}} to get 
> the details it needs.
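
The direction of the dependency described above can be sketched as follows. This is Python with stand-in classes, not the actual Java; the weight formula and the attribute names are placeholder assumptions chosen only to show which object does the work:

```python
import math


class FairScheduler:
    """Stand-in that only exposes scheduler-wide settings (assumed)."""

    def __init__(self, size_based_weight=False):
        self.size_based_weight = size_based_weight


class FSAppAttempt:
    """Stand-in that computes its own weight from its own state."""

    def __init__(self, scheduler, demand_mb, priority=1):
        self.scheduler = scheduler
        self.demand_mb = demand_mb
        self.priority = priority

    def get_weight(self):
        # The attempt asks the scheduler only for the setting it needs.
        weight = 1.0
        if self.scheduler.size_based_weight:
            # Placeholder formula, assumed for illustration.
            weight = math.log1p(self.demand_mb) / math.log(2)
        return weight * self.priority


if __name__ == "__main__":
    app = FSAppAttempt(FairScheduler(size_based_weight=True), demand_mb=1023)
    print(app.get_weight())
```

The scheduler no longer needs a method that reaches back into the attempt to compute the attempt's own weight.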






Re: [DISCUSS] Apache YARN committers/contributors meetup #5

2017-11-14 Thread Daniel Templeton
I was hoping for some signs of life before posting a Doodle poll, but I 
guess we can head off the flurry of votes on this thread with a poll now:


https://doodle.com/poll/mmzmsyvdk9n5pudc

Everyone, please fill out the poll, even if you can't attend either of 
the dates.  Let's try to make a call on a date this week.


Daniel

On 11/13/17 5:43 PM, Karthik Kambatla wrote:

Thanks for hosting this.

Should people respond to this email for date preferences or do you 
want to use something like doodle?




On Sat, Nov 11, 2017 at 8:59 AM Daniel Templeton <dan...@cloudera.com> wrote:


Sounds like there are no other volunteers, so we're happy to host here
at Cloudera.  How about November 29th or December 6th in the
afternoon?
Suggested topics would include:

3.0 status/update/celebration
3.1 status, timing, and plans
What we're doing with the 2.x branch going forward
Docker status/plans
Other?

Thanks!
Daniel

    On 10/20/17 1:54 PM, Daniel Templeton wrote:
> Seems to me like we're due for a YARN contributors meetup.  Anyone
> want to volunteer to host?  I'd be happy to handle the logistics and
> host here at Cloudera, but I don't want to take the opportunity away
> from another company. :)
>
> Daniel
>
> On 10/28/16 3:27 PM, Vinod Kumar Vavilapalli wrote:
>> Thanks to everyone who joined this meetup!
>>
>> We had quite a blast both in the western hemisphere and from what I
>> hear in the IST timezone too.
>>
>> Overall, stats
>>
>>   - PST
>>      - Started at 269 patch-available tickets and got it down to 170:
>>        a mix of commits, reviews + updates, closing invalid /
>>        not-applicable JIRAs
>>      - Working notes:
>>        https://docs.google.com/spreadsheets/d/1kPKsm3VSnkLU107t-CL05RQ9xQxc6kN-h6o9bDrheaY/edit#gid=2076540402
>>
>>   - IST: (Notes from Sunil)
>>      - 19 patch commits, 11 added / rebased patches, and more commits
>>        on the way waiting for Jenkins
>>      - Working notes:
>>        https://docs.google.com/spreadsheets/d/1EVga79x-sxrfxWoe3o_ZkyLgHbrzHJqAU4hq6qabW_k/edit?ts=5811eb51#gid=0
>>
>> Special thanks
>>   - To Subru for sponsoring the event, logistics and a great lunch!
>>   - To Sunil for taking the initiative, organizing and running the
>> contributors’ meetup in India!
>>
>> We are thinking of doing this at a regular cadence but for a smaller
>> duration than a full-day.
>>
>> Thanks
>> +Vinod
>>
>>> On Oct 19, 2016, at 11:08 AM, Subru Krishnan <su...@apache.org> wrote:
>>>
>>> Folks,
>>>
>>> Hope everyone is doing great.
>>>
>>> We are putting in one full day (5-6 hours) for a YARN review / commit
>>> marathon on *next Thursday, 27th Oct*.
>>>
>>> Expected Audience: *regular contributor / committer in YARN*.
>>>
>>> Non-audience: While the meetups are generally open to the general
>>> public, this is not a 'meetup to learn about YARN'.
>>>
>>> Specific Agenda: YARN bug bash
>>>
>>> Location: Microsoft Moffett Towers, 1020 Enterprise Way, Sunnyvale,
>>> CA.
>>>
>>>     Webex/Skype details for those who are remote: TBD
>>>
>>>     Meetup URL:
>>> http://www.meetup.com/Hadoop-Contributors/events/234971372/
>>>
>>> IMPORTANT NOTES:
>>> - Food will be provided for the reviewers / committers to make sure
>>> they stay up :)
>>> - We have capacity for only *25 people*, so this isn't a walk-in,
>>> please *RSVP* and reach out to us if you want to join this meetup.
>>>
>>> Thanks.





[DISCUSS] Apache YARN committers/contributors meetup #5

2017-11-11 Thread Daniel Templeton
Sounds like there are no other volunteers, so we're happy to host here 
at Cloudera.  How about November 29th or December 6th in the afternoon?  
Suggested topics would include:


3.0 status/update/celebration
3.1 status, timing, and plans
What we're doing with the 2.x branch going forward
Docker status/plans
Other?

Thanks!
Daniel

On 10/20/17 1:54 PM, Daniel Templeton wrote:
Seems to me like we're due for a YARN contributors meetup.  Anyone 
want to volunteer to host?  I'd be happy to handle the logistics and 
host here at Cloudera, but I don't want to take the opportunity away 
from another company. :)


Daniel

On 10/28/16 3:27 PM, Vinod Kumar Vavilapalli wrote:

Thanks to everyone who joined this meetup!

We had quite a blast both in the western hemisphere and from what I 
hear in the IST timezone too.


Overall, stats

  - PST
 — Started at 269 patch-available tickets and got it down to 170 
- a mix of commits, reviews + updates, closing invalid / 
not-applicable JIRAs
 — Working notes: 
https://docs.google.com/spreadsheets/d/1kPKsm3VSnkLU107t-CL05RQ9xQxc6kN-h6o9bDrheaY/edit#gid=2076540402


  - IST: (Notes from Sunil)
 — 19 patch commits, 11 added / rebased patches, and more commits 
on the way waiting for Jenkins
 — Working notes: 
https://docs.google.com/spreadsheets/d/1EVga79x-sxrfxWoe3o_ZkyLgHbrzHJqAU4hq6qabW_k/edit?ts=5811eb51#gid=0


Special thanks
  - To Subru for sponsoring the event, logistics and a great lunch!
  - To Sunil for taking the initiative, organizing and running the 
contributors’ meetup in India!


We are thinking of doing this at a regular cadence but for a smaller 
duration than a full-day.


Thanks
+Vinod


On Oct 19, 2016, at 11:08 AM, Subru Krishnan  wrote:

Folks,

Hope everyone is doing great.

We are putting in one full day (5-6 hours) for a YARN review / commit
marathon on *next Thursday, 27th Oct*.

    Expected Audience: *regular contributor / committer in YARN*.

    Non-audience: While the meetups are generally open to the general
public, this is not a 'meetup to learn about YARN'.

    Specific Agenda: YARN bug bash

    Location: Microsoft Moffett Towers, 1020 Enterprise Way, Sunnyvale,
CA.

    Webex/Skype details for those who are remote: TBD

    Meetup URL:
http://www.meetup.com/Hadoop-Contributors/events/234971372/

IMPORTANT NOTES:
- Food will be provided for the reviewers / committers to make sure they
stay up :)
- We have capacity for only *25 people*, so this isn't a walk-in,
please *RSVP* and reach out to us if you want to join this meetup.

Thanks.





[jira] [Created] (YARN-7467) FSLeafQueue unnecessarily calls ComputeFairShares.computeShare() to calculate fair share for apps

2017-11-09 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7467:
--

 Summary: FSLeafQueue unnecessarily calls 
ComputeFairShares.computeShare() to calculate fair share for apps
 Key: YARN-7467
 URL: https://issues.apache.org/jira/browse/YARN-7467
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Priority: Critical


All apps have the same weight, the same max share (unbounded), and the same min 
share (none).  There's no reason to call {{computeShares()}} at all.  Just 
divide the resources by the number of apps.






Re: [VOTE] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-31 Thread Daniel Templeton
My +1 (binding) brings us to three +1's and no -1's.  The vote is now 
closed, and the merge is approved.  I'll proceed with the merge.  The 
code should be in by this afternoon.


Daniel

On 10/28/17 9:39 AM, Sunil G wrote:

+1 (binding)

Thanks Daniel for helping to backport this. I also ran various 
performance test cases including mentioned UT perfs and SLS tests.


In SLS tests, I found that performance impact of branch-3.0 and 
resource-types branch is almost minimal. I tried to run test scenarios 
with 8k nodes and 4k nodes. There are no performance regressions seen 
when I used 2 resource types. I could get around 2800 container 
allocation per second in my machine with 8k nodes. Other than this I 
have also gone through the branch code and trunk. I could see that all 
major changes related to recent performance improvements are pulled in.


- Sunil


On Sat, Oct 28, 2017 at 8:20 PM Daniel Templeton <dan...@cloudera.com> wrote:


As promised, here's the updated performance numbers.

Performance reporting is always a tricky business.  I'll do my best here
to fairly represent the state of things.  We've run a number of
performance tests.  Those tests include TestCapacitySchedulerPerf, SLS,
and actual cluster testing.

The summary is that in most scenarios, the resource-types branch is very
close to branch-3.0 in performance.  There are some large scale SLS
tests that show a performance drop, but we have not been able to
replicate those findings on an actual cluster.  Additional cluster
testing is still in process.

= TestCapacitySchedulerPerf =
This unit test added with YARN-7136 does a tight loop over the
scheduler's handling of node update events.  The net effect is similar
to running 100 apps through 1 queue in a 2-node cluster.  I also
modified it to run with fair scheduler and configured it with assign
multiple enabled and set to the max containers supported by the
cluster.

- Capacity scheduler -
Performance of resource-types v/s branch-3.0: 1.0 (no change)
Performance of resource-types v/s trunk: 1.16 (16% *better*)
- Fair scheduler -
Performance of resource-types + YARN-7374 v/s branch-3.0: 1.25 (25% *better*)
Performance of resource-types + YARN-7374 v/s trunk: 1.04 (4% *better*)

These results seem a little optimistic when compared with the SLS
results, but at worst they provide evidence that the resource types
changes do not have a significant negative impact.

Wangda and Sunil did some independent testing with this unit test and
found no significant difference between branch-3.0 and resource-types.

= SLS =
For SLS, we tested a wide range of scenarios with different node, app,
task, and queue counts.  We ran these tests for capacity and fair
scheduler.

The net result is that for the majority of the scenarios we tested, the
resource-types branch performance was within 95% to 105% of branch-3.0
performance.  We looked at the numbers for only the allocation time and
node update event processing time, as the other numbers returned by SLS
are not relevant here.  I'm not reporting specific numbers because of
the volume of tests run, and because reporting any kind of aggregate
result would be inherently skewed by the mix of tests we chose to run,
and hence would be misleading.

There were a few large node count+large queue count+large app count
scenarios where resource-types showed a larger performance degradation
versus branch-3.0 when comparing mean node update time over the entire
run.  Mean is a lossy metric here, as we're trying to summarize an
entire time series in a single number, but it's about the best we're
gonna do.  While these results aren't encouraging, bear in mind that
they are specifically for the time to process a node update, which does
not necessarily translate directly into overall cluster performance.
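
As a small illustration of why the mean is a lossy summary of a time series, the Python sketch below compares two series with the same mean. The numbers are made up for the example and are not measurements from these tests:

```python
from statistics import mean, median

# Two made-up node-update latency series (milliseconds); not measured data.
steady = [10] * 10
spiky = [5] * 9 + [55]


def summarize(series):
    """Return the mean, median, and worst case for a latency series."""
    return {"mean": mean(series), "median": median(series), "max": max(series)}


steady_summary = summarize(steady)
spiky_summary = summarize(spiky)

if __name__ == "__main__":
    print("steady:", steady_summary)
    print("spiky: ", spiky_summary)
```

Both series have the same mean, yet one has a single large outlier and a lower typical latency, so a comparison based on the mean alone cannot tell them apart.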

Wangda and Sunil did some independent testing with SLS and found no
significant difference between branch-3.0 and resource-types.

= Cluster Testing =
Because of the large SLS scenarios that showed a performance
degradation, we have done performance testing on actual clusters.  These
tests are still ongoing, but thus far the results have shown no
discernible difference in overall throughput between branch-3.0 and
resource-types.  Overall throughput for both branches falls into
identically the same range.

Daniel

On 10/24/17 10:56 AM, Daniel Templeton wrote:
> I'd like to formally start the voting process for merging the
> resource-types branch into branch-3.0.  The resource-types branch is a
> selective backport of JIRAs that were already merged into trunk in a
> previous merge vote for

Re: Hadoop Compatability Guide, Part Deux: Developer Docs

2017-10-30 Thread Daniel Templeton
We've now gone a couple of rounds of reviews on HADOOP-14876, and a 
patch is posted for HADOOP-14875.  Feedback is very welcome.  Please 
take a look.


Daniel

On 10/14/17 8:46 AM, Daniel Templeton wrote:
I just posted a first patch for HADOOP-14876 that adds downstream 
developer docs based on the overhauled compatibility guide from 
HADOOP-13714.  I would really appreciate some critical review of the 
doc, as it's much more likely to be read by downstream developers than 
the compatibility spec itself.


There's another doc coming in HADOOP-14875 that will add what amounts 
to upgrade docs for admins.  When that's complete, I will send another 
email here to solicit reviews.


Daniel






[jira] [Created] (YARN-7418) Improve performance of locking in fair scheduler

2017-10-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7418:
--

 Summary: Improve performance of locking in fair scheduler
 Key: YARN-7418
 URL: https://issues.apache.org/jira/browse/YARN-7418
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton


Based on initial testing, we can improve scheduler performance by 5%-10% with 
some simple optimizations.






[jira] [Created] (YARN-7414) FairScheduler#getAppWeight() should be moved into FSAppAttempt#getWeight()

2017-10-28 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7414:
--

 Summary: FairScheduler#getAppWeight() should be moved into 
FSAppAttempt#getWeight()
 Key: YARN-7414
 URL: https://issues.apache.org/jira/browse/YARN-7414
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Priority: Minor


It's illogical that {{FSAppAttempt}} defers to {{FairScheduler}} for its own 
weight, especially when {{FairScheduler}} has to call back to {{FSAppAttempt}} 
to get the details to return a value. Instead, {{FSAppAttempt}} should do the 
work and call out to {{FairScheduler}} to get the details it needs.
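
A minimal sketch of the proposed shape, using hypothetical stand-in classes rather than the real {{FairScheduler}} and {{FSAppAttempt}}: the attempt computes its own weight and asks the scheduler only for the setting it needs. The {{isSizeBasedWeight()}} accessor and the log-based formula are assumptions made for illustration.

```java
/** Hypothetical, simplified stand-ins; not the real YARN classes. */
class WeightSketch {

  /** Plays the role of FairScheduler: it only exposes configuration. */
  static class Scheduler {
    private final boolean sizeBasedWeight;

    Scheduler(boolean sizeBasedWeight) {
      this.sizeBasedWeight = sizeBasedWeight;
    }

    boolean isSizeBasedWeight() {
      return sizeBasedWeight;
    }
  }

  /** Plays the role of FSAppAttempt: it owns the weight calculation. */
  static class AppAttempt {
    private final Scheduler scheduler;
    private final long demandMemoryMb;

    AppAttempt(Scheduler scheduler, long demandMemoryMb) {
      this.scheduler = scheduler;
      this.demandMemoryMb = demandMemoryMb;
    }

    double getWeight() {
      if (!scheduler.isSizeBasedWeight()) {
        return 1.0;
      }
      // Assumed formula: log2(1 + demand), so weight grows slowly with size.
      return Math.log1p(demandMemoryMb) / Math.log(2);
    }
  }

  public static void main(String[] args) {
    Scheduler scheduler = new Scheduler(true);
    System.out.println(new AppAttempt(scheduler, 1023).getWeight());
  }
}
```

The dependency now points one way: the attempt calls the scheduler for configuration, and the scheduler never has to call back into the attempt.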






Re: [VOTE] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-28 Thread Daniel Templeton

As promised, here are the updated performance numbers.

Performance reporting is always a tricky business.  I'll do my best here 
to fairly represent the state of things.  We've run a number of 
performance tests.  Those tests include TestCapacitySchedulerPerf, SLS, 
and actual cluster testing.


The summary is that in most scenarios, the resource-types branch is very 
close to branch-3.0 in performance.  There are some large scale SLS 
tests that show a performance drop, but we have not been able to 
replicate those findings on an actual cluster.  Additional cluster 
testing is still in progress.


= TestCapacitySchedulerPerf =
This unit test added with YARN-7136 does a tight loop over the 
scheduler's handling of node update events.  The net effect is similar 
to running 100 apps through 1 queue in a 2-node cluster.  I also 
modified it to run with fair scheduler and configured it with assign 
multiple enabled and set to the max containers supported by the cluster.


- Capacity scheduler -
Performance of resource-types v/s branch-3.0: 1.0 (no change)
Performance of resource-types v/s trunk: 1.16 (16% *better*)
- Fair scheduler -
Performance of resource-types + YARN-7374 v/s branch-3.0: 1.25 (25% 
*better*)

Performance of resource-types + YARN-7374 v/s trunk: 1.04 (4% *better*)

These results seem a little optimistic when compared with the SLS 
results, but at worst they provide evidence that the resource types 
changes do not have a significant negative impact.


Wangda and Sunil did some independent testing with this unit test and 
found no significant difference between branch-3.0 and resource-types.


= SLS =
For SLS, we tested a wide range of scenarios with different node, app, 
task, and queue counts.  We ran these tests for capacity and fair scheduler.


The net result is that for the majority of the scenarios we tested, the 
resource-types branch performance was within 95% to 105% of branch-3.0 
performance.  We looked at the numbers for only the allocation time and 
node update event processing time, as the other numbers returned by SLS 
are not relevant here.  I'm not reporting specific numbers because of 
the volume of tests run, and because reporting any kind of aggregate 
result would be inherently skewed by the mix of tests we chose to run, 
and hence would be misleading.


There were a few large node count+large queue count+large app count 
scenarios where resource-types showed a larger performance degradation 
versus branch-3.0 when comparing mean node update time over the entire 
run.  Mean is a lossy metric here, as we're trying to summarize an 
entire time series in a single number, but it's about the best we're 
gonna do.  While these results aren't encouraging, bear in mind that 
they are specifically for the time to process a node update, which does 
not necessarily translate directly into overall cluster performance.
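
To illustrate why the mean is lossy here, the sketch below uses made-up sample values (not measurements from these tests) for two runs with the same mean node update time but very different behavior.  The class and method names are hypothetical, and the percentile helper assumes a non-empty sample array.

```java
import java.util.Arrays;

/** Hypothetical helper; the sample values below are invented for illustration. */
class NodeUpdateStats {

  static double mean(long[] samples) {
    return Arrays.stream(samples).average().orElse(0.0);
  }

  /** Nearest-rank percentile; assumes samples is non-empty and 0 < p <= 100. */
  static long percentile(long[] samples, double p) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    long[] steady = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10};
    long[] spiky = {1, 1, 1, 1, 1, 1, 1, 1, 1, 91};

    // Same mean, but the median and maximum tell two different stories.
    System.out.println(mean(steady) + " vs " + mean(spiky));
    System.out.println(percentile(steady, 50) + " vs " + percentile(spiky, 50));
    System.out.println(percentile(steady, 100) + " vs " + percentile(spiky, 100));
  }
}
```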


Wangda and Sunil did some independent testing with SLS and found no 
significant difference between branch-3.0 and resource-types.


= Cluster Testing =
Because of the large SLS scenarios that showed a performance 
degradation, we have done performance testing on actual clusters. These 
tests are still ongoing, but thus far the results have shown no 
discernible difference in overall throughput between branch-3.0 and 
resource-types.  Overall throughput for both branches falls into 
identically the same range.


Daniel

On 10/24/17 10:56 AM, Daniel Templeton wrote:
I'd like to formally start the voting process for merging the 
resource-types branch into branch-3.0.  The resource-types branch is a 
selective backport of JIRAs that were already merged into trunk in a 
previous merge vote for YARN-3926 (resource types) [1].  For a full 
explanation of the feature, benefits, and risks, see the previous 
DISCUSS thread [2].  The vote will be 7 days, ending Tuesday Oct 31 at 
11:00AM PDT.


In summary, resource types adds the ability to declaratively configure 
new resource types in addition to CPU and memory and request them when 
submitting resource requests.  The resource-types branch currently 
represents 32 patches from trunk drawn from the resource types 
umbrella JIRAs: YARN-3926 [3] and YARN-7069 [4].


Key points:
* If no additional resource types are configured, the user experience 
with YARN remains unchanged.
* Performance is the primary risk. We have been closely watching the 
performance impact of adding resource types, and according to current 
measurements the impact is trivial.
* This merge vote is for resource types excluding the resource 
profiles feature which was included in the original merge vote [1].
* Documentation is available in trunk via YARN-7056 [5] with 
improvements pending review in YARN-7369 [6].


Refreshed performance numbers on the resource-types branch are 
pending, and I'll post them to this thread as soon as they're ready.


Thanks!
Daniel

[1] 
http://mail-arc

[jira] [Created] (YARN-7401) Reduce lock contention in ClusterNodeTracker#getClusterResource()

2017-10-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7401:
--

 Summary: Reduce lock contention in 
ClusterNodeTracker#getClusterResource()
 Key: YARN-7401
 URL: https://issues.apache.org/jira/browse/YARN-7401
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton


Profiling the code shows massive latency in 
{{ClusterNodeTracker.getClusterResource()}} on getting the lock.






[jira] [Created] (YARN-7397) Reduce lock contention in FairScheduler#getAppWeight()

2017-10-25 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7397:
--

 Summary: Reduce lock contention in FairScheduler#getAppWeight()
 Key: YARN-7397
 URL: https://issues.apache.org/jira/browse/YARN-7397
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton


In profiling the fair scheduler, a large amount of time is spent waiting to get 
the lock in {{FairScheduler.getAppWeight()}}, when the lock isn't actually 
needed.  This patch reduces the scope of the lock to eliminate that contention.
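
A minimal sketch of what reducing the scope of a lock can look like, assuming a read-write lock guarding one shared flag.  The class, field, and method names are hypothetical and this is not the actual patch: the lock is held only long enough to copy the shared state, and the calculation runs outside it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration of narrowing a lock's scope; not the YARN patch. */
class LockScopeSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean sizeBasedWeight = true; // shared state guarded by lock

  void setSizeBasedWeight(boolean value) {
    lock.writeLock().lock();
    try {
      sizeBasedWeight = value;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Before: the whole calculation runs while holding the lock. */
  double weightWideLock(long demandMemoryMb) {
    lock.readLock().lock();
    try {
      return computeWeight(sizeBasedWeight, demandMemoryMb);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** After: hold the lock only to copy the shared flag, then compute. */
  double weightNarrowLock(long demandMemoryMb) {
    boolean sizeBased;
    lock.readLock().lock();
    try {
      sizeBased = sizeBasedWeight;
    } finally {
      lock.readLock().unlock();
    }
    return computeWeight(sizeBased, demandMemoryMb);
  }

  /** Pure function of its arguments, so it needs no lock. */
  static double computeWeight(boolean sizeBased, long demandMemoryMb) {
    return sizeBased ? Math.log1p(demandMemoryMb) / Math.log(2) : 1.0;
  }

  public static void main(String[] args) {
    LockScopeSketch sketch = new LockScopeSketch();
    System.out.println(sketch.weightWideLock(1023) == sketch.weightNarrowLock(1023));
  }
}
```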






[VOTE] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-24 Thread Daniel Templeton
I'd like to formally start the voting process for merging the 
resource-types branch into branch-3.0.  The resource-types branch is a 
selective backport of JIRAs that were already merged into trunk in a 
previous merge vote for YARN-3926 (resource types) [1].  For a full 
explanation of the feature, benefits, and risks, see the previous 
DISCUSS thread [2].  The vote will be 7 days, ending Tuesday Oct 31 at 
11:00AM PDT.


In summary, resource types adds the ability to declaratively configure 
new resource types in addition to CPU and memory and request them when 
submitting resource requests.  The resource-types branch currently 
represents 32 patches from trunk drawn from the resource types umbrella 
JIRAs: YARN-3926 [3] and YARN-7069 [4].


Key points:
* If no additional resource types are configured, the user experience 
with YARN remains unchanged.
* Performance is the primary risk. We have been closely watching the 
performance impact of adding resource types, and according to current 
measurements the impact is trivial.
* This merge vote is for resource types excluding the resource profiles 
feature which was included in the original merge vote [1].
* Documentation is available in trunk via YARN-7056 [5] with 
improvements pending review in YARN-7369 [6].


Refreshed performance numbers on the resource-types branch are pending, 
and I'll post them to this thread as soon as they're ready.


Thanks!
Daniel

[1] 
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3ccad++ecm6xss4_kxp4audf85_rgg4pzxkuox7u2vp8tfzmy4...@mail.gmail.com%3E
[2] 
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E

[3] https://issues.apache.org/jira/browse/YARN-3926
[4] https://issues.apache.org/jira/browse/YARN-7069
[5] https://issues.apache.org/jira/browse/YARN-7056
[6] https://issues.apache.org/jira/browse/YARN-7369




Re: Apache YARN committers/contributors meetup #4 (10/27)

2017-10-20 Thread Daniel Templeton
Seems to me like we're due for a YARN contributors meetup.  Anyone want 
to volunteer to host?  I'd be happy to handle the logistics and host 
here at Cloudera, but I don't want to take the opportunity away from 
another company. :)


Daniel

On 10/28/16 3:27 PM, Vinod Kumar Vavilapalli wrote:

Thanks to everyone who joined this meetup!

We had quite a blast both in the western hemisphere and from what I hear in the 
IST timezone too.

Overall, stats

  - PST
 — Started at 269 patch-available tickets and got it down to 170 - a mix of 
commits, reviews + updates, closing invalid / not-applicable JIRAs
 — Working notes: 
https://docs.google.com/spreadsheets/d/1kPKsm3VSnkLU107t-CL05RQ9xQxc6kN-h6o9bDrheaY/edit#gid=2076540402

  - IST: (Notes from Sunil)
 — 19 patch commits, 11 added / rebased patches, and more commits on the 
way waiting for Jenkins
 — Working notes: 
https://docs.google.com/spreadsheets/d/1EVga79x-sxrfxWoe3o_ZkyLgHbrzHJqAU4hq6qabW_k/edit?ts=5811eb51#gid=0

Special thanks
  - To Subru for sponsoring the event, logistics and a great lunch!
  - To Sunil for taking the initiative, organizing and running the 
contributors’ meetup in India!

We are thinking of doing this at a regular cadence but for a shorter duration 
than a full day.

Thanks
+Vinod


On Oct 19, 2016, at 11:08 AM, Subru Krishnan  wrote:

Folks,

Hope everyone is doing great.

We are putting in one full day (5-6 hours) for a YARN review / commit
marathon on *next Thursday, 27th Oct*.

Expected Audience: *regular contributor / committer in YARN*.

Non-audience: While the meetups are generally open to the general
public, this is not a 'meetup to learn about YARN'.

Specific Agenda: YARN bug bash

Location: Microsoft Moffett Towers, 1020 Enterprise Way, Sunnyvale,
CA.

Webex/Skype details for those who are remote: TBD

Meetup URL:
http://www.meetup.com/Hadoop-Contributors/events/234971372/

IMPORTANT NOTES:
- Food will be provided for the reviewers / committers to make sure they
stay up :)
- We have capacity for only *25 people*, so this isn't a walk-in, please *RSVP*
and reach out to us if you want to join this meetup.

Thanks.





[jira] [Created] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler

2017-10-20 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7374:
--

 Summary: Improve performance of DRF comparisons for resource types 
in fair scheduler
 Key: YARN-7374
 URL: https://issues.apache.org/jira/browse/YARN-7374
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical









[jira] [Created] (YARN-7369) Improve the docs

2017-10-19 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7369:
--

 Summary: Improve the docs
 Key: YARN-7369
 URL: https://issues.apache.org/jira/browse/YARN-7369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: docs
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton









[DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-19 Thread Daniel Templeton
After much offline discussion with Wangda, Sunil, Varun V., and Andrew 
we've agreed that it would make sense to pull resource types into 
branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been 
merged into trunk/3.1.  Now I'd like to open a discussion about getting it 
into 3.0 GA.  Here's the run-down:


Feature Details
---
Resource types replaces the two primitives that tracked CPU and memory 
with an array of objects to track an arbitrary set of resources (that 
must always include CPU and memory).  The resource manager reads the 
master list of supported resources from its configs.  The node managers 
read their resource values from their configs and report them to the 
resource manager in their heartbeats.  The clients read the supported 
resource types from their configs (or an RM service) and specify them in 
the application submission.  At a high level, nothing else changes.
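
As a rough illustration of the model described above, here is a sketch that treats a resource as a set of named values and checks whether a request fits on a node.  It is a simplified stand-in, not YARN's Resource class: the class and method names are hypothetical, and "gpu" is a made-up example of an additional resource type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a resource vector; not YARN's Resource class. */
class ResourceVectorSketch {

  /** Builds a vector that always carries memory and CPU plus one extra type. */
  static Map<String, Long> resources(long memoryMb, long vcores, long gpus) {
    Map<String, Long> r = new LinkedHashMap<>();
    r.put("memory-mb", memoryMb);
    r.put("vcores", vcores);
    r.put("gpu", gpus); // "gpu" is an illustrative, made-up resource name
    return r;
  }

  /** True if every requested amount is covered by what is available. */
  static boolean fitsIn(Map<String, Long> request, Map<String, Long> available) {
    for (Map.Entry<String, Long> entry : request.entrySet()) {
      long have = available.getOrDefault(entry.getKey(), 0L);
      if (entry.getValue() > have) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Long> node = resources(8192, 8, 2);
    System.out.println(fitsIn(resources(4096, 2, 1), node));
    System.out.println(fitsIn(resources(4096, 2, 4), node));
  }
}
```

The point of the sketch is that the fit check loops over whatever types are present instead of hard-coding CPU and memory.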


The Resource object is a core construct in the resource manager and 
scheduler.  All application operations end up touching Resource objects 
as we determine fit or share-based priority for applications, queues, 
and nodes.  As this feature replaces the core of how Resource objects 
work, resource types impacts almost every aspect of the resource 
manager's operation.  The change is pervasive, but not radical.


The resource types patches as merged into trunk/3.1 include an 
additional feature called resource profiles.  Resource profiles are 
actually independent of resource types, and either is useful without the 
other.  The resource profiles code is still in a bit of flux, so the 
current plan is to pull only the resource types code into branch-3.0.  I 
have backported only the resource types patches into the resource-types 
branch.  Unit tests are passing, and I don't see any significant risk 
from the split.  The diff between the resource-types branch and 
branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].


Justification for 3.0
-
Resource types (leaving out resource profiles) is in a stable state and 
is well tested with unit tests, performance tests, and functional tests 
with both the fair scheduler and the capacity scheduler.  Tests were run 
on both the resource-types branch and the original YARN-3926 branch. 
There is some additional work to do, but none of it's critical (except 
maybe improving the docs).  Our confidence level in the feature is good.


Resource types doesn't introduce incompatible changes to any Public and 
Stable APIs.  There are some incompatible changes to Public and Unstable 
APIs, but that's what a major release is for.  The Resource object proto 
retains the CPU and memory fields and adds a new field for any 
additional resource types to retain wire compatibility.  Other proto 
changes are all additive.


While it's not possible to turn resource types off per se, if the user 
does not activate the feature, the operation of YARN will be unchanged.  
Getting this feature into Hadoop 3.0 gives us the required groundwork to 
make progress on tidying up the usage details without having to drag in 
a large set of invasive changes into 3.1.


If we don't pull resource types into 3.0, it will open a persistent 
channel through which failures can be introduced through backporting.  
The differences introduced by resource types are significant enough that 
it will be an issue for scheduler and resource manager patches between 
3.1 and 3.0.


From the other side, resource types is a pervasive change, and there's 
no turning it off.  Users will be impacted by it regardless of whether 
they choose to use it or not.  While we've tested it, the feature 
represents a large number of changes to core code that's critical to the 
resource manager's operation.  If we're going to introduce a large 
change like this, no matter how well tested, we should do it in 3.0 
where users already expect some bumps in the road.  Bringing in a large 
change like this in a 3.1 release, when users expect the release to have 
stabilized, sounds like a bad idea.



What do folks think about pulling resource types back into branch-3.0 in 
time for RC0?  Any concerns?


Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn, 
Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew 
Wang for their work on getting the resource types work done, backported, 
tested, and on track for 3.0.


[1]: 
https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch





[jira] [Created] (YARN-7367) ResourceInformation lacks stability and audience annotations

2017-10-19 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7367:
--

 Summary: ResourceInformation lacks stability and audience 
annotations
 Key: YARN-7367
 URL: https://issues.apache.org/jira/browse/YARN-7367
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.1.0
Reporter: Daniel Templeton









[jira] [Created] (YARN-7357) Several methods in TestZKRMStateStore.TestZKRMStateStoreTester.TestZKRMStateStoreInternal should have @Override annotations

2017-10-18 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7357:
--

 Summary: Several methods in 
TestZKRMStateStore.TestZKRMStateStoreTester.TestZKRMStateStoreInternal should 
have @Override annotations
 Key: YARN-7357
 URL: https://issues.apache.org/jira/browse/YARN-7357
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Priority: Trivial









[jira] [Created] (YARN-7356) The fair scheduler reservation threshold should be documented in the fair scheduler docs

2017-10-18 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7356:
--

 Summary: The fair scheduler reservation threshold should be 
documented in the fair scheduler docs
 Key: YARN-7356
 URL: https://issues.apache.org/jira/browse/YARN-7356
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton


See YARN-3920.






[jira] [Resolved] (YARN-7335) Unsafe cast to from long to int in DominantResourceCalculator.computeAvailableContainers()

2017-10-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7335.

Resolution: Invalid

Whoops.  Didn't look closely enough.  This one is fine.
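
For reference, a small sketch of why the return statement quoted in the issue is safe for large values: the cast is only reached once the value is known not to exceed Integer.MAX_VALUE.  The wrapper class and method name are hypothetical, and the sketch assumes the value is never negative, as with a container count.

```java
/** Hypothetical wrapper around the guarded cast quoted in the issue. */
class ClampSketch {

  /** Clamps a non-negative long to the int range before narrowing it. */
  static int clampToInt(long value) {
    return value > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) value;
  }

  public static void main(String[] args) {
    long tooBig = Integer.MAX_VALUE + 1L;
    // An unguarded cast keeps only the low 32 bits and wraps around.
    System.out.println((int) tooBig);
    // The guarded version saturates instead.
    System.out.println(clampToInt(tooBig));
  }
}
```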

> Unsafe cast to from long to int in 
> DominantResourceCalculator.computeAvailableContainers()
> --
>
> Key: YARN-7335
> URL: https://issues.apache.org/jira/browse/YARN-7335
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>
> {code}
> long min = Long.MAX_VALUE;
> ...
> return min > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) min;
> {code}






[jira] [Created] (YARN-7336) Unsafe cast from long to int hashCode() methods for Resource, LightweightResource, and ResourcePBImpl

2017-10-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7336:
--

 Summary: Unsafe cast from long to int hashCode() methods for 
Resource, LightweightResource, and ResourcePBImpl
 Key: YARN-7336
 URL: https://issues.apache.org/jira/browse/YARN-7336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical


For example:

{code}
final int prime = 47;
long result = 0;
for (ResourceInformation entry : resources) {
  result = prime * result + entry.hashCode();
}
return (int) result;
{code}
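
One possible way to avoid the cast, sketched with hypothetical names and plain int hash values standing in for the ResourceInformation entries: accumulate in an int from the start.  The narrowing cast keeps only the low 32 bits of the long, so both variants below return the same value; the int version simply has no cast to flag.  This is an illustration, not the patch for this issue.

```java
/** Hypothetical illustration; int values stand in for entry hash codes. */
class HashSketch {

  /** Mirrors the pattern above: accumulate in a long, then narrow. */
  static int hashWithCast(int[] entryHashes) {
    final int prime = 47;
    long result = 0;
    for (int entryHash : entryHashes) {
      result = prime * result + entryHash;
    }
    return (int) result; // keeps only the low 32 bits
  }

  /** Alternative: accumulate in an int, which wraps on overflow. */
  static int hashInInt(int[] entryHashes) {
    final int prime = 47;
    int result = 0;
    for (int entryHash : entryHashes) {
      result = prime * result + entryHash;
    }
    return result;
  }

  public static void main(String[] args) {
    int[] hashes = {Integer.MAX_VALUE, Integer.MIN_VALUE, 123456789, -1};
    System.out.println(hashWithCast(hashes) == hashInInt(hashes));
  }
}
```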






[jira] [Created] (YARN-7335) Unsafe cast to from long to int in DominantResourceCalculator.computeAvailableContainers()

2017-10-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7335:
--

 Summary: Unsafe cast to from long to int in 
DominantResourceCalculator.computeAvailableContainers()
 Key: YARN-7335
 URL: https://issues.apache.org/jira/browse/YARN-7335
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical


{code}
long min = Long.MAX_VALUE;
...
return min > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) min;
{code}






[jira] [Resolved] (YARN-4373) Jobs can be temporarily forgotten during recovery

2017-10-14 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-4373.

Resolution: Invalid

This issue was actually that if the name node is unreachable, there's a window 
after a job finishes where the RM will redirect to the JHS, but because the 
name node isn't reachable, the job's history hasn't been transferred yet.  
That's not something that we can easily resolve, so I'm closing this as invalid.

> Jobs can be temporarily forgotten during recovery
> -
>
> Key: YARN-4373
> URL: https://issues.apache.org/jira/browse/YARN-4373
> Project: Hadoop YARN
>  Issue Type: Bug
>    Affects Versions: 2.7.1
>    Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>
> The RM becomes available to service requests before state store recovery is 
> started.  Before recovery and during the recovery period, it's possible for a 
> client to request an application report for a running application to which 
> the RM will respond that the application is unknown.
> I'm seeing this issue with Oozie during an RM failover.  Until the active 
> finishes recovery, it reports erroneous information to Oozie, which doesn't 
> have context to know that it should just try again later.






Hadoop Compatability Guide, Part Deux: Developer Docs

2017-10-14 Thread Daniel Templeton
I just posted a first patch for HADOOP-14876 that adds downstream 
developer docs based on the overhauled compatibility guide from 
HADOOP-13714.  I would really appreciate some critical review of the 
doc, as it's much more likely to be read by downstream developers than 
the compatibility spec itself.


There's another doc coming in HADOOP-14875 that will add what amounts to 
upgrade docs for admins.  When that's complete, I will send another 
email here to solicit reviews.


Daniel




[jira] [Created] (YARN-7328) ResourceUtils allows yarn.nodemanager.resource-types.memory-mb and .vcores to override yarn.nodemanager.resource.memory-mb and .cpu-vcores

2017-10-13 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7328:
--

 Summary: ResourceUtils allows 
yarn.nodemanager.resource-types.memory-mb and .vcores to override 
yarn.nodemanager.resource.memory-mb and .cpu-vcores
 Key: YARN-7328
 URL: https://issues.apache.org/jira/browse/YARN-7328
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Priority: Critical


We will throw an exception if yarn.nodemanager.resource-types.memory is 
configured, but not if .memory-mb or .vcores is configured.  We should be 
consistent.  We should not allow resource types to redefine something for which 
we already have a property to set. 
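
A minimal sketch of the consistent check being asked for, assuming the property prefix and suffixes named above.  The class and method are hypothetical and this is not the ResourceUtils code; "gpu" is a made-up example of a resource type that would be allowed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical validator; not the actual ResourceUtils implementation. */
class ResourceTypeConfigCheck {
  static final String PREFIX = "yarn.nodemanager.resource-types.";

  // Suffixes that would redefine resources which already have their own property.
  static final Set<String> RESERVED =
      new HashSet<>(Arrays.asList("memory", "memory-mb", "vcores"));

  /** Rejects every reserved suffix the same way instead of only "memory". */
  static void validate(Collection<String> propertyNames) {
    for (String name : propertyNames) {
      if (name.startsWith(PREFIX)
          && RESERVED.contains(name.substring(PREFIX.length()))) {
        throw new IllegalArgumentException(
            name + " redefines a resource that has a dedicated property");
      }
    }
  }

  public static void main(String[] args) {
    // Passes: "gpu" is an illustrative, made-up resource type.
    validate(Arrays.asList(PREFIX + "gpu"));
    try {
      validate(Arrays.asList(PREFIX + "vcores"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```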






[jira] [Resolved] (YARN-7247) Assign multiple will lead to hot point problems of physical resource consumption

2017-09-25 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7247.

Resolution: Not A Problem
  Assignee: Daniel Templeton

Closing as not an issue because it's the intentional design of fair scheduler, 
and there are documented approaches to mitigate the problem.

> Assign multiple will lead to hot point problems of physical resource 
> consumption
> 
>
> Key: YARN-7247
> URL: https://issues.apache.org/jira/browse/YARN-7247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: balloons
>Assignee: Daniel Templeton
>







Re: [VOTE] Release Apache Hadoop 2.8.2 (RC0)

2017-09-11 Thread Daniel Templeton
YARN-6622 is now committed to 2.9.  We could backport YARN-5258 and 
YARN-6622 for 2.8, but it'll take some editing.  We'll have to check to 
see what features are unsupported in 2.8 and remove those from the 
docs.  Not a huge effort overall, though.  Probably an hour's work.  I 
may have time to try to do it later this week.  Anyone else want to volunteer?


Daniel

On 9/11/17 3:01 PM, Chris Douglas wrote:

On Mon, Sep 11, 2017 at 2:52 PM, Junping Du  wrote:

I don't think this -1 is reasonable, because:
- If you look at YARN-6622 closely, it aims to fix problematic 
documentation from YARN-5258, which was checked into the 2.9 and 3.0 branches only. 
That means it fixes a problem that never existed in 2.8.2.

...we're not going to document security implications- which include
escalations to root- because we don't have _any_ documentation? Why
don't we backport the documentation?


- The new Docker container support (the replacement for the old DockerContainerExecutor) is 
still an alpha feature and is not highlighted among the 2.8 major 
features/improvements (http://hadoop.apache.org/docs/r2.8.0/index.html). So 
adding documentation here is also not a blocker.

YARN-6622 is *documenting* the fact that this is an alpha feature and
that it shouldn't be enabled in secure environments. How are users
supposed to make this determination without it?


The vote will continue until a real blocker comes up.

Soright. I remain -1. -C



From: Chris Douglas 
Sent: Monday, September 11, 2017 12:00 PM
To: Junping Du
Cc: Miklos Szegedi; Mingliang Liu; Hadoop Common; Hdfs-dev; 
mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org; junping_du
Subject: Re: [VOTE] Release Apache Hadoop 2.8.2 (RC0)

-1 (binding)

I don't think we should release this without YARN-6622.

Since this doesn't happen often: a -1 in this case is NOT a veto.
Releases are approved by majority vote of the PMC. -C

On Mon, Sep 11, 2017 at 11:45 AM, Junping Du  wrote:

Thanks Miklos for the notification. I think Docker support is generally known to be an 
alpha feature, so documenting it as experimental is nice to have but not a blocker 
for 2.8.2. I also noticed that our 2.7.x documentation 
(https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html)
 does not mention that Docker support is experimental. We may need to fix that as 
well in following releases.

I can also add a note (mentioning that the Docker container support feature is experimental) 
to the release message on the public website, just as we called the previous 
2.7.0/2.8.0 releases non-production releases.

I think the vote should continue until we find a real blocker.


Thanks,


Junping



From: Miklos Szegedi 
Sent: Monday, September 11, 2017 10:07 AM
To: Mingliang Liu
Cc: Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org; 
yarn-dev@hadoop.apache.org; junping_du; Junping Du
Subject: Re: [VOTE] Release Apache Hadoop 2.8.2 (RC0)

Hello Junping,

Thank you for working on this. Should not YARN-6622 be addressed first? "Summary: 
Document Docker work as experimental".

Thank you,
Miklos


On Sun, Sep 10, 2017 at 6:39 PM, Mingliang Liu <lium...@gmail.com> wrote:
Thanks Junping for doing this!

+1 (non-binding)

- Download the hadoop-2.8.2-src.tar.gz file and check the md5 value
- Build package using maven (skipping tests) with Java 8
- Spin up a test cluster in Docker containers having 1 master node (NN/RM) and 
3 slave nodes (DN/NM)
- Operate the basic HDFS/YARN operations from command line, both client and 
admin
- Check NN/RM Web UI
- Run distcp to copy files from/to local and HDFS
- Run hadoop mapreduce examples: grep and wordcount
- Check the HDFS service logs

All looked good to me.

Mingliang


On Sep 10, 2017, at 5:00 PM, Junping Du <j...@hortonworks.com> wrote:

Hi folks,
 Now that the fix for HADOOP-14842 is in, I've created our first release candidate 
(RC0) for Apache Hadoop 2.8.2.

 Apache Hadoop 2.8.2 is the first stable release of the Hadoop 2.8 line and 
will be the latest stable/production release for Apache Hadoop. It includes 
305 newly fixed issues since 2.8.1, 63 of which are marked as blocker/critical 
issues.

  More information about the 2.8.2 release plan can be found here: 
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release

  New RC is available at: 
http://home.apache.org/~junping_du/hadoop-2.8.2-RC0

  The RC tag in git is: release-2.8.2-RC0, and the latest commit id is: 
e6597fe3000b06847d2bf55f2bab81770f4b2505

  The maven artifacts are available via 
repository.apache.org at: 
https://repository.apache.org/content/repositories/orgapachehadoop-1062

  Please try the release and vote; the vote will run for the usual 5 days, 
ending on 09/15/2017 at 5pm PST.

Thanks,

Junping



-
To unsubscribe, e-mail: 
mapreduce-dev-unsubscr...@hadoop.apache.org

[jira] [Created] (YARN-7182) YARN's StateMachine should be stable

2017-09-09 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7182:
--

 Summary: YARN's StateMachine should be stable
 Key: YARN-7182
 URL: https://issues.apache.org/jira/browse/YARN-7182
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


It's currently {{Evolving}}, which is clearly no longer true.






[jira] [Created] (YARN-7172) ResourceCalculator.fitsIn() should not take a cluster resource parameter

2017-09-07 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7172:
--

 Summary: ResourceCalculator.fitsIn() should not take a cluster 
resource parameter
 Key: YARN-7172
 URL: https://issues.apache.org/jira/browse/YARN-7172
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are numerous calls to {{ClusterNodeTracker.getClusterResource()}} (which 
involves a lock) to get a value to pass as the cluster resource parameter to 
{{Resources.fitsIn()}}, but the parameter is (quite reasonably) ignored.  We 
should remove the parameter.






Re: DISCUSS: Hadoop Compatability Guidelines

2017-09-07 Thread Daniel Templeton
Good point.  I think it would be valuable to enumerate the policies 
around the versioned state stores.  We have the three you listed. We 
should probably include the HDFS fsimage in that list.  Any others?


I also want to add a section that clarifies when it's OK to change the 
visibility or audience of an API.


Daniel

On 9/5/17 11:04 AM, Arun Suresh wrote:

Thanks for starting this Daniel.

I think we should also add a section for store compatibility (all state
stores including RM, NM, Federation etc.). Essentially an explicit policy
detailing when it is ok to change the major and minor versions and how it
should relate to the hadoop release version.
Thoughts ?

Cheers
-Arun


On Tue, Sep 5, 2017 at 10:38 AM, Daniel Templeton 
wrote:


Good idea.  I should have thought of that. :)  Done.

Daniel


On 9/5/17 10:33 AM, Anu Engineer wrote:


Could you please attach the PDFs to the JIRA. I think the mailer is
stripping them off from the mail.

Thanks
Anu





On 9/5/17, 9:44 AM, "Daniel Templeton"  wrote:

Resending with a broader audience, and reattaching the PDFs.

Daniel

On 9/4/17 9:01 AM, Daniel Templeton wrote:


All, in prep for Hadoop 3 beta 1 I've been working on updating the
compatibility guidelines on HADOOP-13714.  I think the initial doc is
more or less complete, so I'd like to open the discussion up to the
broader Hadoop community.

In the new guidelines, I have drawn some lines in the sand regarding
compatibility between releases.  In some cases these lines are more
restrictive than the current practices.  The intent with the new
guidelines is not to limit progress by restricting what goes into a
release, but rather to drive release numbering to keep in line with
the reality of the code.

Please have a read and provide feedback on the JIRA.  I'm sure there
are more than a couple of areas that could be improved.  If you'd
rather not read markdown from a diff patch, I've attached PDFs of the
two modified docs.

Thanks!
Daniel












[jira] [Created] (YARN-7166) Container REST endpoints should report resource types

2017-09-06 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7166:
--

 Summary: Container REST endpoints should report resource types
 Key: YARN-7166
 URL: https://issues.apache.org/jira/browse/YARN-7166
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton









Re: DISCUSS: Hadoop Compatability Guidelines

2017-09-05 Thread Daniel Templeton

Good idea.  I should have thought of that. :)  Done.

Daniel

On 9/5/17 10:33 AM, Anu Engineer wrote:

Could you please attach the PDFs to the JIRA. I think the mailer is stripping 
them off from the mail.

Thanks
Anu





On 9/5/17, 9:44 AM, "Daniel Templeton"  wrote:


Resending with a broader audience, and reattaching the PDFs.

Daniel

On 9/4/17 9:01 AM, Daniel Templeton wrote:

All, in prep for Hadoop 3 beta 1 I've been working on updating the
compatibility guidelines on HADOOP-13714.  I think the initial doc is
more or less complete, so I'd like to open the discussion up to the
broader Hadoop community.

In the new guidelines, I have drawn some lines in the sand regarding
compatibility between releases.  In some cases these lines are more
restrictive than the current practices.  The intent with the new
guidelines is not to limit progress by restricting what goes into a
release, but rather to drive release numbering to keep in line with
the reality of the code.

Please have a read and provide feedback on the JIRA.  I'm sure there
are more than a couple of areas that could be improved.  If you'd
rather not read markdown from a diff patch, I've attached PDFs of the
two modified docs.

Thanks!
Daniel








Re: DISCUSS: Hadoop Compatability Guidelines

2017-09-05 Thread Daniel Templeton

Resending with a broader audience, and reattaching the PDFs.

Daniel

On 9/4/17 9:01 AM, Daniel Templeton wrote:
All, in prep for Hadoop 3 beta 1 I've been working on updating the 
compatibility guidelines on HADOOP-13714.  I think the initial doc is 
more or less complete, so I'd like to open the discussion up to the 
broader Hadoop community.


In the new guidelines, I have drawn some lines in the sand regarding 
compatibility between releases.  In some cases these lines are more 
restrictive than the current practices.  The intent with the new 
guidelines is not to limit progress by restricting what goes into a 
release, but rather to drive release numbering to keep in line with 
the reality of the code.


Please have a read and provide feedback on the JIRA.  I'm sure there 
are more than a couple of areas that could be improved.  If you'd 
rather not read markdown from a diff patch, I've attached PDFs of the 
two modified docs.


Thanks!
Daniel





[jira] [Created] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent

2017-08-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7143:
--

 Summary: FileNotFound handling in ResourceUtils is inconsistent
 Key: YARN-7143
 URL: https://issues.apache.org/jira/browse/YARN-7143
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton


When loading the resource-types.xml file, we warn and move on if it's not 
found.  When loading the node-resource.xml file, we abort loading resource 
types if the file isn't found.






[jira] [Created] (YARN-7135) Clean up lock-try order in common scheduler code

2017-08-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7135:
--

 Summary: Clean up lock-try order in common scheduler code
 Key: YARN-7135
 URL: https://issues.apache.org/jira/browse/YARN-7135
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are many places that follow the pattern:{code}try {
  lock.lock();
  ...
} finally {
  lock.unlock();
}{code}

There are a couple of reasons that's a bad idea.  The correct pattern 
is:{code}lock.lock();
try {
  ...
} finally {
  lock.unlock();
}{code}
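
The issue does not spell out its reasons, so the following is only one commonly cited one, sketched under stated assumptions: if lock() itself throws, the finally block calls unlock() on a lock the thread never acquired, and the resulting IllegalMonitorStateException hides the original failure. FailingLock and both helper methods are hypothetical names made up for this illustration; they are not YARN code.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockTryOrder {
  /** Hypothetical lock whose lock() always fails, standing in for any acquisition failure. */
  static class FailingLock extends ReentrantLock {
    @Override
    public void lock() {
      throw new IllegalStateException("acquisition failed");
    }
  }

  /** lock() inside try: finally runs unlock() even though the lock was never acquired. */
  static String lockInsideTry(Lock lock) {
    try {
      try {
        lock.lock();
        // critical section would go here
      } finally {
        lock.unlock();
      }
      return "ok";
    } catch (RuntimeException e) {
      // Report which exception actually reached the caller.
      return e.getClass().getSimpleName();
    }
  }

  /** lock() before try: unlock() runs only if the lock was actually acquired. */
  static String lockBeforeTry(Lock lock) {
    try {
      lock.lock();
      try {
        // critical section would go here
      } finally {
        lock.unlock();
      }
      return "ok";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

With a FailingLock, lockInsideTry() surfaces the IllegalMonitorStateException from unlock() rather than the original failure, while lockBeforeTry() surfaces the original IllegalStateException.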







[jira] [Created] (YARN-7134) AppSchedulingInfo has a dependency on capacity scheduler

2017-08-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7134:
--

 Summary: AppSchedulingInfo has a dependency on capacity scheduler
 Key: YARN-7134
 URL: https://issues.apache.org/jira/browse/YARN-7134
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Blocker


The common scheduling code should be independent of all scheduler 
implementations.  YARN-6040 introduced capacity scheduler's {{SchedulingMode}} 
into {{AppSchedulingInfo}}.






[jira] [Created] (YARN-7133) Clean up lock-try order fair scheduler

2017-08-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7133:
--

 Summary: Clean up lock-try order fair scheduler
 Key: YARN-7133
 URL: https://issues.apache.org/jira/browse/YARN-7133
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are many places that follow the pattern:{code}try {
  lock.lock();
  ...
} finally {
  lock.unlock();
}{code}

There are a couple of reasons that's a bad idea.  The correct pattern 
is:{code}lock.lock();
try {
  ...
} finally {
  lock.unlock();
}{code}







[jira] [Created] (YARN-7132) FairScheduler.initScheduler() contains a surprising unary plus

2017-08-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7132:
--

 Summary: FairScheduler.initScheduler() contains a surprising unary 
plus
 Key: YARN-7132
 URL: https://issues.apache.org/jira/browse/YARN-7132
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


The method contains the following code:{code}
LOG.warn(FairSchedulerConfiguration.UPDATE_INTERVAL_MS
+ " is invalid, so using default value "
+ +FairSchedulerConfiguration.DEFAULT_UPDATE_INTERVAL_MS
+ " ms instead");{code}

Note the beginning of the third line.  One of those plusses should be deleted 
so that no one else spends cycles trying to understand why it even compiles.
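
For anyone wondering why it compiles: the second plus is parsed as Java's unary plus operator applied to the numeric constant, so the concatenation result is unchanged. A minimal sketch, assuming a long-valued constant; the class name and the value 500 are made up for illustration and are not taken from FairSchedulerConfiguration:

```java
public class UnaryPlusDemo {
  // Hypothetical stand-in for the real default; the value is illustrative only.
  static final long DEFAULT_UPDATE_INTERVAL_MS = 500L;

  /** Mirrors the reported code: binary '+' followed by a unary '+' on the constant. */
  static String withStrayPlus() {
    return " is invalid, so using default value "
        + +DEFAULT_UPDATE_INTERVAL_MS
        + " ms instead";
  }

  /** The cleaned-up form with a single '+'. */
  static String withoutStrayPlus() {
    return " is invalid, so using default value "
        + DEFAULT_UPDATE_INTERVAL_MS
        + " ms instead";
  }
}
```

Both methods build the same string, which is why the stray plus is harmless at runtime but confusing to read.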






[jira] [Created] (YARN-7123) FairScheduler.getResourceCalculator() returns an instance of DefaultResourceCalculator regardless of the configuration

2017-08-29 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7123:
--

 Summary: FairScheduler.getResourceCalculator() returns an instance 
of DefaultResourceCalculator regardless of the configuration
 Key: YARN-7123
 URL: https://issues.apache.org/jira/browse/YARN-7123
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


There are several places where this creates the wrong behavior:

* 298:RMServerUtils.java
* 1081:AbstractYarnScheduler.java
* 1197:FSAppAttempt.java






[jira] [Created] (YARN-7121) FSAppAttempt's delayed scheduling should be factored out into the common scheduling code

2017-08-29 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7121:
--

 Summary: FSAppAttempt's delayed scheduling should be factored out 
into the common scheduling code
 Key: YARN-7121
 URL: https://issues.apache.org/jira/browse/YARN-7121
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


Per [~leftnoteasy]'s comment:{code}// TODO (wandga): All logics in this method should be added to
// SchedulerPlacement#canDelayTo which is independent from scheduler.
// Scheduler can choose to use various/pluggable delay-scheduling
// implementation.{code}






[jira] [Created] (YARN-7119) yarn rmadmin -updateNodeResource should be updated for resource types

2017-08-29 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7119:
--

 Summary: yarn rmadmin -updateNodeResource should be updated for 
resource types
 Key: YARN-7119
 URL: https://issues.apache.org/jira/browse/YARN-7119
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-3926
Reporter: Daniel Templeton









Re: [VOTE] Merge YARN-3926 (resource profile) to trunk

2017-08-26 Thread Daniel Templeton
Quick question, Wangda.  When you say that the feature can be turned 
off, do you mean resource types or resource profiles?  I know there's an 
off-by-default property that governs resource profiles, but I didn't see 
any way to turn off resource types.  Even if only CPU and memory are 
configured, i.e. no additional resource types, the code path is 
different than it was.  Specifically, where CPU and memory were 
primitives before, they're now entries in an array whose indexes have to 
be looked up through the ResourceUtils class.  Did I miss something?


For those who haven't followed the feature closely, there are really two 
features here.  Resource types allows for declarative extension of the 
resource system in YARN.  Resource profiles builds on top of resource 
types to allow a user to request a group of resources as a profile, much 
like EC2 instance types, e.g. "fast-compute" might mean 32GB RAM, 8 
vcores, and 2 GPUs.
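
To make the distinction concrete, here is a rough sketch of the idea only, not the YARN-3926 API: resource types are just named quantities, and a profile is a name that resolves to a set of them. The class, the map layout, and the resource names "memory-mb", "vcores", and "gpu" are all assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResourceProfileSketch {
  /** Hypothetical profile table: profile name -> (resource type name -> amount). */
  static final Map<String, Map<String, Long>> PROFILES = new LinkedHashMap<>();

  static {
    // The "fast-compute" example from the text: 32GB RAM, 8 vcores, 2 GPUs.
    Map<String, Long> fastCompute = new LinkedHashMap<>();
    fastCompute.put("memory-mb", 32L * 1024);
    fastCompute.put("vcores", 8L);
    fastCompute.put("gpu", 2L);
    PROFILES.put("fast-compute", fastCompute);
  }

  /** Resolve a profile name to its per-type amounts, failing loudly on unknown names. */
  static Map<String, Long> resolve(String profile) {
    Map<String, Long> resources = PROFILES.get(profile);
    if (resources == null) {
      throw new IllegalArgumentException("Unknown profile: " + profile);
    }
    return resources;
  }
}
```

A request for "fast-compute" then expands to the three per-type amounts, which is the convenience profiles add on top of resource types.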


Daniel

On 8/23/17 11:49 AM, Wangda Tan wrote:

  Hi folks,

Per earlier discussion [1], I'd like to start a formal vote to merge
feature branch YARN-3926 (Resource profile) to trunk. The vote will run for
7 days and will end August 30 10:00 AM PDT.

Briefly, YARN-3926 extends the resource model of YARN to support resource
types other than CPU and memory, so it will be a cornerstone of features
like GPU support (YARN-6223), disk scheduling/isolation (YARN-2139), FPGA
support (YARN-5983), and network IO scheduling/isolation (YARN-2140). In
addition, YARN-3926 allows admins to preconfigure resource profiles
in the cluster, for example, m3.large means <2 vcores, 8 GB memory, 64 GB
disk>, so applications can request the "m3.large" profile instead of specifying
values for all resource types.

There are 32 subtasks that were completed as part of this effort.

This feature needs to be explicitly turned on before use. We paid close
attention to the compatibility, performance, and scalability of this feature.
As mentioned in [1], we saw no observable performance regression in large-scale
SLS (scheduler load simulator) executions and saw less than 5%
performance regression using the micro benchmark added by YARN-6775.

This feature works end-to-end (including UI/CLI/application/server).
We have set up a cluster with this feature turned on that has run for several weeks,
and we haven't seen any issues so far.

Merge JIRA: YARN-7013 (Jenkins gave +1 already).
Documentation: YARN-7056

Special thanks to a team of folks who worked hard and contributed towards
this effort including design discussion/development/reviews, etc.: Varun
Vasudev, Sunil Govind, Daniel Templeton, Vinod Vavilapalli, Yufei Gu,
Karthik Kambatla, Jason Lowe, Arun Suresh.

Regards,
Wangda Tan

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3CCAD%2B%2BeCnjEHU%3D-M33QdjnND0ZL73eKwxRua4%3DBbp4G8inQZmaMg%40mail.gmail.com%3E







[jira] [Created] (YARN-7085) Application.schedule() and Application.assign() appear to only be used in test code

2017-08-23 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7085:
--

 Summary: Application.schedule() and Application.assign() appear to 
only be used in test code
 Key: YARN-7085
 URL: https://issues.apache.org/jira/browse/YARN-7085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


That's a pretty big chunk of code to be purely for tests.  I haven't looked at 
it closely enough yet to tell if the code is there to support the tests, or if 
the tests are just testing dead code.  Either way, we should remove the code 
from {{Application}}.






[jira] [Created] (YARN-7042) Clean up unit tests after YARN-6610

2017-08-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7042:
--

 Summary: Clean up unit tests after YARN-6610
 Key: YARN-7042
 URL: https://issues.apache.org/jira/browse/YARN-7042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: test
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton


Some of the unit tests in YARN-6610 weren't quite testing what they were 
supposed to be testing.  This patch fixes that.






[jira] [Created] (YARN-7026) Fair scheduler docs should explain what happens when no placement rules are specified

2017-08-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7026:
--

 Summary: Fair scheduler docs should explain what happens when no 
placement rules are specified
 Key: YARN-7026
 URL: https://issues.apache.org/jira/browse/YARN-7026
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton









[jira] [Resolved] (YARN-7002) branch-2 build is broken by AllocationFileLoaderService.java

2017-08-11 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-7002.

  Resolution: Fixed
Hadoop Flags: Reviewed

I reverted the offending commit.

> branch-2 build is broken by AllocationFileLoaderService.java
> 
>
> Key: YARN-7002
> URL: https://issues.apache.org/jira/browse/YARN-7002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: Daniel Templeton
>
> branch-2 build is broken:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hadoop-yarn-server-resourcemanager: Compilation failure
> [ERROR] 
> /Users/jzhuge/hadoop-commit/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java:[270,39]
>  incompatible types: java.util.HashSet cannot be converted 
> to java.util.Set
> {noformat}






[jira] [Created] (YARN-6995) Improve use of ResourceNotFoundException in resource types code

2017-08-11 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6995:
--

 Summary: Improve use of ResourceNotFoundException in resource 
types code
 Key: YARN-6995
 URL: https://issues.apache.org/jira/browse/YARN-6995
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


Now that all the YarnExceptions have been replaced with 
ResourceNotFoundExceptions, we should make the ResourceNotFoundExceptions as 
useful as possible.






[jira] [Created] (YARN-6994) Remove last uses of Long from resource types code

2017-08-11 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6994:
--

 Summary: Remove last uses of Long from resource types code
 Key: YARN-6994
 URL: https://issues.apache.org/jira/browse/YARN-6994
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


Most of the uses have been removed over the last few patches.  There's only one 
left that I see.






[jira] [Created] (YARN-6986) Fair scheduler should add a SchedulerMetrics class

2017-08-10 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6986:
--

 Summary: Fair scheduler should add a SchedulerMetrics class
 Key: YARN-6986
 URL: https://issues.apache.org/jira/browse/YARN-6986
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


Especially now that ATSv2 is almost here, it would be very helpful if the fair 
scheduler offered scheduler metrics, like # pending requests, # running 
containers, # preemptions, size of the event queue, etc.

Currently I see cluster metrics, queue metrics, and scheduler operation 
duration metrics, but no top-level scheduler metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6985) The wrapper methods in Resources aren't useful

2017-08-10 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6985:
--

 Summary: The wrapper methods in Resources aren't useful
 Key: YARN-6985
 URL: https://issues.apache.org/jira/browse/YARN-6985
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


The code would be shorter, easier to read, and a tiny smidgeon faster if we 
just called the {{ResourceCalculator}} methods directly.  I don't see where the 
wrappers improve the code in any way.

For example, with wrappers:{code}Resource normalized = Resources.normalize(
resourceCalculator, ask, minimumResource,
maximumResource, incrementResource);
{code} and without wrappers:{code}Resource normalized = 
resourceCalculator.normalize(ask, minimumResource,
maximumResource, incrementResource);{code}

The difference isn't huge, but I find the latter much more readable.  With the 
former I always have to figure out which parameters are which, because passing 
in the {{ResourceCalculator}} adds in an unrelated additional parameter at the 
head of the list.

There may be some cases where the wrapper methods are mixed in with calls to 
legitimate {{Resources}} methods, making the code more consistent to use the 
wrappers. In those cases, that may be a reason to keep and use the wrapper 
method.
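
The shape of the complaint can be sketched in isolation. Everything below is a hypothetical reduction for illustration: Calculator stands in for {{ResourceCalculator}}, the static normalize() stands in for the wrapper in {{Resources}}, values are plain longs rather than {{Resource}} objects, and the toy rounding rule is an assumption rather than YARN's actual normalization logic.

```java
public class WrapperSketch {
  /** Hypothetical stand-in for ResourceCalculator, reduced to one long-valued method. */
  interface Calculator {
    long normalize(long ask, long min, long max, long increment);
  }

  /** Hypothetical stand-in for the static wrapper: it only forwards to the calculator. */
  static long normalize(Calculator calc, long ask, long min, long max, long increment) {
    return calc.normalize(ask, min, max, increment);
  }

  /** Toy calculator: round the ask up to a multiple of increment, then clamp to [min, max]. */
  static final Calculator TOY = (ask, min, max, increment) -> {
    long rounded = ((ask + increment - 1) / increment) * increment;
    return Math.min(max, Math.max(min, rounded));
  };
}
```

Calling WrapperSketch.normalize(TOY, ask, min, max, increment) and TOY.normalize(ask, min, max, increment) return the same value; the wrapper only adds an extra leading argument.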






[jira] [Created] (YARN-6984) DominantResourceCalculator.isAnyMajorResourceZero() should test all resources

2017-08-10 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6984:
--

 Summary: DominantResourceCalculator.isAnyMajorResourceZero() 
should test all resources
 Key: YARN-6984
 URL: https://issues.apache.org/jira/browse/YARN-6984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: YARN-3926
Reporter: Daniel Templeton


The method currently tests only memory and CPU.  It looks to me like it should 
test all resources, i.e. it should do what {{isInvalidDivisor()}} does and 
should, in fact, replace that method.  [~sunilg], since you wrote the method 
originally, can you comment on what its intended semantics are?






[jira] [Created] (YARN-6964) Fair scheduler misuses Resources operations

2017-08-07 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6964:
--

 Summary: Fair scheduler misuses Resources operations
 Key: YARN-6964
 URL: https://issues.apache.org/jira/browse/YARN-6964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Assignee: Daniel Templeton


There are several places where YARN uses the {{Resources}} class to do 
comparisons of {{Resource}} instances incorrectly.  This patch corrects those 
mistakes.






[jira] [Created] (YARN-6953) Clean up ResourceUtils.setMinimumAllocationForMandatoryResources() and setMaximumAllocationForMandatoryResources()

2017-08-04 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6953:
--

 Summary: Clean up 
ResourceUtils.setMinimumAllocationForMandatoryResources() and 
setMaximumAllocationForMandatoryResources()
 Key: YARN-6953
 URL: https://issues.apache.org/jira/browse/YARN-6953
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Minor


The {{setMinimumAllocationForMandatoryResources()}} and 
{{setMaximumAllocationForMandatoryResources()}} methods are quite convoluted.  
They'd be much simpler if they just handled CPU and memory manually instead of 
trying to be clever about doing it in a loop.  There are also issues, such as 
the log warning always talking about memory or the last element of the inner 
array being a copy of the first element.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory

2017-08-04 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-6934.

Resolution: Invalid

> ResourceUtils.checkMandatoryResources() should also ensure that no min or max 
> is set for vcores or memory
> -
>
> Key: YARN-6934
> URL: https://issues.apache.org/jira/browse/YARN-6934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>  Labels: newbie++
> Attachments: YARN-6934.001.patch
>
>







[jira] [Created] (YARN-6935) ResourceProfilesManagerImpl.parseResource() has no need of the key parameter

2017-08-02 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6935:
--

 Summary: ResourceProfilesManagerImpl.parseResource() has no need 
of the key parameter
 Key: YARN-6935
 URL: https://issues.apache.org/jira/browse/YARN-6935
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


The {{key}} parameter is the name of the resource profile being parsed, which 
is irrelevant to parsing the {{value}} as a {{Resource}} and hence is unused.  
It should be removed, and {{value}} should be renamed to something more 
descriptive.






[jira] [Created] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory

2017-08-02 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6934:
--

 Summary: ResourceUtils.checkMandatoryResources() should also 
ensure that no min or max is set for vcores or memory
 Key: YARN-6934
 URL: https://issues.apache.org/jira/browse/YARN-6934
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton









[jira] [Created] (YARN-6933) ResourceUtils.DISALLOWED_NAMES and ResourceUtils.checkMandatoryResources() are duplicating work

2017-08-02 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6933:
--

 Summary: ResourceUtils.DISALLOWED_NAMES and 
ResourceUtils.checkMandatoryResources() are duplicating work
 Key: YARN-6933
 URL: https://issues.apache.org/jira/browse/YARN-6933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


Both are used to check that the mandatory resources were not redefined.  Only 
one check is needed.  I would recommend dropping {{DISALLOWED_NAMES}}.






[jira] [Created] (YARN-6927) Add support for individual resource types requests in MapReduce

2017-08-01 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6927:
--

 Summary: Add support for individual resource types requests in 
MapReduce
 Key: YARN-6927
 URL: https://issues.apache.org/jira/browse/YARN-6927
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


YARN-6504 adds support for resource profiles in MapReduce jobs, but resource 
profiles don't give users much flexibility in their resource requests.  To 
satisfy users' needs, MapReduce should also allow users to specify arbitrary 
resource requests.






Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Daniel Templeton

Thanks, Subru!  Carry on. :)

Daniel

On 8/1/17 1:42 PM, Subru Krishnan wrote:

Hi Daniel,

You were just in time; Carlo and I were just talking about moving
forward with the merge :).

To answer your questions:

1. The expectation about the store is that the user will have a database set
up (we only link to the install instructions page), but we do have the scripts
for the schema and stored procedures. This is in fact called out in the doc
in the *State Store* section (just before *Running a Sample Job*).
Additionally, we are working on a ZK-based implementation of the store. Inigo
has a patch in YARN-6900 [1].
2. We rely on existing YARN/Hadoop security mechanisms for running
applications on Federation as-is, so you should not need any additional
Kerberos configuration. Disclaimer: we don't use Kerberos for securing
Hadoop but rely on our production infrastructure.

Thanks,
Subru

[1] https://issues.apache.org/jira/browse/YARN-6900

On Tue, Aug 1, 2017 at 1:25 PM, Daniel Templeton 
wrote:


Subru, sorry for the last minute contribution... :)  I've been looking at
the branch, and I have two questions.

First, what's the out-of-box experience regarding the data store? Is the
expectation that the user will have a database set up and ready to go?
Will the state store set up the schema automatically, or is that on the
user?  I don't see that in the docs.

Second, how well does federation play with Kerberos?  Anything special
that needs to be configured to make it work?

Daniel

On 7/25/17 8:24 PM, Subru Krishnan wrote:


Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to datacenter
sized clusters, by partitioning nodes across multiple sub-clusters (each
running a YARN cluster of a few thousand nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.

Status:

 - The version we would like to merge to trunk is termed "MVP" (minimal
 viable product). The feature will have a complete end-to-end application
 execution flow with the ability to span a single application across
 multiple YARN (sub) clusters.
 - There were 50+ sub-tasks that were completed as part of this
 effort. Every patch has been reviewed and +1ed by a committer. Thanks to
 Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
 - Federation is designed to be built around YARN and consequently has
 minimal code changes to core YARN. The relevant JIRAs that modify the
 existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
 close attention to ensure that if federation is disabled there is zero
 impact to existing functionality (disabled by default).
 - We found a few bugs as we went along which we fixed directly upstream
 in trunk and/or branch-2.
 - We have been continuously rebasing the feature branch [2] so the merge
 should be a straightforward cherry-pick.
 - The current version has been rather thoroughly tested and is currently
 deployed in a *10,000+ node federated YARN cluster that's running
 upwards of 50k jobs daily with a reliability of 99.9%*.
 - We have a few ideas for follow-up extensions/improvements which are
 tracked in the umbrella JIRA YARN-5597 [3].


Documentation:

 - Quick start guide (maven site) - YARN-6484[4].
 - Overall design doc [5] and the slide deck [6] we used for our talk at
 Hadoop Summit 2016 are available in the umbrella JIRA - YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks and we would like to specifically call
out Gi

Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Daniel Templeton
Subru, sorry for the last minute contribution... :)  I've been looking 
at the branch, and I have two questions.


First, what's the out-of-box experience regarding the data store? Is the 
expectation that the user will have a database set up and ready to go?  
Will the state store set up the schema automatically, or is that on the 
user?  I don't see that in the docs.


Second, how well does federation play with Kerberos?  Anything special 
that needs to be configured to make it work?


Daniel

On 7/25/17 8:24 PM, Subru Krishnan wrote:

Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to datacenter
sized clusters, by partitioning nodes across multiple sub-clusters (each
running a YARN cluster of a few thousand nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.

Status:

- The version we would like to merge to trunk is termed "MVP" (minimal
viable product). The feature will have a complete end-to-end application
execution flow with the ability to span a single application across
multiple YARN (sub) clusters.
- There were 50+ sub-tasks that were completed as part of this
effort. Every patch has been reviewed and +1ed by a committer. Thanks to
Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
- Federation is designed to be built around YARN and consequently has
minimal code changes to core YARN. The relevant JIRAs that modify the
existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
close attention to ensure that if federation is disabled there is zero
impact to existing functionality (disabled by default).
- We found a few bugs as we went along which we fixed directly upstream
in trunk and/or branch-2.
- We have been continuously rebasing the feature branch [2] so the merge
should be a straightforward cherry-pick.
- The current version has been rather thoroughly tested and is currently
deployed in a *10,000+ node federated YARN cluster that's running
upwards of 50k jobs daily with a reliability of 99.9%*.
- We have a few ideas for follow-up extensions/improvements which are
tracked in the umbrella JIRA YARN-5597 [3].


Documentation:

- Quick start guide (maven site) - YARN-6484[4].
- Overall design doc [5] and the slide deck [6] we used for our talk at
Hadoop Summit 2016 are available in the umbrella JIRA - YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also big
thanks to the many folks in community  (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
[9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E




--

[jira] [Created] (YARN-6912) Cluster Metrics API should report resource types information

2017-07-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6912:
--

 Summary: Cluster Metrics API should report resource types 
information
 Key: YARN-6912
 URL: https://issues.apache.org/jira/browse/YARN-6912
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton









[jira] [Created] (YARN-6909) The performance advantages of YARN-6679 are lost when resource types are used

2017-07-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6909:
--

 Summary: The performance advantages of YARN-6679 are lost when 
resource types are used
 Key: YARN-6909
 URL: https://issues.apache.org/jira/browse/YARN-6909
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Critical


YARN-6679 added the {{SimpleResource}} as a lightweight replacement for 
{{ResourcePBImpl}} when a protobuf isn't needed.  With resource types enabled 
and anything other than memory and CPU defined, {{ResourcePBImpl}} will always 
be used.






[jira] [Created] (YARN-6908) ResourceProfilesManagerImpl is missing @Override annotations on methods

2017-07-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6908:
--

 Summary: ResourceProfilesManagerImpl is missing @Override annotations on 
methods
 Key: YARN-6908
 URL: https://issues.apache.org/jira/browse/YARN-6908
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton
Priority: Minor









[jira] [Created] (YARN-6907) Node information page in the old web UI should report resource types

2017-07-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6907:
--

 Summary: Node information page in the old web UI should report 
resource types
 Key: YARN-6907
 URL: https://issues.apache.org/jira/browse/YARN-6907
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton









[jira] [Created] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types usage

2017-07-31 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6906:
--

 Summary: Cluster Node API and Cluster Nodes API should report 
resource types usage
 Key: YARN-6906
 URL: https://issues.apache.org/jira/browse/YARN-6906
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-3926
Reporter: Daniel Templeton


These endpoints currently report:

{noformat}

/default-rack
RUNNING
localhost:51877
localhost
localhost:8042
1501534150336
3.0.0-beta1-SNAPSHOT

4
5120
3072
4
0
0
0
0
0

0
0
0.0
0
0
0.0


{noformat}







[jira] [Created] (YARN-6886) AllocationFileLoaderService.loadQueue() should validate that settings do not conflict with parent

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6886:
--

 Summary: AllocationFileLoaderService.loadQueue() should validate 
that settings do not conflict with parent
 Key: YARN-6886
 URL: https://issues.apache.org/jira/browse/YARN-6886
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


Some settings, such as the scheduling policy, are constrained by the 
configuration of the queue's parent queue.  We should validate those settings 
when we load the allocation file.
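
A minimal sketch of such a load-time check follows. The QueueSettings class, the policy names, and in particular the rule that a `drf` child may not sit under a `fair` parent are assumptions made for illustration only; the real constraints are defined by the scheduler's policy classes and may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class QueueSettingsValidator {
    // Hypothetical, simplified view of one queue entry from the allocation file.
    static final class QueueSettings {
        final String name;
        final String policy;
        final QueueSettings parent; // null for the root queue

        QueueSettings(String name, String policy, QueueSettings parent) {
            this.name = name;
            this.policy = policy;
            this.parent = parent;
        }
    }

    // Assumed rule for this sketch: a "fair" parent cannot host a "drf" child.
    static boolean isChildPolicyAllowed(String parentPolicy, String childPolicy) {
        return !("fair".equals(parentPolicy) && "drf".equals(childPolicy));
    }

    // Collects one message per conflicting queue instead of failing on the first,
    // so a single load reports every problem in the file.
    static List<String> validate(List<QueueSettings> queues) {
        List<String> problems = new ArrayList<>();
        for (QueueSettings queue : queues) {
            if (queue.parent != null
                && !isChildPolicyAllowed(queue.parent.policy, queue.policy)) {
                problems.add("Queue " + queue.name + ": policy " + queue.policy
                    + " is not allowed under parent " + queue.parent.name
                    + " (policy " + queue.parent.policy + ")");
            }
        }
        return problems;
    }
}
```

Returning the list of problems leaves the caller free to decide whether a conflict should reject the whole file or only produce a warning.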






[jira] [Created] (YARN-6885) AllocationFileLoaderService.loadQueue() should use a switch statement in the main tag parsing loop instead of the if/else-if/...

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6885:
--

 Summary: AllocationFileLoaderService.loadQueue() should use a 
switch statement in the main tag parsing loop instead of the if/else-if/...
 Key: YARN-6885
 URL: https://issues.apache.org/jira/browse/YARN-6885
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


{code}
if ("minResources".equals(field.getTagName())) {
  String text = ((Text)field.getFirstChild()).getData().trim();
  Resource val =
      FairSchedulerConfiguration.parseResourceConfigValue(text);
  minQueueResources.put(queueName, val);
} else if ("maxResources".equals(field.getTagName())) {
  ...
{code}






[jira] [Created] (YARN-6884) AllocationFileLoaderService.loadQueue() has an if without braces

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6884:
--

 Summary: AllocationFileLoaderService.loadQueue() has an if without 
braces
 Key: YARN-6884
 URL: https://issues.apache.org/jira/browse/YARN-6884
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Trivial


{code}
if (!(fieldNode instanceof Element))
  continue;
{code}
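
For reference, the braced form of that check might look like the sketch below. The ElementFilter class, its method name, and the sample XML in the test are made up for illustration; only the `instanceof Element` check mirrors the snippet quoted above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of AllocationFileLoaderService.
final class ElementFilter {
    // Returns the tag names of the root's direct child elements,
    // skipping whitespace text nodes and comments.
    static List<String> childTagNames(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            NodeList fields = root.getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Node fieldNode = fields.item(i);
                // Braces make the extent of the skip explicit.
                if (!(fieldNode instanceof Element)) {
                    continue;
                }
                names.add(((Element) fieldNode).getTagName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```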






[jira] [Created] (YARN-6883) AllocationFileLoaderService.reloadAllocations() should use a switch statement in the main tag parsing loop instead of the if/else-if/...

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6883:
--

 Summary: AllocationFileLoaderService.reloadAllocations() should 
use a switch statement in the main tag parsing loop instead of the 
if/else-if/...
 Key: YARN-6883
 URL: https://issues.apache.org/jira/browse/YARN-6883
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


{code}
if ("queue".equals(element.getTagName()) ||
    "pool".equals(element.getTagName())) {
  queueElements.add(element);
} else if ("user".equals(element.getTagName())) {
  ...
{code}
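
A rough sketch of what the switch-based version could look like is below. It dispatches on plain tag-name strings; the TagDispatch class and its three buckets are hypothetical, and only the `queue`, `pool`, and `user` tag names come from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dispatcher; stands in for the main tag-parsing loop.
final class TagDispatch {
    final List<String> queueTags = new ArrayList<>();
    final List<String> userTags = new ArrayList<>();
    final List<String> otherTags = new ArrayList<>();

    // Sorts one top-level tag name into a bucket. The tag name must be
    // non-null, because switching on a null String throws a NullPointerException.
    void handle(String tagName) {
        switch (tagName) {
            case "queue":
            case "pool": // "pool" shares the "queue" branch
                queueTags.add(tagName);
                break;
            case "user":
                userTags.add(tagName);
                break;
            default:
                otherTags.add(tagName);
                break;
        }
    }
}
```

Stacking the `queue` and `pool` labels replaces the `||` condition, and the tag name is read once rather than once per comparison.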






[jira] [Created] (YARN-6882) AllocationFileLoaderService.reloadAllocations() should use the diamond operator

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6882:
--

 Summary: AllocationFileLoaderService.reloadAllocations() should 
use the diamond operator
 Key: YARN-6882
 URL: https://issues.apache.org/jira/browse/YARN-6882
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Trivial


Here:
{code}
for (FSQueueType queueType : FSQueueType.values()) {
  configuredQueues.put(queueType, new HashSet<String>());
}
{code}
and here:
{code}
List<Element> queueElements = new ArrayList<Element>();
{code}
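
As a sketch of the intended result, the same two patterns written with the diamond operator might look like this. The DiamondExample class and its QueueType enum are stand-ins assumed for the example, not the scheduler's own types.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DiamondExample {
    // Stand-in for FSQueueType; the constants are assumed for this sketch.
    enum QueueType { LEAF, PARENT }

    // The compiler infers Set<String> for each new HashSet<>() from the
    // declared value type of the map.
    static Map<QueueType, Set<String>> emptyQueueSets() {
        Map<QueueType, Set<String>> configuredQueues = new EnumMap<>(QueueType.class);
        for (QueueType queueType : QueueType.values()) {
            configuredQueues.put(queueType, new HashSet<>());
        }
        return configuredQueues;
    }

    // The element type is inferred from the return type.
    static List<String> emptyElementList() {
        return new ArrayList<>();
    }
}
```

Note that the diamond inside a method argument, as in the `put` call, relies on the target typing introduced in Java 8; under Java 7 that call would need the explicit type argument.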






[jira] [Created] (YARN-6881) LOG is unused in AllocationConfiguration

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6881:
--

 Summary: LOG is unused in AllocationConfiguration
 Key: YARN-6881
 URL: https://issues.apache.org/jira/browse/YARN-6881
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton


The variable can be removed.






[jira] [Created] (YARN-6880) FSQueue.reservedResource can be final

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6880:
--

 Summary: FSQueue.reservedResource can be final
 Key: YARN-6880
 URL: https://issues.apache.org/jira/browse/YARN-6880
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor









[jira] [Created] (YARN-6879) TestLeafQueue.testDRFUserLimits() has commented out code

2017-07-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6879:
--

 Summary: TestLeafQueue.testDRFUserLimits() has commented out code
 Key: YARN-6879
 URL: https://issues.apache.org/jira/browse/YARN-6879
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, test
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Trivial


The commented-out code should be deleted.





