Re: [VOTE] Moving Ozone to a separated Apache project

2020-09-29 Thread Subru Krishnan
+1

Thanks,
Subru

On Mon, Sep 28, 2020 at 6:49 AM Adam Antal 
wrote:

> +1
>
> Thanks,
> Adam Antal
>
> On Mon, Sep 28, 2020 at 2:19 PM 孙立晟  wrote:
>
> > +1
> >
> > Thanks,
> > Lisheng Sun
> >
> > Masatake Iwasaki wrote on Mon, Sep 28, 2020 at 7:19 PM:
> >
> > > +1
> > >
> > > Thanks,
> > > Masatake Iwasaki
> > >
> > > On 2020/09/25 14:59, Elek, Marton wrote:
> > > > Hi all,
> > > >
> > > > Thank you for all the feedback and requests,
> > > >
> > > > As we discussed in the previous thread(s) [1], Ozone is proposed to
> > > > be a separate Apache Top Level Project (TLP).
> > > >
> > > > The proposal with all the details, motivation and history is here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Hadoop+subproject+to+Apache+TLP+proposal
> > > >
> > > > This vote runs for 7 days and will conclude on the 2nd of October,
> > > > 6 AM GMT.
> > > >
> > > > Thanks,
> > > > Marton Elek
> > > >
> > > > [1]:
> > >
> >
> https://lists.apache.org/thread.html/rc6c79463330b3e993e24a564c6817aca1d290f186a1206c43ff0436a%40%3Chdfs-dev.hadoop.apache.org%3E
> > > >
> > > > -
> > > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> > >
> > >
> >
>


Re: [DISCUSS] making Ozone a separate Apache project

2020-05-15 Thread Subru Krishnan
+1.

Thanks,
Subru

On Thu, May 14, 2020 at 10:00 PM Akira Ajisaka  wrote:

> +1
>
> -Akira
>
> On Wed, May 13, 2020 at 4:53 PM Elek, Marton  wrote:
>
> >
> >
> > I would like to start a discussion about making Ozone a separate Apache
> > project.
> >
> >
> >
> > ### HISTORY [1]
> >
> >   * Apache Hadoop Ozone development started on a feature branch of the
> > Hadoop repository (HDFS-7240)
> >
> >   * In October 2017, a discussion was started about merging it into
> > the Hadoop main branch
> >
> >   * After a long discussion, it was merged to Hadoop trunk in March
> > 2018
> >
> >   * During the discussion of the merge, it was suggested multiple times
> > to create a separate project for Ozone. But at that time:
> >  1). Ozone was tightly integrated with Hadoop/HDFS
> >  2). There was an active plan to use the block layer of Ozone (HDDS or
> > HDSL at that time) as the block layer of HDFS
> >  3). The Ozone community was a subset of the HDFS community
> >
> >   * The first beta of Ozone has just been released. This seems to be a
> > good time, before the first GA, to make a decision about its future.
> >
> >
> >
> > ### WHAT HAS BEEN CHANGED
> >
> >   During the last few years Ozone has become more and more independent,
> > on both the community and the code side. The separation has been
> > suggested again and again (for example by Owen [2] and Vinod [3])
> >
> >
> >
> >   From a COMMUNITY point of view:
> >
> >
> >* Fortunately more and more new contributors are helping Ozone.
> > Originally the Ozone community was a subset of the HDFS project, but now
> > a bigger and bigger part of the community is involved with Ozone only.
> >
> >* It seems to be easier to _build_ the community as a separate
> > project.
> >
> >* A new, younger project might have different practices
> > (communication, committer criteria, development style) compared to an
> > old, mature project
> >
> >* It's easier to communicate (and improve) these standards in a
> > separate project with clean boundaries
> >
> >* A separate project/brand can help increase the adoption rate and
> > attract more individual contributors (AFAIK this has been seen in
> > Submarine after a similar move)
> >
> >   * The contribution process can be communicated more easily, and we can
> > make first-time contributions easier
> >
> >
> >
> >   From a CODE point of view, Ozone has become more and more independent:
> >
> >
> >   * Ozone has a different release cycle
> >
> >   * The code is already separated from the Hadoop code base
> > (apache/hadoop-ozone.git)
> >
> >   * It has separate CI (GitHub Actions)
> >
> >   * Ozone uses a different (stricter) coding style (zero tolerance for
> > unit test / checkstyle errors)
> >
> >   * The code itself has become more and more independent from Hadoop at
> > the Maven level. Originally it was compiled together with the latest
> > in-tree Hadoop snapshot. Now it depends on released Hadoop artifacts
> > (RPC, Configuration...)
> >
> >   * It has started to use multiple versions of Hadoop (on the client
> > side)
> >
> >   * The volume of resolved issues is already very high on the Ozone side
> > (Ozone had slightly more resolved issues than HDFS/YARN/MAPREDUCE/COMMON
> > all together in the last 2-3 months)
> >
> >
> > Summary: The period before the first Ozone GA release seems to be a good
> > time to discuss the long-term future of Ozone. Managing it as a separate
> > TLP seems to have more benefits.
> >
> >
> > Please let me know what your opinion is...
> >
> > Thanks a lot,
> > Marton
> >
> >
> >
> >
> >
> > [1]: For more details, see:
> > https://github.com/apache/hadoop-ozone/blob/master/HISTORY.md
> >
> > [2]:
> >
> >
> https://lists.apache.org/thread.html/0d0253f6e5fa4f609bd9b917df8e1e4d8848e2b7fdb3099b730095e6%40%3Cprivate.hadoop.apache.org%3E
> >
> > [3]:
> >
> >
> https://lists.apache.org/thread.html/8be74421ea495a62e159f2b15d74627c63ea1f67a2464fa02c85d4aa%40%3Chdfs-dev.hadoop.apache.org%3E
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >
>


Re: [VOTE] EOL Hadoop branch-2.8

2020-03-03 Thread Subru Krishnan
+1

On Tue, Mar 3, 2020 at 11:43 AM Wangda Tan  wrote:

> +1
>
> On Mon, Mar 2, 2020 at 8:50 PM Akira Ajisaka  wrote:
>
> > +1
> >
> > -Akira
> >
> > On Tue, Mar 3, 2020 at 4:55 AM Ayush Saxena  wrote:
> >
> > > +1 for marking 2.8 EOL
> > >
> > > -Ayush
> > >
> > > > On 03-Mar-2020, at 12:18 AM, Wei-Chiu Chuang 
> > wrote:
> > > >
> > > > I am sorry I forgot to start a VOTE thread.
> > > >
> > > > This is the "official" vote thread to mark branch-2.8 End of Life.
> > > > This is based on the following thread and the tracking jira
> > > > (HADOOP-16880).
> > > >
> > > > This vote will run for 7 days and conclude on March 9th (Mon) 11am
> > > > PST.
> > > >
> > > > Please feel free to share your thoughts.
> > > >
> > > > Thanks,
> > > > Weichiu
> > > >
> > > >> On Mon, Feb 24, 2020 at 10:28 AM Wei-Chiu Chuang <
> > weic...@cloudera.com>
> > > >> wrote:
> > > >>
> > > >> Looking at the EOL policy wiki:
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> > > >>
> > > >> The Hadoop community can still elect to make security updates for
> > > >> EOL'ed releases.
> > > >>
> > > >> I think the EOL is to give downstream applications (such as HBase)
> > > >> clearer guidance on which Hadoop release lines are still active.
> > > >> Additionally, I don't think it is sustainable to maintain 6
> concurrent
> > > >> release lines in this big project, which is why I wanted to start
> this
> > > >> discussion.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >>> On Mon, Feb 24, 2020 at 10:22 AM Sunil Govindan wrote:
> > > >>>
> > > >>> Hi Wei-Chiu
> > > >>>
> > > >>> Extremely sorry for the late reply here.
> > > >>> Could you please help add more clarity on what will happen to
> > > >>> branch-2.8 when we call EOL.
> > > >>> Does this mean that no more releases will come out of this branch,
> > > >>> or are there some additional guidelines?
> > > >>>
> > > >>> - Sunil
> > > >>>
> > > >>>
> > > >>> On Mon, Feb 24, 2020 at 11:47 PM Wei-Chiu Chuang
> > > >>>  wrote:
> > > >>>
> > >  This thread has been running for 7 days with no -1.
> > > 
> > >  I don't think we've established a formal EOL process, but to
> > >  publicize the EOL, I am going to file a jira, update the wiki, and
> > >  post the announcement to general@ and user@
> > > 
> > >  On Wed, Feb 19, 2020 at 1:40 PM Dinesh Chitlangia <
> > > >>> dineshc@gmail.com>
> > >  wrote:
> > > 
> > > > Thanks Wei-Chiu for initiating this.
> > > >
> > > > +1 for 2.8 EOL.
> > > >
> > > > On Tue, Feb 18, 2020 at 10:48 PM Akira Ajisaka <
> > aajis...@apache.org>
> > > > wrote:
> > > >
> > > >> Thanks Wei-Chiu for starting the discussion,
> > > >>
> > > >> +1 for the EoL.
> > > >>
> > > >> -Akira
> > > >>
> > > >> On Tue, Feb 18, 2020 at 4:59 PM Ayush Saxena <
> ayush...@gmail.com>
> > >  wrote:
> > > >>
> > > >>> Thanx Wei-Chiu for initiating this
> > > >>> +1 for marking 2.8 EOL
> > > >>>
> > > >>> -Ayush
> > > >>>
> > >  On 17-Feb-2020, at 11:14 PM, Wei-Chiu Chuang <
> > > >>> weic...@apache.org>
> > > >> wrote:
> > > 
> > >  The last Hadoop 2.8.x release, 2.8.5, was GA on September
> 15th
> > >  2018.
> > > 
> > >  It's been 17 months since the release, and the community has by
> > >  and large moved up to 2.9/2.10/3.x.
> > > 
> > >  With Hadoop 3.3.0 on the horizon, is it time to start the EOL
> > >  discussion and reduce the number of active branches?
> > > >>>
> > > >>>
> > > >>>
> -
> > > >>> To unsubscribe, e-mail:
> common-dev-unsubscr...@hadoop.apache.org
> > > >>> For additional commands, e-mail:
> > > >>> common-dev-h...@hadoop.apache.org
> > > >>>
> > > >>>
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> > > -
> > > To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
> > >
> > >
> >
>


Re: [DISCUSS] Remove Ozone and Submarine from Hadoop repo

2019-10-24 Thread Subru Krishnan
+1.

Thanks,
Subru

On Thu, Oct 24, 2019 at 12:51 AM 张铎(Duo Zhang) 
wrote:

> +1
>
> Akira Ajisaka wrote on Thu, Oct 24, 2019 at 3:21 PM:
>
> > Hi folks,
> >
> > Both Ozone and Apache Submarine have separate repositories.
> > Can we remove these modules from hadoop-trunk?
> >
> > Regards,
> > Akira
> >
>


Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-17 Thread Subru Krishnan
+1 (binding).

IIUC, there will not be an Ozone module in trunk anymore, which was my
only concern from the original discussion thread. IMHO, this should be the
default approach for new modules.

On Tue, Sep 17, 2019 at 9:58 AM Salvatore LaMendola (BLOOMBERG/ 731 LEX) <
slamendo...@bloomberg.net> wrote:

> +1
>
> From: e...@apache.org At: 09/17/19 05:48:32 To: hdfs-...@hadoop.apache.org,
> mapreduce-...@hadoop.apache.org,  common-...@hadoop.apache.org,
> yarn-dev@hadoop.apache.org
> Subject: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk
> source tree
>
>
> TL;DR: I propose to move Ozone-related code out of Hadoop trunk and
> store it in a separate *Hadoop* git repository, apache/hadoop-ozone.git.
>
>
> When Ozone was adopted as a new Hadoop subproject it was proposed[1] to
> be part of the source tree but with a separate release cadence, mainly
> because it had hadoop-trunk/SNAPSHOT as a compile-time dependency.
>
> Over the last few Ozone releases this dependency has been removed to
> provide more stable releases. Instead of using the latest trunk/SNAPSHOT
> build from Hadoop, Ozone uses the latest stable Hadoop (3.2.0 as of now).
>
> As we no longer have a strict dependency between Hadoop trunk SNAPSHOT
> and Ozone trunk, I propose to separate the two code bases from each other
> by creating a new Hadoop git repository (apache/hadoop-ozone.git):
>
> Moving Ozone to a separate git repository:
>
>   * It would be easier to contribute and understand the build (as of now
> we always need `-f pom.ozone.xml` as a Maven parameter)
>   * It would be possible to adjust the build process without breaking
> Hadoop/Ozone builds.
>   * It would be possible to use different README/.asf.yaml/GitHub
> templates for Hadoop Ozone and core Hadoop. (For example, the current
> GitHub template [2] has a link to the contribution guideline [3]. Ozone
> has an extended version [4] of this guideline with additional
> information.)
>   * Testing would be safer, as it won't be possible to change core
> Hadoop and Hadoop Ozone in the same patch.
>   * It would be easier to cut branches for Hadoop releases (based on the
> original consensus, Ozone should be removed from all the release
> branches after creating release branches from trunk)
>
>
> What do you think?
>
> Thanks,
> Marton
>
> [1]:
>
> https://lists.apache.org/thread.html/c85e5263dcc0ca1d13cbbe3bcfb53236784a39111b8c353f60582eb4@%3Chdfs-dev.hadoop.apache.org%3E
> [2]:
>
> https://github.com/apache/hadoop/blob/trunk/.github/pull_request_template.md
> [3]: https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
> [4]:
>
> https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute+to+Ozone
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
>


Re: [VOTE] Mark 2.6, 2.7, 3.0 release lines EOL

2019-08-21 Thread Subru Krishnan
+1

On Wed, Aug 21, 2019 at 8:52 AM Kihwal Lee 
wrote:

> +1
>
> Kihwal
>
> On Tue, Aug 20, 2019 at 10:03 PM Wangda Tan  wrote:
>
> > Hi all,
> >
> > This is a vote thread to mark all release lines up to and including 2.7,
> > as well as 3.0, EOL. This is based on the discussions in [1]
> >
> > This vote runs for 7 days and will conclude on Wed, Aug 28.
> >
> > Please feel free to share your thoughts.
> >
> > Thanks,
> > Wangda
> >
> > [1]
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAD++eC=ou-tit1faob-dbecqe6ht7ede7t1dyra2p1yinpe...@mail.gmail.com%3e
> > ,
> >
>


Re: [DISCUSS] Move to gitbox

2018-12-11 Thread Subru Krishnan
+1.

On Tue, Dec 11, 2018 at 12:10 AM Mukul Kumar Singh 
wrote:

> +1
> -Mukul
>
> On 12/11/18, 9:56 AM, "Weiwei Yang"  wrote:
>
> +1
>
> On Tue, Dec 11, 2018 at 10:51 AM Anu Engineer <
> aengin...@hortonworks.com>
> wrote:
>
> > +1
> > --Anu
> >
> >
> > On 12/10/18, 6:38 PM, "Vinayakumar B" 
> wrote:
> >
> > +1
> >
> > -Vinay
> >
> > On Mon, 10 Dec 2018, 1:22 pm Elek, Marton  wrote:
> >
> > >
> > > Thanks Akira,
> > >
> > > +1 (non-binding)
> > >
> > > I think it's better to do it now at a planned date.
> > >
> > > If I understand correctly, the only bigger task here is to update all
> > > the Jenkins jobs. (I am happy to help/contribute where I can.)
> > >
> > >
> > > Marton
> > >
> > > On 12/8/18 6:25 AM, Akira Ajisaka wrote:
> > > > Hi all,
> > > >
> > > > The Apache Hadoop git repository is on the git-wip-us server, which
> > > > will be decommissioned.
> > > > If there are no objections, I'll file a JIRA ticket with INFRA to
> > > > migrate to https://gitbox.apache.org/ and update the documentation.
> > > >
> > > > According to ASF infra team, the timeframe is as follows:
> > > >
> > > >> - December 9th 2018 -> January 9th 2019: Voluntary
> (coordinated)
> > > relocation
> > > >> - January 9th -> February 6th: Mandated (coordinated)
> relocation
> > > >> - February 7th: All remaining repositories are mass
> migrated.
> > > >> This timeline may change to accommodate various scenarios.
> > > >
> > > > If we reach consensus by January 9th, I can file a ticket with
> > > > INFRA and migrate it.
> > > > Even if we cannot reach consensus, the repository will be migrated
> > > > by February 7th.
> > > >
> > > > Regards,
> > > > Akira
> > > >
> > > >
> > -
> > > > To unsubscribe, e-mail:
> yarn-dev-unsubscr...@hadoop.apache.org
> > > > For additional commands, e-mail:
> yarn-dev-h...@hadoop.apache.org
> > > >
> > >
> > >
> -
> > > To unsubscribe, e-mail:
> common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail:
> common-dev-h...@hadoop.apache.org
> > >
> > >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >
>
>
> --
> Weiwei Yang
>
>
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>


New committer: Botong Huang

2018-11-21 Thread Subru Krishnan
The Project Management Committee (PMC) for Apache Hadoop has invited
Botong Huang to become a committer and we are pleased to announce that
he has accepted.
Being a committer enables easier contribution to the project since
there is no need to go via the patch submission process. This should
enable better productivity. Being a PMC member enables assisting with
the management and guiding the direction of the project.

Congrats and welcome aboard.

-Subru


[jira] [Resolved] (YARN-8755) Add clean up for FederationStore apps

2018-09-07 Thread Subru Krishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-8755.
--
Resolution: Duplicate

[~bibinchundatt], this should be addressed by YARN-6648 & YARN-7599. Your 
review of the latter will be appreciated.

Thanks.

> Add clean up for FederationStore apps
> -
>
> Key: YARN-8755
> URL: https://issues.apache.org/jira/browse/YARN-8755
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Priority: Major
>
> We should add clean-up logic for the application-to-home-sub-cluster
> mapping in the federation state store.
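
A rough sketch of what such a cleanup pass could look like against the
federation state store API (the store calls below follow the
hadoop-yarn-server-federation records; isFinished() is a hypothetical
liveness check, so treat the exact shape as an assumption):

    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.server.federation.store.FederationStateStore;
    import org.apache.hadoop.yarn.server.federation.store.records.ApplicationHomeSubCluster;
    import org.apache.hadoop.yarn.server.federation.store.records.DeleteApplicationHomeSubClusterRequest;
    import org.apache.hadoop.yarn.server.federation.store.records.GetApplicationsHomeSubClusterRequest;

    public class FederationAppCleaner {
      /** Drop the home-sub-cluster mapping of every finished application. */
      static void cleanFinishedApps(FederationStateStore store) throws Exception {
        List<ApplicationHomeSubCluster> apps = store
            .getApplicationsHomeSubCluster(
                GetApplicationsHomeSubClusterRequest.newInstance())
            .getAppsHomeSubClusters();
        for (ApplicationHomeSubCluster app : apps) {
          if (isFinished(app.getApplicationId())) {
            store.deleteApplicationHomeSubCluster(
                DeleteApplicationHomeSubClusterRequest
                    .newInstance(app.getApplicationId()));
          }
        }
      }

      /** Placeholder: ask the RM (or ATS) whether the app has completed. */
      private static boolean isFinished(ApplicationId appId) {
        return false; // hypothetical check, not part of the state store API
      }
    }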



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Hadoop 3.2 Release Plan proposal

2018-07-19 Thread Subru Krishnan
Thanks Sunil for volunteering to lead the release effort. I am generally
supportive of a release but -1 on a 3.2 (I prefer a 3.1.x) as I feel we
already have too many branches to maintain. I already see many commits in
different branches with no apparent rationale; e.g., 3.1 has commits
which are absent in 3.0.

Additionally AFAIK 3.x has not been deployed in any major production
setting so the cost of adding features should be minimal.

Thoughts?

-Subru

On Thu, Jul 19, 2018 at 12:31 AM, Sunil G  wrote:

> Thanks Steve, Aaron, Wangda for sharing thoughts.
>
> Yes, important changes and features are much needed, hence we will be
> keeping the door open for them as long as possible. Also, considering a
> few more offline requests from other folks, I think extending the
> timeframe by a couple of weeks makes sense (including a second RC buffer),
> and this should ideally help us ship this by September itself.
>
> Revised dates (I will be updating same in Roadmap wiki as well)
>
> - Feature freeze date : all features to merge by August 21, 2018.
>
> - Code freeze date : blockers/critical only; no improvements or
> non-blocker/critical bug fixes. August 31, 2018.
>
> - Release date: September 15, 2018
>
> Thank Eric and Zian, I think Wangda has already answered your questions.
>
> Thanks
> Sunil
>
>
> On Thu, Jul 19, 2018 at 12:13 PM Wangda Tan  wrote:
>
> > Thanks Sunil for volunteering to be RM of 3.2 release, +1 for that.
> >
> > To concerns from Steve,
> >
> > It is a good idea to keep the door open to get important changes /
> > features in before the cutoff. I would prefer to keep the proposed
> > release date to make sure things happen earlier instead of at the last
> > minute, and we all know that releases always get delayed :). I'm also
> > fine if we want to take another several weeks.
> >
> > Regarding the 3.3 release, I would suggest doing that before
> > Thanksgiving. Do you think that is good, or too early / late?
> >
> > Eric,
> >
> > YARN-8220 will be replaced by YARN-8135; if YARN-8135 can get merged
> > in time, we probably will not need YARN-8220.
> >
> > Sunil,
> >
> > Could you update https://cwiki.apache.org/confluence/display/HADOOP/Roadmap
> > with the proposed plan as well? We can fill in the feature list first
> > before getting consensus on the timing.
> >
> > Thanks,
> > Wangda
> >
> > On Wed, Jul 18, 2018 at 6:20 PM Aaron Fabbri  >
> > wrote:
> >
> >> On Tue, Jul 17, 2018 at 7:21 PM Steve Loughran 
> >> wrote:
> >>
> >> >
> >> >
> >> > On 16 Jul 2018, at 23:45, Sunil G <sun...@apache.org> wrote:
> >> >
> >> > I would also would like to take this opportunity to come up with a
> >> detailed
> >> > plan.
> >> >
> >> > - Feature freeze date : all features should be merged by August 10,
> >> 2018.
> >> >
> >> >
> >> >
> >> > 
> >>
> >> >
> >> > Please let me know if I missed any features targeted to 3.2 per this
> >> >
> >> >
> >> > Well, there are these big todo lists for S3 & S3Guard.
> >> >
> >> > https://issues.apache.org/jira/browse/HADOOP-15226
> >> > https://issues.apache.org/jira/browse/HADOOP-15220
> >> >
> >> >
> >> > There's a bigger bit of work coming on for Azure Datalake Gen 2
> >> > https://issues.apache.org/jira/browse/HADOOP-15407
> >> >
> >> > I don't think this is quite ready yet, I've been doing work on it, but
> >> if
> >> > we have a 3 week deadline, I'm going to expect some timely reviews on
> >> > https://issues.apache.org/jira/browse/HADOOP-15546
> >> >
> >> > I've uprated that to a blocker feature; will review the S3 & S3Guard
> >> > JIRAs to see which of those are blocking. Then there are some pressing
> >> > "guava, java 9 prep"
> >> >
> >> >
> >>  I can help with this part if you like.
> >>
> >>
> >>
> >> >
> >> >
> >> >
> >> > timeline. I would like to volunteer myself as release manager of the
> >> > 3.2.0 release.
> >> >
> >> >
> >> > well volunteered!
> >> >
> >> >
> >> >
> >> Yes, thank you for stepping up.
> >>
> >>
> >> >
> >> > I think this raises a good q: what timetable should we have for the
> >> > 3.2 & 3.3 releases; if we do want a faster cadence, then having the
> >> > outline time from the 3.2 to the 3.3 release means that there's less
> >> > concern about things not making the 3.2 deadline
> >> >
> >> > -Steve
> >> >
> >> >
> >> Good idea to mitigate the short deadline.
> >>
> >> -AF
> >>
> >
>


Re: [VOTE] reset/force push to clean up inadvertent merge commit pushed to trunk

2018-07-06 Thread Subru Krishnan
> >
> >
> > Thanks
> > Anu
> >
> >
> > On 7/6/18, 10:24 AM, "Arpit Agarwal" <aagar...@hortonworks.com> wrote:
> >
> >
> >   -1 for the force push. Nothing is broken in trunk. The history looks
> > ugly for two commits and we can live with it.
> >
> >   The revert restored the branch to Giovanni's intent, i.e. only
> > YARN-8435 is applied. Verified there is no delta between hashes
> > 0d9804d and 39ad989 (HEAD).
> >
> >
> >   39ad989 2018-07-05 aengineer@ o {apache/trunk} Revert "Merge branch 't...
> >   c163d17 2018-07-05 gifuma@apa M─┐ Merge branch 'trunk' of https://git-...
> >   99febe7 2018-07-05 rkanter@ap │ o YARN-7451. Add missing tests to veri...
> >   1726247 2018-07-05 haibochen@ │ o YARN-7556. Fair scheduler configurat...
> >   0d9804d 2018-07-05 gifuma@apa o │ YARN-8435. Fix NPE when the same cli...
> >   71df8c2 2018-07-05 nanda@apac o─┘ HDDS-212. Introduce NodeStateManager...
> >
> >
> >   Regards,
> >   Arpit
> >
> >
> >   On 7/5/18, 2:37 PM, "Subru Krishnan" <su...@apache.org> wrote:
> >
> >
> >   Folks,
> >
> >   There was a merge commit accidentally pushed to trunk, you can find
> >   the details in the mail thread [1].
> >
> >   I have raised an INFRA ticket [2] to reset/force push to clean up
> >   trunk.
> >
> >   Can we have a quick vote for INFRA sign-off to proceed as this is
> >   blocking all commits?
> >
> >   Thanks,
> >   Subru
> >
> >   [1]
> > http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201807.mbox/%3CCAHqguubKBqwfUMwhtJuSD7X1Bgfro_P6FV%2BhhFhMMYRaxFsF9Q%40mail.gmail.com%3E
> >
> >   [2] https://issues.apache.org/jira/browse/INFRA-16727
> >
> >
> >
> >   -
> >   To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >   For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> > -
> > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >
>
>
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] reset/force push to clean up inadvertent merge commit pushed to trunk

2018-07-05 Thread Subru Krishnan
Unfortunately, since it was a merge commit, it is less straightforward to
revert. You can find the details in the original mail thread:
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201807.mbox/%3CCAHqguubKBqwfUMwhtJuSD7X1Bgfro_P6FV%2BhhFhMMYRaxFsF9Q%40mail.gmail.com%3E

On Thu, Jul 5, 2018 at 2:49 PM, Wei-Chiu Chuang  wrote:

> I'm sorry I come to this thread late.
> Anu commented on INFRA-16727 saying he reverted the commit. Do we still
> need the vote?
>
> Thanks
>
> On Thu, Jul 5, 2018 at 2:47 PM Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> +1
>>
>> On 5 July 2018 at 14:37, Subru Krishnan  wrote:
>>
>> > Folks,
>> >
>> > There was a merge commit accidentally pushed to trunk, you can find the
>> > details in the mail thread [1].
>> >
>> > I have raised an INFRA ticket [2] to reset/force push to clean up trunk.
>> >
>> > Can we have a quick vote for INFRA sign-off to proceed as this is
>> blocking
>> > all commits?
>> >
>> > Thanks,
>> > Subru
>> >
>> > [1]
>> > http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201807.mbox/%
>> > 3CCAHqguubKBqwfUMwhtJuSD7X1Bgfro_P6FV%2BhhFhMMYRaxFsF9Q%
>> > 40mail.gmail.com%3E
>> > [2] https://issues.apache.org/jira/browse/INFRA-16727
>> >
>>
>> --
>> A very happy Hadoop contributor
>>
>


[VOTE] reset/force push to clean up inadvertent merge commit pushed to trunk

2018-07-05 Thread Subru Krishnan
Folks,

There was a merge commit accidentally pushed to trunk, you can find the
details in the mail thread [1].

I have raised an INFRA ticket [2] to reset/force push to clean up trunk.

Can we have a quick vote for INFRA sign-off to proceed as this is blocking
all commits?

Thanks,
Subru

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201807.mbox/%3CCAHqguubKBqwfUMwhtJuSD7X1Bgfro_P6FV%2BhhFhMMYRaxFsF9Q%40mail.gmail.com%3E
[2] https://issues.apache.org/jira/browse/INFRA-16727


Re: Merge branch commit in trunk by mistake

2018-07-05 Thread Subru Krishnan
Looking at the merge commit, I feel it's better to reset/force push,
especially since this is still the latest commit on trunk.

I have raised an INFRA ticket requesting the same:
https://issues.apache.org/jira/browse/INFRA-16727

-S

On Thu, Jul 5, 2018 at 11:45 AM, Sean Busbey 
wrote:

> FYI, no images make it through ASF mailing lists. I presume the image was
> of the git history? If that's correct, here's what that looks like in a
> paste:
>
> https://paste.apache.org/eRix
>
> There are no force pushes on trunk, so backing the change out would require
> the PMC asking INFRA to unblock force pushes for a period of time.
>
> Probably the merge commit isn't a big enough deal to do that. There was a
> merge commit ~5 months ago for when YARN-6592 merged into trunk.
>
> So I'd say just try to avoid doing it in the future?
>
> -busbey
>
> On Thu, Jul 5, 2018 at 1:31 PM, Giovanni Matteo Fumarola <
> giovanni.fumar...@gmail.com> wrote:
>
> > Hi folks,
> >
> > After I pushed something on trunk a merge commit showed up in the
> history. *My
> > bad*.
> >
> >
> >
> > Since it was one of my first patches, I ran a few tests on my machine
> > before checking in.
> > While I was running all the tests, someone else checked in. I correctly
> > pulled all the new changes.
> >
> > Even before I did the "git push", there was no merge commit in my history.
> >
> > Can someone help me reverting this change?
> >
> > Thanks
> > Giovanni
> >
> >
> >
>
>
> --
> busbey
>


[DISCUSS] 2.9+ stabilization branch

2018-02-26 Thread Subru Krishnan
Folks,

We (i.e. Microsoft) have started stabilization of 2.9 for our production
deployment. During planning, we realized that we need to backport 3.x
features to support GPUs (and more resource types like network IO) natively
as part of the upgrade. We'd like to share that work with the community.

Instead of stabilizing the base release and cherry-picking fixes back to
Apache, we want to work publicly and push fixes directly into
trunk/.../branch-2 for a stable 2.10.0 release. Our goal is to create a
bridge release for our production clusters to the 3.x series and to address
scalability problems in large clusters (N*10k nodes). As we find issues, we
will file JIRAs and track resolution of significant regressions/faults in
wiki. Moreover, LinkedIn also has committed plans for a production
deployment of the same branch. We welcome broad participation, particularly
since we'll be stabilizing relatively new features.

The exact list of features we would like to backport in YARN is as follows
(a brief usage sketch follows the list):

   - Support for Resource types [1][2]
   - Native support for GPUs[3]
   - Absolute Resource configuration in CapacityScheduler [4]
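
For context, here is a rough sketch of how an AM could ask for a GPU
through the resource-types API once these features land in the branch
(the GPU resource name and the exact method shapes below are assumptions
based on the 3.x line, not code from this backport):

    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.api.records.ResourceInformation;

    public class GpuRequestSketch {
      static Resource gpuCapability() {
        // 4 GB, 2 vcores, plus 1 unit of the extensible "yarn.io/gpu" type.
        Resource capability = Resource.newInstance(4096, 2);
        capability.setResourceInformation(ResourceInformation.GPU_URI,
            ResourceInformation.newInstance(ResourceInformation.GPU_URI, "", 1));
        return capability;
      }
    }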


With regards to HDFS, we are currently looking mainly at fixes to
Router-based Federation and Windows-specific fixes, which should flow
normally anyway.

Thoughts?

Thanks,
Subru/Arun

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27786.html
[2] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28281.html
[3] https://issues.apache.org/jira/browse/YARN-6223
[4] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28772.html


[jira] [Created] (YARN-7652) Handle AM register requests asynchronously in FederationInterceptor

2017-12-13 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-7652:


 Summary: Handle AM register requests asynchronously in 
FederationInterceptor
 Key: YARN-7652
 URL: https://issues.apache.org/jira/browse/YARN-7652
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy, federation
Affects Versions: 2.9.0, 3.0.0
Reporter: Subru Krishnan
Assignee: Botong Huang


We (cc [~goiri]/[~botong]) observed that the {{FederationInterceptor}} in
{{AMRMProxy}}, and consequently the application, is blocked if the _StateStore_
has outdated info about a _SubCluster_. This is because we handle AM register
requests synchronously. This jira proposes to move to async handling, similar
to how we operate with allocate invocations.
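
A minimal sketch of the async pattern being proposed (the class and method
names here are simplified assumptions for illustration, not the actual
FederationInterceptor code):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncRegisterSketch {
      private final ExecutorService pool = Executors.newCachedThreadPool();

      /**
       * Fire off the register call for one sub-cluster without blocking the
       * caller, so a stale sub-cluster entry only delays its own future
       * instead of the whole application.
       */
      CompletableFuture<Void> registerAsync(String subClusterId) {
        return CompletableFuture.runAsync(
            () -> registerWithSubCluster(subClusterId), pool);
      }

      private void registerWithSubCluster(String subClusterId) {
        // Placeholder for the real registerApplicationMaster RPC to the
        // sub-cluster's RM.
      }
    }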



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC3)

2017-11-17 Thread Subru Krishnan
Wrapping up the vote with my +1.

Deployed RC3 on a federated YARN cluster with 6 sub-clusters:
- ran multiple sample jobs
- enabled opportunistic containers and submitted more samples
- configured HDFS federation and reran jobs.


With 13 binding +1s, 7 non-binding +1s, and no -1s, I am pleased to announce
that the vote has passed successfully.

Thanks to the many of you who contributed to the release and made this
possible and to everyone in this thread who took the time/effort to
validate and vote!

We’ll push the release bits and send out an announcement for 2.9.0 soon.

Cheers,
Subru

On Fri, Nov 17, 2017 at 12:41 PM Eric Payne 
wrote:

> Thanks Arun and Subru for the hard work on this release.
>
> +1 (binding)
>
> Built from source and stood up a pseudo cluster with 4 NMs
>
> Tested the following:
>
> o User limits are honored during In-queue preemption
>
> o Priorities are honored during In-queue preemption
>
> o Can kill applications from the command line
>
> o Users with different weights are assigned resources proportional to
> their weights.
>
> Thanks,
> -Eric Payne
>
>
> --
> *From:* Arun Suresh 
> *To:* yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; Hadoop
> Common ; Hdfs-dev <
> hdfs-...@hadoop.apache.org>
> *Cc:* Subramaniam Krishnan 
> *Sent:* Monday, November 13, 2017 6:10 PM
>
> *Subject:* [VOTE] Release Apache Hadoop 2.9.0 (RC3)
>
> Hi Folks,
>
> Apache Hadoop 2.9.0 is the first release of the Hadoop 2.9 line and will be
> the starting release for the Apache Hadoop 2.9.x line - it includes 30 New
> Features with 500+ subtasks, 407 Improvements, and 790 Bug fixes, all newly
> fixed since 2.8.2.
>
> More information about the 2.9.0 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>
> New RC is available at:
> https://home.apache.org/~asuresh/hadoop-2.9.0-RC3/
>
> The RC tag in git is: release-2.9.0-RC3, and the latest commit id is:
> 756ebc8394e473ac25feac05fa493f6d612e6c50.
>
> The maven artifacts are available via repository.apache.org at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1068/
>
> We are carrying over the votes from the previous RC given that the delta is
> the license fix.
>
> Given the above - we are also going to stick with the original deadline for
> the vote : ending on Friday 17th November 2017 2pm PT time.
>
> Thanks,
> -Arun/Subru
>
>
>


[VOTE] Release Apache Hadoop 2.9.0 (RC2)

2017-11-12 Thread Subru Krishnan
Hi Folks,

Apache Hadoop 2.9.0 is the first release of the Hadoop 2.9 line and will be
the starting release for the Apache Hadoop 2.9.x line - it includes 30 New
Features with 500+ subtasks, 407 Improvements, and 790 Bug fixes, all newly
fixed since 2.8.2.

More information about the 2.9.0 release plan can be found here:
https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9

New RC is available at: http://home.apache.org/~asuresh/hadoop-2.9.0-RC2/


The RC tag in git is: release-2.9.0-RC2, and the latest commit id is:
1eb05c1dd48fbc9e4b375a76f2046a59103bbeb1.

The maven artifacts are available via repository.apache.org at:
https://repository.apache.org/content/repositories/orgapachehadoop-1067/


Please try the release and vote; the vote will run for the usual 5 days,
ending on Friday 17th November 2017 2pm PT time.

We want to give a big shout out to Sunil, Varun, Rohith, Wangda, Vrushali
and Inigo for the extensive testing/validation which helped prepare for
RC2. Do report your results in this vote as it'll be very useful to the
entire community.

Thanks,
-Subru/Arun


[jira] [Resolved] (YARN-7478) TEST-cetest fails in branch-2

2017-11-12 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-7478.
--
   Resolution: Implemented
Fix Version/s: 2.9.0

Thanks [~varun_saxena] for bringing this up and [~leftnoteasy] for pointing out
that YARN-7412 fixes it, so I simply cherry-picked YARN-7412 to
branch-2/2.9/2.9.0.

> TEST-cetest fails in branch-2
> -
>
> Key: YARN-7478
> URL: https://issues.apache.org/jira/browse/YARN-7478
> Project: Hadoop YARN
>  Issue Type: Bug
>    Reporter: Subru Krishnan
>Priority: Minor
> Fix For: 2.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7478) TEST-cetest fails in branch-2

2017-11-12 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-7478:


 Summary: TEST-cetest fails in branch-2
 Key: YARN-7478
 URL: https://issues.apache.org/jira/browse/YARN-7478
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Subru Krishnan
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC1)

2017-11-10 Thread Subru Krishnan
Thanks John and Mukul for testing/voting on the RC, but unfortunately we
have to cancel RC1 as we missed cherry-picking an addendum patch; thanks to
Rohith for pointing that out. We sincerely apologize; we'll be updating and
sending out RC2 asap.

-Subru/Arun

On Fri, Nov 10, 2017 at 2:35 AM, Mukul Kumar Singh <msi...@hortonworks.com>
wrote:

> +1 (non-binding)
>
> I built from source on Mac OS X 10.12.6 Java 1.8.0_111
>
> - Deployed on a single node cluster.
> - Deployed a ViewFS cluster with two hdfs mount points.
> - Performed basic sanity checks.
> - Performed basic DFS operations.
>
> Thanks,
> Mukul
>
>
>
>
>
>
> On 10/11/17, 7:09 AM, "Subru Krishnan" <su...@apache.org> wrote:
>
> >Hi Folks,
> >
> >Apache Hadoop 2.9.0 is the first release of the Hadoop 2.9 line and will
> >be the starting release for the Apache Hadoop 2.9.x line - it includes
> >30 New Features with 500+ subtasks, 407 Improvements, and 790 Bug fixes,
> >all newly fixed since 2.8.2.
> >
> >More information about the 2.9.0 release plan can be found here:
> >https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >
> >New RC is available at: http://home.apache.org/~asuresh/hadoop-2.9.0-RC1/
> >
> >The RC tag in git is: release-2.9.0-RC1, and the latest commit id is:
> >7d2ba3e8dd74d2631c51ce6790d59e50eeb7a846.
> >
> >The maven artifacts are available via repository.apache.org at:
> >https://repository.apache.org/content/repositories/orgapachehadoop-1066
> >
> >Please try the release and vote; the vote will run for the usual 5 days,
> >ending on Tuesday 14th November 2017 6pm PST time.
> >
> >Thanks,
> >-Subru/Arun
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[VOTE] Release Apache Hadoop 2.9.0 (RC1)

2017-11-09 Thread Subru Krishnan
Hi Folks,

Apache Hadoop 2.9.0 is the first release of the Hadoop 2.9 line and will be
the starting release for the Apache Hadoop 2.9.x line - it includes 30 New
Features with 500+ subtasks, 407 Improvements, and 790 Bug fixes, all newly
fixed since 2.8.2.

More information about the 2.9.0 release plan can be found here:
https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9

New RC is available at: http://home.apache.org/~asuresh/hadoop-2.9.0-RC1/


The RC tag in git is: release-2.9.0-RC1, and the latest commit id is:
7d2ba3e8dd74d2631c51ce6790d59e50eeb7a846.

The maven artifacts are available via repository.apache.org at:
https://repository.apache.org/content/repositories/orgapachehadoop-1066


Please try the release and vote; the vote will run for the usual 5 days,
ending on Tuesday 14th November 2017 6pm PST time.

Thanks,
-Subru/Arun


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-09 Thread Subru Krishnan
Thanks Vinod for your feedback, we'll incorporate it when we spin RC1.

-Subru/Arun

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli <vino...@apache.org>
wrote:

> A related point - I thought I mentioned this in one of the release
> preparation threads, but in any case.
>
> Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to
> the voting thread as well as the final release) that the first release can
> potentially go through additional fixes to incompatible changes (besides
> stabilization fixes). We should do this with 2.9.0 too.
>
> This has some history - long before this, we tried two different things:
> (a) downstream projects consume an RC (b) downstream projects consume a
> release. Option (a) was tried many times but it was increasingly getting
> hard to manage this across all the projects that depend on Hadoop. When we
> tried option (b), we used to make .0 as a GA release, but downstream
> projects like Tez, Hive, Spark would come back and find an incompatible
> change - and now we were forced into a conundrum - is fixing this
> incompatible change itself an incompatibility? So to avoid this problem,
> we've started marking the first few releases as alpha eventually making a
> stable point release. Clearly, specific users can still use this in
> production as long as we the Hadoop community reserve the right to fix
> incompatibilities.
>
> Long story short, I'd just add to your voting thread and release notes
> that 2.9.0 still needs to be tested downstream and so users may want to
> wait for subsequent point releases.
>
> Thanks
> +Vinod
>
> > On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> >
> > We are canceling the RC due to the issue that Rohith/Sunil identified.
> > The issue was difficult to track down as it only happens when you use IP
> > for ZK (it works fine with host names), and moreover only if ZK and RM
> > are co-located on the same machine. We are hopeful to get the fix in
> > tomorrow and roll out RC1.
> >
> > Thanks to everyone for the extensive testing/validation. Hopefully the
> > cost to replicate it with RC1 is much lower.
> >
> > -Subru/Arun.
> >
> > On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <
> kkarana...@gmail.com
> >> wrote:
> >
> >> +1 from me too.
> >>
> >> Did the following:
> >> 1) set up a 9-node cluster;
> >> 2) ran some Gridmix jobs;
> >> 3) ran (2) after enabling opportunistic containers (used a mix of
> >> guaranteed and opportunistic containers for each job);
> >> 4) ran (3) but this time enabling distributed scheduling of
> opportunistic
> >> containers.
> >>
> >> All the above worked with no issues.
> >>
> >> Thanks for all the effort guys!
> >>
> >> Konstantinos
> >>
> >>
> >>
> >> Konstantinos
> >>
> >> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebad...@oath.com.invalid>
> >> wrote:
> >>
> >>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >>>
> >>> - Verified all hashes and checksums
> >>> - Built from source on macOS 10.12.6, Java 1.8.0u65
> >>> - Deployed a pseudo cluster
> >>> - Ran some example jobs
> >>>
> >>> Thanks,
> >>>
> >>> Eric
> >>>
> >>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com>
> wrote:
> >>>
> >>>> Sunil / Rohith,
> >>>>
> >>>> Could you check if your configs are same as Jonathan posted configs?
> >>>> https://issues.apache.org/jira/browse/YARN-7453?
> >>> focusedCommentId=16242693&
> >>>> page=com.atlassian.jira.plugin.system.issuetabpanels:
> >>>> comment-tabpanel#comment-16242693
> >>>>
> >>>> And could you try if using Jonathan's configs can still reproduce the
> >>>> issue?
> >>>>
> >>>> Thanks,
> >>>> Wangda
> >>>>
> >>>>
> >>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org>
> >> wrote:
> >>>>
> >>>>> Thanks for testing Rohith and Sunil
> >>>>>
> >>>>> Can you please confirm if it is not a config issue at your end ?
> >>>>> We (both Jonathan and myself) just tried testing this on a fresh
> >>> cluster
> >>>>> (both automatic and manual) and we are not able to reproduce this.
> >

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-08 Thread Subru Krishnan
We are canceling the RC due to the issue that Rohith/Sunil identified. The
issue was difficult to track down as it only happens when you use IP for ZK
(it works fine with host names), and moreover only if ZK and RM are co-located
on the same machine. We are hopeful to get the fix in tomorrow and roll out
RC1.

Thanks to everyone for the extensive testing/validation. Hopefully the cost
to replicate it with RC1 is much lower.

-Subru/Arun.

On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos  wrote:

> +1 from me too.
>
> Did the following:
> 1) set up a 9-node cluster;
> 2) ran some Gridmix jobs;
> 3) ran (2) after enabling opportunistic containers (used a mix of
> guaranteed and opportunistic containers for each job);
> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> containers.
>
> All the above worked with no issues.
>
> Thanks for all the effort guys!
>
> Konstantinos
>
>
>
> Konstantinos
>
> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger 
> wrote:
>
> > +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >
> > - Verified all hashes and checksums
> > - Built from source on macOS 10.12.6, Java 1.8.0u65
> > - Deployed a pseudo cluster
> > - Ran some example jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
> >
> > > Sunil / Rohith,
> > >
> > > Could you check if your configs are same as Jonathan posted configs?
> > > https://issues.apache.org/jira/browse/YARN-7453?
> > focusedCommentId=16242693&
> > > page=com.atlassian.jira.plugin.system.issuetabpanels:
> > > comment-tabpanel#comment-16242693
> > >
> > > And could you try if using Jonathan's configs can still reproduce the
> > > issue?
> > >
> > > Thanks,
> > > Wangda
> > >
> > >
> > > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh 
> wrote:
> > >
> > > > Thanks for testing Rohith and Sunil
> > > >
> > > > Can you please confirm if it is not a config issue at your end ?
> > > > We (both Jonathan and myself) just tried testing this on a fresh
> > cluster
> > > > (both automatic and manual) and we are not able to reproduce this.
> > > > I've updated the YARN-7453 JIRA with details of testing.
> > > >
> > > > Cheers
> > > > -Arun/Subru
> > > >
> > > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > > rohithsharm...@apache.org
> > > > > wrote:
> > > >
> > > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453 to
> > > > > track this issue.
> > > > >
> > > > > - Rohith Sharma K S
> > > > >
> > > > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > > > >
> > > > >> Hi Subru and Arun.
> > > > >>
> > > > >> Thanks for driving 2.9 release. Great work!
> > > > >>
> > > > >> I installed cluster built from source.
> > > > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > > > >> - Accessed new UI and it also seems fine.
> > > > >>
> > > > >> However I am also getting same issue as Rohith reported.
> > > > >> - Started an HA cluster
> > > > >> - Pushed RM to standby
> > > > >> - Pushed back RM to active then seeing an exception.
> > > > >>
> > > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > >>
> > > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > > >>
> > > > >> Will check and post more details,
> > > > >>
> > > > >> - Sunil
> > > > >>
> > > > >>
> > > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > > >> rohithsharm...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Thanks Subru/Arun for the great work!
> > > > >> >
> > > > >> > Downloaded source and built from it. Deployed RM HA non-secured
> > > > cluster
> > > > >> > along with new YARN UI and ATSv2.
> > > > >> >
> > > > >> > I am facing basic RM HA switch issue after first time successful
> > > > start.
> > > > >> > *Can
> > > > >> > anyone else is facing this issue?*
> > > > >> >
> > > > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> > > switch
> > > > to
> > > > >> > active successfully. Exception trace I see from the log is
> > > > >> >
> > > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > > > 

Re: Cutting branch-2.9

2017-11-03 Thread Subru Krishnan
cut-off branch-2.9.0 for 2.9.0 release
> >> work. In the meantime, branch-2.9 should be reserved for the next 2.9
> >> point release (2.9.1) and branch-2 should be reserved for the next
> >> minor release (2.10.0 or whatever name it is). Thoughts?
> >>
> >> bq. @Junping, lets move the jdiff conversation to separate thread.
> >> Sure. I will reply jdiff in separated thread.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 2017-10-31 13:44 GMT-07:00 Arun Suresh <asur...@apache.org>:
> >>
> >>> Hello folks
> >>>
> >>> We just cut branch-2.9 since all the critical/blocker issues are now
> >>> resolved.
> >>> We plan to perform some sanity checks for the rest of the week and cut
> >>> branch-2.9.0 and push out an RC0 by the end of the week.
> >>>
> >>> Kindly refrain from committing to branch-2.9 without giving us a heads
> >>> up.
> >>>
> >>> @Junping, lets move the jdiff conversation to separate thread.
> >>>
> >>> Cheers
> >>> -Arun/Subru
> >>>
> >>> On Mon, Oct 30, 2017 at 12:39 PM, Subru Krishnan <su...@apache.org>
> >>> wrote:
> >>>
> >>> > We want to give heads up that we are going to cut branch-2.9 tomorrow
> >>> > morning.
> >>> >
> >>> > We are down to 3 blockers and they all are close to being committed
> >>> > (thanks everyone):
> >>> > https://cwiki.apache.org/confluence/display/HADOOP/
> Hadoop+2.9+Release
> >>> >
> >>> > There are 4 other non-blocker JIRAs that are targeted for 2.9.0 which
> >>> are
> >>> > close to completion.
> >>> >
> >>> > Folks who are working/reviewing these, kindly prioritize accordingly
> so
> >>> > that we can make the release on time.
> >>> > https://issues.apache.org/jira/browse/YARN-7398?filter=12342468
> >>> >
> >>> > Thanks in advance!
> >>> >
> >>> > -Subru/Arun
> >>> >
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>
>


[jira] [Created] (YARN-7434) Router getApps REST invocation fails with multiple RMs

2017-11-02 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-7434:


 Summary: Router getApps REST invocation fails with multiple RMs
 Key: YARN-7434
 URL: https://issues.apache.org/jira/browse/YARN-7434
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Subru Krishnan
Assignee: Íñigo Goiri
Priority: Critical


The Router uses threads to invoke getApps in parallel across multiple RMs and
has a concurrency bug caused by sharing the HTTP request object across those
threads. This jira tracks the changes to fix the multi-threading issue by
cloning the request for each thread.
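
A rough illustration of the failure mode and the fix (simplified,
hypothetical names; the actual patch works on the Router's servlet
request handling):

    import java.util.concurrent.Callable;

    import javax.servlet.http.HttpServletRequest;

    // Bug: all fan-out threads shared one mutable HttpServletRequest, so
    // concurrent getApps invocations could trample each other's state.
    // Fix (sketch): snapshot the fields each worker needs into an immutable
    // per-thread copy before fanning out to the sub-cluster RMs.
    final class AppsQuery {
      final String user;
      final String states;

      AppsQuery(HttpServletRequest req) {
        this.user = req.getRemoteUser();
        this.states = req.getParameter("states");
      }
    }

    class GetAppsFanOut {
      Callable<String> taskFor(AppsQuery query, String rmAddress) {
        // Each callable closes over its own immutable AppsQuery snapshot.
        return () -> invokeGetApps(rmAddress, query);
      }

      String invokeGetApps(String rmAddress, AppsQuery q) {
        return ""; // placeholder for the per-RM REST call
      }
    }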



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Cutting branch-2.9

2017-10-31 Thread Subru Krishnan
Hi Junping,

We are planning to cut branch-2.9.0 on Thursday, as there are a couple of
non-blocking bug fixes coming in which we hope to land tomorrow. Hope
that answers your question.

Thanks,
Subru

On Tue, Oct 31, 2017 at 2:35 PM, 俊平堵 <dujunp...@gmail.com> wrote:

> Hi Arun/Subru,
> Thanks for updating on 2.9.0 release progress. A quick question here:
> are we planning to release from branch-2.9 directly?
> I doubt this, as it seems against our current branch release practice (
> https://wiki.apache.org/hadoop/HowToRelease#Branching). To get rid of any
> confusion, I would suggest cutting branch-2.9.0 for the 2.9.0 release work.
> In the meantime, branch-2.9 should be reserved for the next 2.9 point
> release (2.9.1) and branch-2 should be reserved for the next minor release
> (2.10.0 or whatever name it is). Thoughts?
>
> bq. @Junping, lets move the jdiff conversation to separate thread.
> Sure. I will reply jdiff in separated thread.
>
> Thanks,
>
> Junping
>
> 2017-10-31 13:44 GMT-07:00 Arun Suresh <asur...@apache.org>:
>
>> Hello folks
>>
>> We just cut branch-2.9 since all the critical/blocker issues are now
>> resolved.
>> We plan to perform some sanity checks for the rest of the week and cut
>> branch-2.9.0 and push out an RC0 by the end of the week.
>>
>> Kindly refrain from committing to branch-2.9 without giving us a heads up.
>>
>> @Junping, lets move the jdiff conversation to separate thread.
>>
>> Cheers
>> -Arun/Subru
>>
>> On Mon, Oct 30, 2017 at 12:39 PM, Subru Krishnan <su...@apache.org>
>> wrote:
>>
>> > We want to give heads up that we are going to cut branch-2.9 tomorrow
>> > morning.
>> >
>> > We are down to 3 blockers and they all are close to being committed
>> > (thanks everyone):
>> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.9+Release
>> >
>> > There are 4 other non-blocker JIRAs that are targeted for 2.9.0 which
>> are
>> > close to completion.
>> >
>> > Folks who are working/reviewing these, kindly prioritize accordingly so
>> > that we can make the release on time.
>> > https://issues.apache.org/jira/browse/YARN-7398?filter=12342468
>> >
>> > Thanks in advance!
>> >
>> > -Subru/Arun
>> >
>> >
>> >
>>
>
>


Cutting branch-2.9

2017-10-30 Thread Subru Krishnan
We want to give heads up that we are going to cut branch-2.9 tomorrow
morning.

We are down to 3 blockers and they all are close to being committed (thanks
everyone):
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.9+Release

There are 4 other non-blocker JIRAs that are targeted for 2.9.0 which are
close to completion.

Folks who are working/reviewing these, kindly prioritize accordingly so
that we can make the release on time.
https://issues.apache.org/jira/browse/YARN-7398?filter=12342468

Thanks in advance!

-Subru/Arun


[jira] [Resolved] (YARN-5326) Support for recurring reservations in the YARN ReservationSystem

2017-10-30 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-5326.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   3.0.0
   2.9.0

Marking this as resolved as all sub-tasks are closed.

> Support for recurring reservations in the YARN ReservationSystem
> 
>
> Key: YARN-5326
> URL: https://issues.apache.org/jira/browse/YARN-5326
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>    Reporter: Subru Krishnan
>Assignee: Carlo Curino
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: SupportRecurringReservationsInRayon.pdf
>
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle
> time explicitly, i.e. users can now "reserve" capacity ahead of time which is
> predictably allocated to them. Most SLA jobs/workflows are recurring, so they
> need the same resources periodically. With the current implementation, users
> would have to make individual reservations for each run. This is an umbrella
> JIRA to enhance the reservation system by adding native support for recurring
> reservations.
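
For illustration, a rough sketch of submitting a recurring reservation with
this support (the recurrence expression as a period in milliseconds reflects
my reading of the 2.9 API; treat the exact signatures as assumptions):

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ReservationDefinition;
    import org.apache.hadoop.yarn.api.records.ReservationRequest;
    import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
    import org.apache.hadoop.yarn.api.records.ReservationRequests;
    import org.apache.hadoop.yarn.api.records.Resource;

    public class RecurringReservationSketch {
      static ReservationDefinition nightly() {
        long arrival = System.currentTimeMillis() + 60_000L;
        long deadline = arrival + 2 * 3600_000L;
        // 10 containers of 1 GB / 1 vcore, for one hour within the window.
        ReservationRequest req = ReservationRequest.newInstance(
            Resource.newInstance(1024, 1), 10, 10, 3600_000L);
        ReservationRequests reqs = ReservationRequests.newInstance(
            Collections.singletonList(req), ReservationRequestInterpreter.R_ALL);
        ReservationDefinition def = ReservationDefinition.newInstance(
            arrival, deadline, reqs, "nightly-pipeline");
        // Repeat every 24 hours (period expressed in milliseconds).
        def.setRecurrenceExpression(Long.toString(24 * 3600_000L));
        return def;
      }
    }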



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7398) LICENSE.txt is broken in branch-2 by YARN-4849

2017-10-25 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-7398:


 Summary: LICENSE.txt is broken in branch-2 by YARN-4849
 Key: YARN-7398
 URL: https://issues.apache.org/jira/browse/YARN-7398
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Subru Krishnan
Assignee: Wangda Tan
Priority: Blocker


YARN-4849 (commit sha id 56654d8820f345fdefd6a3f81836125aa67adbae) seems to
have been based on a stale version of LICENSE.txt (e.g. HSQLDB, gtest, etc.),
so I have reverted it.

[~leftnoteasy]/[~sunilg], can you guys take a look and fix the UI v2 licenses 
asap.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



2.9.0 status update (10/20/2017)

2017-10-20 Thread Subru Krishnan
Today was the feature freeze date, and we are glad to report that all the
major planned features are merged into branch-2:
https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Plannedfeatures

Kudos to everyone who pulled through multiple blockers and made this
happen. Special shoutout to Vrushali, Varun (both :)), Wangda, Inigo,
Sunil, and Jonathan.

I have set up a nightly build for branch-2 (hopefully):
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/

We have 5 blockers and 10 other jiras in development that have to be
addressed by *27th October 2017* following which we plan to cut branch-2.9:
Blockers - https://issues.apache.org/jira/issues/?filter=12342048
WIP JIRAs - https://issues.apache.org/jira/issues/?filter=12342468

We'll be following up with each of the above JIRAs individually next week,
let's make sure that we complete them by next Friday.

-Subru/Arun


[jira] [Resolved] (YARN-4879) Enhance Allocate Protocol to Identify Requests Explicitly

2017-10-03 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-4879.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-beta1
   2.9.0

> Enhance Allocate Protocol to Identify Requests Explicitly
> -
>
> Key: YARN-4879
> URL: https://issues.apache.org/jira/browse/YARN-4879
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, resourcemanager
>    Reporter: Subru Krishnan
>    Assignee: Subru Krishnan
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: SimpleAllocateProtocolProposal-v1.pdf, 
> SimpleAllocateProtocolProposal-v2.pdf
>
>
> For legacy reasons, the current allocate protocol expects expanded requests 
> which represent the cumulative request for any change in resource 
> constraints. This is not only very difficult to comprehend but makes it 
> impossible for the scheduler to associate container allocations to the 
> original requests. This problem is amplified by the fact that the expansion 
> is managed by the AMRMClient which makes it cumbersome for non-Java clients 
> as they all have to replicate the non-trivial logic. In this JIRA, we are 
> proposing an enhancement to the Allocate Protocol to allow AMs to identify 
> requests explicitly.  
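
For illustration, a minimal sketch of the idea using the allocation-request-id
accessors this work introduced (Hadoop 2.9+ API); the priority, sizes and the
id 42L are made-up values, and "response" stands for the AllocateResponse
returned by a later allocate() call:

    // Hedged sketch: tag an ask with an explicit id, then correlate the
    // allocated containers back to it on the AllocateResponse.
    ResourceRequest ask = ResourceRequest.newInstance(
        Priority.newInstance(1), ResourceRequest.ANY,
        Resource.newInstance(1024, 1), 1);
    ask.setAllocationRequestId(42L);          // explicit id for this ask
    // ... send the ask via allocate() ...
    for (Container c : response.getAllocatedContainers()) {
      if (c.getAllocationRequestId() == 42L) {
        // this container satisfies the ask tagged above
      }
    }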



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



2.9.0 Status Update [09/2017]

2017-09-29 Thread Subru Krishnan
Folks,

Quick update on 2.9.0 release. Looks like we are looking at around a week
of delay in the feature freeze:
https://cwiki.apache.org/confluence/display/HADOOP/Roadmap

For feature owners, where the status is:

   1. Green - no action required at this point.
   2. Orange - reply with confirmation on whether backporting from trunk
   to branch-2 is on track
   3. Red - can we get this in by the week of Oct 9th?


There are about 20 blocker/critical issues open:
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.9+Release

We have moved out all non-(blocker/critical) inactive (open) issues to the
next major release. Reach out to us if you feel any of those must be in
2.9.0.

For non-(blocker/critical) active (patch available) issues, can you update
the status. Consider moving out to next release if you anticipate delays.

Thanks,
Subru/Arun


[jira] [Resolved] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

2017-09-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-2915.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-beta1
   2.9.0

> Enable YARN RM scale out via federation using multiple RM's
> ---
>
> Key: YARN-2915
> URL: https://issues.apache.org/jira/browse/YARN-2915
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>    Assignee: Subru Krishnan
>  Labels: federation
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: Federation-BoF.pdf, 
> FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, federation-prototype.patch, 
> Yarn_federation_design_v1.pdf, YARN-Federation-Hadoop-Summit_final.pptx
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising tens of thousands of nodes. That is, rather than 
> limiting a YARN-managed cluster to about 4k nodes, the proposal is to 
> enable the YARN-managed cluster to be elastically scalable.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[HEADS-UP] Merge Federation (YARN-2915) into branch-2

2017-09-15 Thread Subru Krishnan
Folks,

We are almost done with the UI patch and so will be merging Federation
(YARN-2915) from trunk to branch-2 in the next week or so.

Just want to give a heads-up to avoid any awkward surprises, though
unlikely :).

Cheers,
Subru


[jira] [Resolved] (YARN-7097) Federation: routing ClientRM invocations transparently to multiple RMs (part 5 - getNode/getNodes/getMetrics)

2017-08-31 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-7097.
--
   Resolution: Duplicate
Fix Version/s: 3.0.0-beta1

> Federation: routing ClientRM invocations transparently to multiple RMs (part 
> 5 - getNode/getNodes/getMetrics)
> -
>
> Key: YARN-7097
> URL: https://issues.apache.org/jira/browse/YARN-7097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Fix For: 3.0.0-beta1
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7096) Federation: routing ClientRM invocations transparently to multiple RMs (part 2 - getApps)

2017-08-31 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-7096.
--
Resolution: Implemented

> Federation: routing ClientRM invocations transparently to multiple RMs (part 
> 2 - getApps)
> -
>
> Key: YARN-7096
> URL: https://issues.apache.org/jira/browse/YARN-7096
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Subru Krishnan
I'll conclude the vote with my obligatory +1.

With 8 binding +1s, 2 non-binding +1s and 2 +0s (Daniel's may actually be
0.5 --> 1:)), the vote passes. We'll be merging this to trunk shortly.

Thanks everyone for taking the time/effort to vote.

Cheers,
Subru/Carlo

On Tue, Aug 1, 2017 at 3:22 PM, varunsax...@apache.org <
varun.saxena.apa...@gmail.com> wrote:

> +1.
> Looking forward to it.
>
> Regards,
> Varun Saxena.
>
> On Wed, Jul 26, 2017 at 8:54 AM, Subru Krishnan <su...@apache.org> wrote:
>
> > Hi all,
> >
> > Per earlier discussion [9], I'd like to start a formal vote to merge
> > feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
> > days, and will end Aug 1 7PM PDT.
> >
> > We have been developing the feature in a branch (YARN-2915 [2]) for a
> > while, and we are reasonably confident that the state of the feature
> meets
> > the criteria to be merged onto trunk.
> >
> > *Key Ideas*:
> >
> > YARN’s centralized design allows strict enforcement of scheduling
> > invariants and effective resource sharing, but becomes a scalability
> > bottleneck (in number of jobs and nodes) well before reaching the scale
> of
> > our clusters (e.g., 20k-50k nodes).
> >
> >
> > To address these limitations, we developed a scale-out, federation-based
> > solution (YARN-2915). Our architecture scales near-linearly to datacenter
> > sized clusters, by partitioning nodes across multiple sub-clusters (each
> > running a YARN cluster of few thousands nodes). Applications can span
> > multiple sub-clusters *transparently (i.e. no code change or
> recompilation
> > of existing apps)*, thanks to a layer of indirection that negotiates with
> > multiple sub-clusters' Resource Managers on behalf of the application.
> >
> >
> > This design is structurally scalable, as it bounds the number of nodes
> each
> > RM is responsible for. Appropriate policies ensure that the majority of
> > applications reside within a single sub-cluster, thus further controlling
> > the load on each RM. This provides near linear scale-out by simply adding
> > more sub-clusters. The same mechanism enables pooling of resources from
> > clusters owned and operated by different teams.
> >
> > Status:
> >
> >- The version we would like to merge to trunk is termed "MVP" (minimal
> >viable product). The feature will have a complete end-to-end
> application
> >execution flow with the ability to span a single application across
> >multiple YARN (sub) clusters.
> >    - There were 50+ sub-tasks that were completed as part of
> this
> >effort. Every patch has been reviewed and +1ed by a committer. Thanks
> to
> >Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
> >- Federation is designed to be built around YARN and consequently has
> >minimal code changes to core YARN. The relevant JIRAs that modify
> > existing
> >YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
> >attention to ensure that if federation is disabled there is zero
> impact
> > to
> >existing functionality (disabled by default).
> >- We found a few bugs as we went along which we fixed directly
> upstream
> >in trunk and/or branch-2.
> >    - We have been continuously rebasing the feature branch [2] so the merge
> >should be a straightforward cherry-pick.
> >- The current version has been rather thoroughly tested and is
> currently
> >deployed in a *10,000+ node federated YARN cluster that's running
> >upwards of 50k jobs daily with a reliability of 99.9%*.
> >- We have few ideas for follow-up extensions/improvements which are
> >tracked in the umbrella JIRA YARN-5597[3].
> >
> >
> > Documentation:
> >
> >- Quick start guide (maven site) - YARN-6484[4].
> >- Overall design doc[5] and the slide-deck [6] we used for our talk at
> >Hadoop Summit 2016 is available in the umbrella jira - YARN-2915.
> >
> >
> > Credits:
> >
> > This is a group effort that would not have been possible without the
> ideas
> > and hard work of many other folks and we would like to specifically call
> > out Giovanni, Botong & Ellen for their invaluable contributions. Also big
> > thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
> > Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
> > many more) that helped us shape our ideas and code with very insightful
> >

Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-08-01 Thread Subru Krishnan
Hi Daniel,

You were just in time; Carlo and I were just talking about moving
forward with the merge :).

To answer your questions:

   1. The expectation about the store is that the user will have a database set
   up (we only link to the install instructions page), but we do have the scripts
   for the schema and stored procedures. This is in fact called out in the doc
   in the *State Store* section (just before *Running a Sample Job*). Additionally,
   we are working on a ZK-based implementation of the store. Inigo has a patch
   in YARN-6900[1].
   2. We rely on existing YARN/Hadoop security mechanisms for running
   applications on Federation as-is, so you should not need any additional
   Kerberos configuration. Disclaimer: we don't use Kerberos for securing
   Hadoop but rely on our production infrastructure.

Thanks,
Subru

[1] https://issues.apache.org/jira/browse/YARN-6900

On Tue, Aug 1, 2017 at 1:25 PM, Daniel Templeton <dan...@cloudera.com>
wrote:

> Subru, sorry for the last minute contribution... :)  I've been looking at
> the branch, and I have two questions.
>
> First, what's the out-of-box experience regarding the data store? Is the
> expectation that the user will have a database set up and ready to go?
> Will the state store set up the schema automatically, or is that on the
> user?  I don't see that in the docs.
>
> Second, how well does federation play with Kerberos?  Anything special
> that needs to be configured to make it work?
>
> Daniel
>
> On 7/25/17 8:24 PM, Subru Krishnan wrote:
>
>> Hi all,
>>
>> Per earlier discussion [9], I'd like to start a formal vote to merge
>> feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
>> days, and will end Aug 1 7PM PDT.
>>
>> We have been developing the feature in a branch (YARN-2915 [2]) for a
>> while, and we are reasonably confident that the state of the feature meets
>> the criteria to be merged onto trunk.
>>
>> *Key Ideas*:
>>
>> YARN’s centralized design allows strict enforcement of scheduling
>> invariants and effective resource sharing, but becomes a scalability
>> bottleneck (in number of jobs and nodes) well before reaching the scale of
>> our clusters (e.g., 20k-50k nodes).
>>
>>
>> To address these limitations, we developed a scale-out, federation-based
>> solution (YARN-2915). Our architecture scales near-linearly to datacenter
>> sized clusters, by partitioning nodes across multiple sub-clusters (each
>> running a YARN cluster of few thousands nodes). Applications can span
>> multiple sub-clusters *transparently (i.e. no code change or recompilation
>> of existing apps)*, thanks to a layer of indirection that negotiates with
>> multiple sub-clusters' Resource Managers on behalf of the application.
>>
>>
>> This design is structurally scalable, as it bounds the number of nodes
>> each
>> RM is responsible for. Appropriate policies ensure that the majority of
>> applications reside within a single sub-cluster, thus further controlling
>> the load on each RM. This provides near linear scale-out by simply adding
>> more sub-clusters. The same mechanism enables pooling of resources from
>> clusters owned and operated by different teams.
>>
>> Status:
>>
>> - The version we would like to merge to trunk is termed "MVP" (minimal
>> viable product). The feature will have a complete end-to-end
>> application
>> execution flow with the ability to span a single application across
>> multiple YARN (sub) clusters.
>> - There were 50+ sub-tasks that were completed as part of
>> this
>> effort. Every patch has been reviewed and +1ed by a committer. Thanks
>> to
>> Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>> - Federation is designed to be built around YARN and consequently has
>> minimal code changes to core YARN. The relevant JIRAs that modify
>> existing
>> YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
>> attention to ensure that if federation is disabled there is zero
>> impact to
>> existing functionality (disabled by default).
>> - We found a few bugs as we went along which we fixed directly
>> upstream
>> in trunk and/or branch-2.
>> - We have been continuously rebasing the feature branch [2] so the merge
>> should be a straightforward cherry-pick.
>> - The current version has been rather thoroughly tested and is
>> currently
>> deployed in a *10,000+ node federated YARN cluster that's running
>> upwards of 50k jobs daily with a reliability of 99.9

Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-07-31 Thread Subru Krishnan
Hi Andrew,

You are raising pertinent questions: one of the key design points of
Federation was to be completely transparent to applications, i.e. there
should be no code change or even recompile required to run existing apps in a
federated cluster. In summary apps simply get the appearance of a larger
cluster to play around with. Consequently there are zero public API changes
(we have new APIs for FederationStateStore but those are purely private)
for YARN Federation. Additionally we have backported the code to our
internal branch (currently based on 2.7.1) and have been running in
production at a scale of tens of thousands of nodes.

I agree with you regarding the backport to branch-2. We are planning to get
that done by August and hence included it in the proposed release plan[1]
for 2.9.0.

Cheers,
Subru

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27126.html



On Mon, Jul 31, 2017 at 1:29 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Hi all,
>
> Sorry for coming to this late, I wasn't on yarn-dev and someone else
> mentioned that this feature was being merged.
>
> With my RM hat on, trunk is an active release branch, so we want to be
> merging features when they are production-ready. This feature has done one
> better, and has already been run at 10k-node scale! It's great to see this
> level of testing and validation for a branch merge.
>
> Could one of the contributors comment on compatibility and API stability?
> It looks like it's compatible and stable, but I wanted to confirm since the
> target 3.0.0-beta1 release date of mid-September means there isn't much
> time to do additional development in trunk.
>
> Finally, could someone comment on the timeline for merging this into
> branch-2? Given that the feature seems ready, I expect we'd quickly
> backport this to branch-2 as well.
>
> Best,
> Andrew
>
> On Mon, Jul 31, 2017 at 1:05 PM, Naganarasimha Garla <
> naganarasimha...@apache.org> wrote:
>
> > +1, quite an interesting and useful feature. Hoping to see it in 2.9 too.
> >
> > On Tue, Aug 1, 2017 at 1:31 AM, Jason Lowe <jl...@yahoo-inc.com.invalid>
> > wrote:
> >
> > > +1
> > > Jason
> > >
> > >
> > > On Tuesday, July 25, 2017 10:24 PM, Subru Krishnan <
> su...@apache.org
> > >
> > > wrote:
> > >
> > >
> > >  Hi all,
> > >
> > > Per earlier discussion [9], I'd like to start a formal vote to merge
> > > feature YARN Federation (YARN-2915) [1] to trunk. The vote will run
> for 7
> > > days, and will end Aug 1 7PM PDT.
> > >
> > > We have been developing the feature in a branch (YARN-2915 [2]) for a
> > > while, and we are reasonably confident that the state of the feature
> > meets
> > > the criteria to be merged onto trunk.
> > >
> > > *Key Ideas*:
> > >
> > > YARN’s centralized design allows strict enforcement of scheduling
> > > invariants and effective resource sharing, but becomes a scalability
> > > bottleneck (in number of jobs and nodes) well before reaching the scale
> > of
> > > our clusters (e.g., 20k-50k nodes).
> > >
> > >
> > > To address these limitations, we developed a scale-out,
> federation-based
> > > solution (YARN-2915). Our architecture scales near-linearly to
> datacenter
> > > sized clusters, by partitioning nodes across multiple sub-clusters
> (each
> > > running a YARN cluster of few thousands nodes). Applications can span
> > > multiple sub-clusters *transparently (i.e. no code change or
> > recompilation
> > > of existing apps)*, thanks to a layer of indirection that negotiates
> with
> > > multiple sub-clusters' Resource Managers on behalf of the application.
> > >
> > >
> > > This design is structurally scalable, as it bounds the number of nodes
> > each
> > > RM is responsible for. Appropriate policies ensure that the majority of
> > > applications reside within a single sub-cluster, thus further
> controlling
> > > the load on each RM. This provides near linear scale-out by simply
> adding
> > > more sub-clusters. The same mechanism enables pooling of resources from
> > > clusters owned and operated by different teams.
> > >
> > > Status:
> > >
> > >   - The version we would like to merge to trunk is termed "MVP"
> (minimal
> > >   viable product). The feature will have a complete end-to-end
> > application
> > >   execution flow with the ability to span a single application across
> > >   multiple YARN (sub) clusters.

[jira] [Created] (YARN-6900) ZooKeeper based implementation of the FederationStateStore

2017-07-28 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6900:


 Summary: ZooKeeper based implementation of the FederationStateStore
 Key: YARN-6900
 URL: https://issues.apache.org/jira/browse/YARN-6900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Ellen Hui


YARN-3662 defines the FederationMembershipStateStore API. This JIRA tracks a 
ZooKeeper-based implementation of the state store.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[Vote] merge feature branch YARN-2915 (Federation) to trunk

2017-07-25 Thread Subru Krishnan
Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to datacenter
sized clusters, by partitioning nodes across multiple sub-clusters (each
running a YARN cluster of few thousands nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.

Status:

   - The version we would like to merge to trunk is termed "MVP" (minimal
   viable product). The feature will have a complete end-to-end application
   execution flow with the ability to span a single application across
   multiple YARN (sub) clusters.
   - There were 50+ sub-tasks that were completed as part of this
   effort. Every patch has been reviewed and +1ed by a committer. Thanks to
   Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
   - Federation is designed to be built around YARN and consequently has
   minimal code changes to core YARN. The relevant JIRAs that modify existing
   YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
   attention to ensure that if federation is disabled there is zero impact to
   existing functionality (disabled by default).
   - We found a few bugs as we went along which we fixed directly upstream
   in trunk and/or branch-2.
   - We have been continuously rebasing the feature branch [2] so the merge
   should be a straightforward cherry-pick.
   - The current version has been rather thoroughly tested and is currently
   deployed in a *10,000+ node federated YARN cluster that's running
   upwards of 50k jobs daily with a reliability of 99.9%*.
   - We have few ideas for follow-up extensions/improvements which are
   tracked in the umbrella JIRA YARN-5597[3].


Documentation:

   - Quick start guide (maven site) - YARN-6484[4].
   - Overall design doc[5] and the slide-deck [6] we used for our talk at
   Hadoop Summit 2016 is available in the umbrella jira - YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also big
thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
[9]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E


[jira] [Created] (YARN-6866) Minor clean-up and fixes in anticipation of merge with trunk

2017-07-24 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6866:


 Summary: Minor clean-up and fixes in anticipation of merge with 
trunk
 Key: YARN-6866
 URL: https://issues.apache.org/jira/browse/YARN-6866
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: federation
Reporter: Subru Krishnan
Assignee: Botong Huang


We have done e2e testing of YARN Federation successfully, and we have minor 
clean-ups like a pom version upgrade, a redundant "." in a configuration string, 
documentation updates, etc., which we want to address before the merge to trunk. 
This jira tracks those fixes to ensure a proper e2e run.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[DISCUSS] Looking to a 2.9.0 release

2017-07-24 Thread Subru Krishnan
Folks,

With the 2.8 release out, we would like to look ahead to the 2.9 release, as
there are many features/improvements in branch-2 (about 1062 commits) that
are in need of a release vehicle.

Here's our first cut of the proposal from the YARN side:

   1. Scheduler improvements (decoupling allocation from node heartbeat,
   allocation ID, concurrency fixes, LightResource etc).
   2. Timeline Service v2
   3. Opportunistic containers
   4. Federation

We would like to hear a formal list from HDFS & Hadoop (& MapReduce if any)
and will update the Roadmap wiki accordingly.

Considering our familiarity with the above-mentioned YARN features, we
would like to volunteer as the co-RMs for 2.9.0.

We want to keep the timeline at 8-12 weeks to keep the release pragmatic.

Feedback?

-Subru/Arun


[jira] [Created] (YARN-6821) Move FederationStateStore SQL DDL files from test to main

2017-07-13 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6821:


 Summary: Move FederationStateStore SQL DDL files from test to main
 Key: YARN-6821
 URL: https://issues.apache.org/jira/browse/YARN-6821
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


The FederationStateStore SQL DDL files are currently in _src/test_ as there's 
no compile time dependency. This jira proposes to move them to _src/main_ to 
ensure they are part of the distro.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6815) [Bug] FederationStateStoreFacade return behavior should be consistent irrespective of whether caching is enabled or not

2017-07-12 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6815:


 Summary: [Bug] FederationStateStoreFacade return behavior should 
be consistent irrespective of whether caching is enabled or not
 Key: YARN-6815
 URL: https://issues.apache.org/jira/browse/YARN-6815
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


{{FederationStateStoreFacade::getSubCluster/getPolicyConfiguration}} returns 
null (when caching is enabled) or throws YarnException (when caching is 
disabled) if the queried entity is absent. This jira proposes to make the 
return behavior consistent to ensure correctness of clients like 
{{RouterPolicyFacade}}.
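
For illustration, one way to normalize the two paths (a sketch only; the
fields isCachingEnabled, subClusterCache and stateStore, and the exact method
shapes, are illustrative rather than the actual fix):

    // Hypothetical sketch: make the cache path fail the same way the
    // store path does when the queried entity is absent.
    public SubClusterInfo getSubCluster(SubClusterId id) throws YarnException {
      SubClusterInfo info = isCachingEnabled
          ? subClusterCache.get(id)         // cache miss returns null
          : stateStore.getSubCluster(id);   // store may throw on absence
      if (info == null) {
        throw new YarnException("SubCluster " + id + " does not exist");
      }
      return info;
    }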



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6807) Adding required missing configs to Federation configuration guide based on e2e testing

2017-07-11 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6807:


 Summary: Adding required missing configs to Federation 
configuration guide based on e2e testing
 Key: YARN-6807
 URL: https://issues.apache.org/jira/browse/YARN-6807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, federation
Affects Versions: YARN-2915
Reporter: Subru Krishnan
Assignee: Tanuj Nayak


We identified some missing configs that are required for an e2e run. This JIRA 
proposes to update the documentation to include them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

2017-06-27 Thread Subru Krishnan
Thanks everyone for the thoughtful comments. We completed the last patch
required (YARN-3659) for testing the YARN-2915 branch e2e yesterday. I'll
go ahead and start a vote thread as soon as the validations are done, as
no blockers were raised.

Cheers,
Subru & Carlo

On Fri, Jun 23, 2017 at 5:11 PM, Wangda Tan <wheele...@gmail.com> wrote:

> Thanks all for working on the feature, I'm in favor of moving forward as
> well.
>
> Best,
> Wangda
>
> On Fri, Jun 23, 2017 at 2:44 PM, Sangjin Lee <sjl...@gmail.com> wrote:
>
>> Thanks for the clarification Subru. I am in favor of moving forward.
>>
>>
>> Sangjin
>>
>> On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla <
>> ka...@cloudera.com> wrote:
>>
>> > Given RTC and the amount of production testing this feature has
>> received, I
>> > am totally in favor of this merge.
>> >
>> >
>> >
>> > On Tue, Jun 20, 2017 at 4:28 PM, Subru Krishnan <su...@apache.org>
>> wrote:
>> >
>> > > Hi all,
>> > >
>> > > We would like to open a discussion on merging the YARN Federation
>> > > (YARN-2915) [1] feature to trunk.  We have been developing the feature
>> > in a
>> > > feature branch (YARN-2915 [2]) for a while, and we are reasonably
>> > confident
>> > > that the state of the feature meets the criteria to be merged onto
>> trunk.
>> > >
>> > > *Key Ideas*:
>> > >
>> > > YARN’s centralized design allows strict enforcement of scheduling
>> > > invariants and effective resource sharing, but becomes a scalability
>> > > bottleneck (in number of jobs and nodes) well before reaching the
>> scale
>> > of
>> > > our clusters (e.g., 20k-50k nodes).
>> > >
>> > >
>> > > To address these limitations, we developed a scale-out,
>> federation-based
>> > > solution (YARN-2915). Our architecture scales near-linearly to
>> datacenter
>> > > sized clusters, by partitioning nodes across multiple sub-clusters
>> (each
>> > > running a YARN cluster of few thousands nodes). Applications can span
>> > > multiple sub-clusters *transparently (i.e. no code change or
>> > recompilation
>> > > of existing apps)*, thanks to a layer of indirection that negotiates
>> with
>> > > multiple sub-clusters' Resource Managers on behalf of the application.
>> > >
>> > >
>> > > This design is structurally scalable, as it bounds the number of nodes
>> > each
>> > > RM is responsible for. Appropriate policies ensure that the majority
>> of
>> > > applications reside within a single sub-cluster, thus further
>> controlling
>> > > the load on each RM. This provides near linear scale-out by simply
>> adding
>> > > more sub-clusters. The same mechanism enables pooling of resources
>> from
>> > > clusters owned and operated by different teams.
>> > >
>> > > Status:
>> > >
>> > >- The version we would like to merge to trunk is termed "MVP"
>> (minimal
>> > >viable product). The feature will have a complete end-to-end
>> > application
>> > >execution flow with the ability to span a single application across
>> > >multiple YARN (sub) clusters.
>> > >- There were 50+ sub-tasks that were completed as part of
>> > this
>> > >effort. Every patch has been reviewed and +1ed by a committer.
>> Thanks
>> > to
>> > >Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough
>> reviews!
>> > >- Federation is designed to be built around YARN and consequently
>> has
>> > >minimal code changes to core YARN. The relevant JIRAs that modify
>> > > existing
>> > >YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
>> close
>> > >attention to ensure that if federation is disabled there is zero
>> > impact
>> > > to
>> > >existing functionality (disabled by default).
>> > >- We found a few bugs as we went along which we fixed directly
>> > upstream
>> > >in trunk and/or branch-2.
>> > >- We have been continuously rebasing the feature branch [2] so the merge
>> > >should be a straightforward cherry-pick.
>> > >- The current version has been rather thoroughly tested a

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

2017-06-22 Thread Subru Krishnan
Thanks Arun for the vote of confidence.

Sangjin,

Your question is very pertinent and that's why we called it out
specifically in the mail. The short answer is yes, there's absolutely no
impact on keeping it turned off.

The more detailed answer is with Federation there's only *one* integration
point with YARN, i.e. the RMs heartbeat "membership" to the federated
cluster (*YARN-3671*). The heartbeat thread is enabled only when Federation
is turned on. We made a few minor changes in core YARN; every single one
was a refactoring (to reuse code or make something configurable) or a bug
fix. There is zero change both in behavior and data structures in the RM
(or Scheduler) to support Federation. In fact (except for the heartbeat)
the RM & NM is totally unaware of Federation. In addition, we are also
transparently rolling out Federation in our clusters, i.e. moving from
multiple disjoint YARN clusters to federated ones independent of the apps
and we have not observed any issues so far. I suggest looking at *YARN-3671*
as that'll give you a clear picture. And yes, it is manageable (40KB), most
of which is the new heartbeat thread and tests :).

Hope this addresses your concern.


On Thu, Jun 22, 2017 at 11:17 AM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks much Subru, Carlo, and others for working on this tirelessly and
> bringing it to this point!
>
> I haven't spent enough time on the federation code itself to have a strong
> opinion of the quality. I trust the reviews that folks did for individual
> commits.
>
> One ask I have: could you comment on whether it is turned off cleanly
> without any impact whatsoever? There are many who won't turn on federation
> (now) and it is imperative that there is as little impact as possible when
> this is merged. I'm thinking of behavior, memory footprint of the RM, and
> so on, when federation is turned off.
>
> Sangjin
>
> On Thu, Jun 22, 2017 at 10:59 AM, Arun Suresh <asur...@apache.org> wrote:
>
>> Thanks for all the work on this Subru, Carlo et al.
>> I think we should proceed with a merge vote.
>>
>> Cheers
>> -Arun
>>
>> On Tue, Jun 20, 2017 at 4:28 PM, Subru Krishnan <su...@apache.org> wrote:
>>
>> > Hi all,
>> >
>> > We would like to open a discussion on merging the YARN Federation
>> > (YARN-2915) [1] feature to trunk.  We have been developing the feature
>> in a
>> > feature branch (YARN-2915 [2]) for a while, and we are reasonably
>> confident
>> > that the state of the feature meets the criteria to be merged onto
>> trunk.
>> >
>> > *Key Ideas*:
>> >
>> > YARN’s centralized design allows strict enforcement of scheduling
>> > invariants and effective resource sharing, but becomes a scalability
>> > bottleneck (in number of jobs and nodes) well before reaching the scale
>> of
>> > our clusters (e.g., 20k-50k nodes).
>> >
>> >
>> > To address these limitations, we developed a scale-out, federation-based
>> > solution (YARN-2915). Our architecture scales near-linearly to
>> datacenter
>> > sized clusters, by partitioning nodes across multiple sub-clusters (each
>> > running a YARN cluster of few thousands nodes). Applications can span
>> > multiple sub-clusters *transparently (i.e. no code change or
>> recompilation
>> > of existing apps)*, thanks to a layer of indirection that negotiates
>> with
>> > multiple sub-clusters' Resource Managers on behalf of the application.
>> >
>> >
>> > This design is structurally scalable, as it bounds the number of nodes
>> each
>> > RM is responsible for. Appropriate policies ensure that the majority of
>> > applications reside within a single sub-cluster, thus further
>> controlling
>> > the load on each RM. This provides near linear scale-out by simply
>> adding
>> > more sub-clusters. The same mechanism enables pooling of resources from
>> > clusters owned and operated by different teams.
>> >
>> > Status:
>> >
>> >- The version we would like to merge to trunk is termed "MVP"
>> (minimal
>> >viable product). The feature will have a complete end-to-end
>> application
>> >execution flow with the ability to span a single application across
>> >multiple YARN (sub) clusters.
>> >- There were 50+ sub-tasks that were completed as part of
>> this
>> >effort. Every patch has been reviewed and +1ed by a committer.
>> Thanks to
>> >Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>

[DISCUSS] merging YARN-2915 (Federation) to trunk

2017-06-20 Thread Subru Krishnan
Hi all,

We would like to open a discussion on merging the YARN Federation
(YARN-2915) [1] feature to trunk.  We have been developing the feature in a
feature branch (YARN-2915 [2]) for a while, and we are reasonably confident
that the state of the feature meets the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to datacenter
sized clusters, by partitioning nodes across multiple sub-clusters (each
running a YARN cluster of few thousands nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.

Status:

   - The version we would like to merge to trunk is termed "MVP" (minimal
   viable product). The feature will have a complete end-to-end application
   execution flow with the ability to span a single application across
   multiple YARN (sub) clusters.
   - There were 50+ sub-tasks that were completed as part of this
   effort. Every patch has been reviewed and +1ed by a committer. Thanks to
   Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
   - Federation is designed to be built around YARN and consequently has
   minimal code changes to core YARN. The relevant JIRAs that modify existing
   YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
   attention to ensure that if federation is disabled there is zero impact to
   existing functionality (disabled by default).
   - We found a few bugs as we went along which we fixed directly upstream
   in trunk and/or branch-2.
   - We have been continuously rebasing the feature branch [2] so the merge
   should be a straightforward cherry-pick.
   - The current version has been rather thoroughly tested and is currently
   deployed in a *10,000+ node federated YARN cluster that's running
   upwards of 50k jobs daily with a reliability of 99.9%*.
   - We have few ideas for follow-up extensions/improvements which are
   tracked in the umbrella JIRA YARN-5597[3].


Documentation:

   - Quick start guide (maven site) - YARN-6484[4].
   - Overall design doc[5] and the slide-deck [6] we used for our talk at
   Hadoop Summit 2016 is available in the umbrella jira - YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also big
thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

We plan to start the merge vote in the next week or so. The branch is close
to complete (~5 patches before one can kick the tires on a running
deployment). Please look through the branch; feedback is welcome. Thanks!

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673


[jira] [Created] (YARN-6724) Add ability to blacklist sub-clusters when invoking Routing policies

2017-06-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6724:


 Summary: Add ability to blacklist sub-clusters when invoking 
Routing policies
 Key: YARN-6724
 URL: https://issues.apache.org/jira/browse/YARN-6724
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: router
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


The {{Router}} service invokes the policy to determine the next sub-cluster to 
route the client request to. Since the current policy implementations are 
stateless, we need the ability to pass in blacklisted sub-clusters (which 
failed previously) to prevent selection of non-responsive sub-clusters.  
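
For illustration, a hedged sketch of what a blacklist-aware invocation could
look like (the overall signature and the rankSubClusters method on "policy"
are hypothetical, not the actual API):

    // Hypothetical sketch: skip previously failed sub-clusters when routing.
    SubClusterId chooseHomeSubCluster(ApplicationSubmissionContext ctx,
        List<SubClusterId> blacklist) throws YarnException {
      for (SubClusterId candidate : policy.rankSubClusters(ctx)) {
        if (!blacklist.contains(candidate)) {
          return candidate;   // first responsive, non-blacklisted pick
        }
      }
      throw new YarnException("No active sub-cluster outside the blacklist");
    }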



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6691) Update YARN daemon startup/shutdown scripts to include Router service

2017-06-05 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6691:


 Summary: Update YARN daemon startup/shutdown scripts to include 
Router service
 Key: YARN-6691
 URL: https://issues.apache.org/jira/browse/YARN-6691
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


YARN-5410 introduced a new YARN service, i.e. the Router. This jira proposes to 
update the YARN daemon startup/shutdown scripts to include the Router service.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6687) Validate that the duration of the periodic reservation is less than the periodicity

2017-06-02 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6687:


 Summary: Validate that the duration of the periodic reservation is 
less than the periodicity
 Key: YARN-6687
 URL: https://issues.apache.org/jira/browse/YARN-6687
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: reservation system
Reporter: Subru Krishnan
Assignee: Subru Krishnan


The duration of a reservation (endTime - startTime) is specified independently 
of the recurrence. This can lead to requests where the duration exceeds the 
periodicity, e.g. an hourly reservation that runs for 2 hours. This results in 
multiple concurrently active instances of the same reservation, which is hard 
both to manage and to reason about. This jira proposes to reject such requests 
with a clear error message to the user.
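
For illustration, a minimal sketch of the proposed check, assuming duration
and periodicity are already normalized to milliseconds (the exception type
here is illustrative):

    // Minimal validation sketch: reject recurring reservations whose
    // duration exceeds their period.
    static void validateRecurringReservation(
        long startTime, long endTime, long periodMs) {
      long duration = endTime - startTime;
      if (periodMs > 0 && duration > periodMs) {
        throw new IllegalArgumentException("Reservation duration (" + duration
            + " ms) must be less than its periodicity (" + periodMs + " ms)");
      }
    }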



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6485) [Regression] TestFederationRMStateStoreService is failing with null pointer exception

2017-05-30 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-6485.
--
   Resolution: Works for Me
Fix Version/s: YARN-2915

I am closing this as I ran {{TestFederationRMStateStoreService}} successfully 
multiple times locally. The test is now also passing in YARN-3666. 

> [Regression] TestFederationRMStateStoreService is failing with null pointer 
> exception
> -
>
> Key: YARN-6485
> URL: https://issues.apache.org/jira/browse/YARN-6485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>    Reporter: Subru Krishnan
>    Assignee: Subru Krishnan
> Fix For: YARN-2915
>
>
> TestFederationRMStateStoreService is failing with a null pointer exception. 
> This looks to be a regression caused by YARN-5602



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6634) [API] Define an API for ResourceManager WebServices

2017-05-22 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6634:


 Summary: [API] Define an API for ResourceManager WebServices
 Key: YARN-6634
 URL: https://issues.apache.org/jira/browse/YARN-6634
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Subru Krishnan
Priority: Critical


The RM exposes a few REST endpoints but there's no clear API interface defined. 
This makes it painful to build either clients or extension components like the 
Router (YARN-5412) that expose REST interfaces themselves. This jira proposes 
adding an RM WebServices protocol similar to the one we have for RPC, i.e. 
{{ApplicationClientProtocol}}.
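
For illustration, a sketch of the protocol-style interface being proposed;
all type and method names below are hypothetical, merely mirroring the shape
of the RPC protocols:

    // Hypothetical sketch of an RM WebServices protocol interface.
    public interface RMWebServiceProtocol {
      ClusterInfo getClusterInfo();
      AppsInfo getApps(String states, String queue);
      NodeInfo getNode(String nodeId);
    }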



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6542) Fix the logger in TestAlignedPlanner and TestGreedyReservationAgent

2017-05-01 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6542:


 Summary: Fix the logger in TestAlignedPlanner and 
TestGreedyReservationAgent
 Key: YARN-6542
 URL: https://issues.apache.org/jira/browse/YARN-6542
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: reservation system
Reporter: Subru Krishnan
Assignee: Subru Krishnan


Jetty loggers are used in TestAlignedPlanner and TestGreedyReservationAgent 
while we actually need slf4j; this jira proposes switching the logger to slf4j.
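
For illustration, the intended declaration is just standard slf4j usage:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // slf4j logger keyed to the test class instead of a Jetty logger.
    private static final Logger LOG =
        LoggerFactory.getLogger(TestGreedyReservationAgent.class);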



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6541) Optimize {{PeriodicRLESparseResourceAllocation}} by auto-expanding it's time period

2017-05-01 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6541:


 Summary: Optimize {{PeriodicRLESparseResourceAllocation}} by 
auto-expanding it's time period
 Key: YARN-6541
 URL: https://issues.apache.org/jira/browse/YARN-6541
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


YARN-5531 adds {{PeriodicRLESparseResourceAllocation}}, which represents 
periodic allocation of resources. It seeds the period directly with the max 
timePeriod, which will result in storing multiple instances if the 
user-requested periods are small, e.g. 24 instances of an hourly job if 
timePeriod is one day. We need the max to keep the time period bounded if a 
user inputs prime-numbered periods, but we can auto-expand the period on demand 
instead of statically fixing it.
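
One way to read the auto-expansion idea is lcm-based growth that stays capped
at the max; a hedged sketch, not the actual implementation:

    // Illustrative sketch: grow the stored period on demand to the least
    // common multiple of the current and requested periods, capped at
    // maxPeriod so prime-numbered inputs stay bounded.
    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    static long autoExpandPeriod(long current, long requested, long maxPeriod) {
      long lcm = current / gcd(current, requested) * requested;
      return Math.min(lcm, maxPeriod);
    }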



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6485) [Regression] TestFederationRMStateStoreService is failing with null pointer exception

2017-04-14 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6485:


 Summary: [Regression] TestFederationRMStateStoreService is failing 
with null pointer exception
 Key: YARN-6485
 URL: https://issues.apache.org/jira/browse/YARN-6485
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


TestFederationRMStateStoreService is failing with a null pointer exception. This 
looks to be a regression caused by YARN-5602



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6484) [Documentation] Documenting the YARN Federation feature

2017-04-14 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6484:


 Summary: [Documentation] Documenting the YARN Federation feature
 Key: YARN-6484
 URL: https://issues.apache.org/jira/browse/YARN-6484
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


We should document the high-level design and the configuration needed to enable 
YARN Federation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Two AMs in one YARN container?

2017-03-17 Thread Subru Krishnan
Thanks Arun for the heads-up.

Hi Sergiy,

We do run a UAM pool under one process (AMRMProxyService in the NM) as that's
the mechanism we use to span a single job across multiple clusters that are
under federation. This is achieved by using the doAs method in
UserGroupInformation, exactly as Jason pointed out.

The e2e *prototype* code (and docs/slides) is available in the Federation
umbrella jira:
https://issues.apache.org/jira/browse/YARN-2915

I have created a utility class that's used throughout YARN Federation to
create RMProxies per UGI - FederationProxyProviderUtil (as part of YARN-3673),
which should provide a good starting point for you.

You should also keep an eye on the UAM pool JIRA which Botong is working on
right now:
https://issues.apache.org/jira/browse/YARN-5531

-Subru
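
P.S. For illustration, the doAs pattern described in the quoted thread below
looks roughly like this (a hedged sketch; runFirstAm/runSecondAm are
hypothetical stand-ins for the two AM entry points):

    // Each AM entry point runs under its own UserGroupInformation, so the
    // two registrations carry separate credentials.
    UserGroupInformation ugi1 = UserGroupInformation.createRemoteUser("am-1");
    UserGroupInformation ugi2 = UserGroupInformation.createRemoteUser("am-2");

    ugi1.doAs((PrivilegedExceptionAction<Void>) () -> {
      runFirstAm();    // sees ugi1 as the current user
      return null;
    });
    ugi2.doAs((PrivilegedExceptionAction<Void>) () -> {
      runSecondAm();   // registers the (unmanaged) AM under its own tokens
      return null;
    });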


On Thu, Mar 16, 2017 at 2:49 PM, Arun Suresh  wrote:

> Hey Sergiy,
>
> I think a similar approach is used IIUC, where an AM for an app running on one
> cluster acts as an unmanaged AM on another cluster. I believe they use a
> separate UGI for each sub-cluster and wrap it around a doAs before the
> actual allocate call.
>
> Subru might be able to give more details.
>
> Cheers
> -Arun
>
> On Thu, Mar 16, 2017 at 2:34 PM, Jason Lowe 
> wrote:
>
>> The doAs method in UserGroupInformation is what you want when dealing
>> with multiple UGIs.  It determines what UGI instance the code within the
>> doAs scope gets when that code tries to lookup the current user.
>> Each AM is designed to run in a separate JVM, so each has some
>> main()-like entry point that does everything to setup the AM.
>> Theoretically all you need to do is create two, separate UGIs then use each
>> instance to perform a doAs wrapping the invocation of the corresponding
>> AM's entry point.  After that, everything that AM does will get the UGI of
>> the doAs invocation as the current user.  Since the AMs are running in
>> separate doAs instances they will get separate UGIs for the current user
>> and thus separate credentials.
>> Jason
>>
>>
>> On Thursday, March 16, 2017 4:03 PM, Sergiy Matusevych <
>> sergiy.matusev...@gmail.com> wrote:
>>
>>
>>  Hi Jason,
>>
>> Thanks a lot for your help again! Having two separate
>> UserGroupInformation instances is exactly what I had in mind. What I do not
>> understand, though, is how to make sure that our second call to
>> .registerApplicationMaster() will pick the right UserGroupInformation
>> object. I would love to find a way that does not involve any changes to the
>> YARN client, but if we have to patch it, of course, I agree that we need to
>> have a generic yet minimally invasive solution.
>> Thank you! Sergiy.
>>
>>
>> On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe  wrote:
>> >
>> > I believe a cleaner way to solve this problem is to create two,
>> _separate_ UserGroupInformation objects and wrap each AM instances in a UGI
>> doAs so they aren't trying to share the same credentials.  This is one
>> example of a token bleeding over and causing problems. I suspect trying to
>> fix these one-by-one as they pop up is going to be frustrating compared to
>> just ensuring the credentials remain separate as if they really were
>> running in separate JVMs.
>> >
>> > Adding Daryn who knows a lot more about the UGI stuff so he can correct
>> any misunderstandings on my part.
>> >
>> > Jason
>> >
>> >
>> > On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych <
>> sergiy.matusev...@gmail.com> wrote:
>> >
>> >
>> > Hi YARN developers,
>> >
>> > I have an interesting problem that I think is related to YARN Java
>> client.
>> > I am trying to launch *two* application masters in one container. To be
>> > more specific, I am starting a Spark job on YARN, and launch an Apache
>> REEF
>> > Unmanaged AM from the Spark Driver.
>> >
>> > Technically, YARN Resource Manager should not care which process each AM
>> > runs in. However, there is a problem with the YARN Java client
>> > implementation: there is a global UserGroupInformation object that holds
>> > the user credentials of the current RM session. This data structure is
>> > shared by all AMs, and when REEF application tries to register the
>> second
>> > (unmanaged) AM, the client library presents to YARN RM all credentials,
>> > including the security token of the first (managed) AM. YARN rejects
>> such
>> > registration request, throwing InvalidApplicationMasterRequestException
>> > "Application Master is already registered".
>> >
>> > I feel like this issue can be resolved by a relatively small update to
>> the
>> > YARN Java client - e.g. by introducing a new variant of the
>> > AMRMClientAsync.registerApplicationMaster() that would take 

[jira] [Created] (YARN-6234) Support multiple attempts on the node when AMRMProxy is enabled

2017-02-24 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6234:


 Summary: Support multiple attempts on the node when AMRMProxy is 
enabled
 Key: YARN-6234
 URL: https://issues.apache.org/jira/browse/YARN-6234
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: amrmproxy, federation, nodemanager
Affects Versions: 3.0.0-alpha1, 2.8.0
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


Currently {{AMRMProxy}} initializes an interceptor chain pipeline for every 
active AM on the node, but it doesn't clean up & reinitialize correctly if 
there's a second attempt of any AM on the same node. This jira tracks the 
changes required to support multiple attempts on the node when AMRMProxy is 
enabled.
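
For illustration, a hedged sketch of the kind of clean-up being described;
the InterceptorChain type and the chains/initChain helpers are hypothetical
stand-ins for the per-application pipeline state:

    // Hypothetical sketch: rebuild the interceptor pipeline when a new
    // attempt of the same application lands on the node.
    void ensureChainForAttempt(ApplicationAttemptId attemptId) {
      ApplicationId appId = attemptId.getApplicationId();
      InterceptorChain chain = chains.get(appId);
      if (chain != null && !attemptId.equals(chain.getAttemptId())) {
        chain.shutdown();          // drop state from the previous attempt
        chains.remove(appId);
      }
      chains.computeIfAbsent(appId, id -> initChain(attemptId));
    }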



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6128) Add support for AMRMProxy HA

2017-01-27 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6128:


 Summary: Add support for AMRMProxy HA
 Key: YARN-6128
 URL: https://issues.apache.org/jira/browse/YARN-6128
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: amrmproxy, nodemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-1336 added the ability to restart NM without losing any running 
containers. In a Federated YARN environment, there's additional state in the 
{{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need to 
enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-01-27 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6127:


 Summary: Add support for work preserving NM restart when AMRMProxy 
is enabled
 Key: YARN-6127
 URL: https://issues.apache.org/jira/browse/YARN-6127
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: amrmproxy, nodemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-1336 added the ability to restart NM without losing any running 
containers. In a Federated YARN environment, there's additional state in the 
{{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need to 
enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6083) Add doc for reservation in Fair Scheduler

2017-01-11 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-6083.
--
Resolution: Duplicate

[~yufeigu], IIUC this is a duplicate of YARN-4827. I already have a draft 
version of the doc but I have not uploaded it, as I am not able to run 
{{ReservationSystem}} e2e with {{FairScheduler}} because I am blocked by YARN-4859

> Add doc for reservation in Fair Scheduler
> -
>
> Key: YARN-6083
> URL: https://issues.apache.org/jira/browse/YARN-6083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> We can enable reservation on a leaf queue by setting the  tag for 
> the queue, but there is no doc for this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5905) Update the RM webapp host that is reported as part of Federation membership to current primary RM's IP

2016-11-17 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5905:


 Summary: Update the RM webapp host that is reported as part of 
Federation membership to current primary RM's IP
 Key: YARN-5905
 URL: https://issues.apache.org/jira/browse/YARN-5905
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: federation, resourcemanager
Affects Versions: YARN-2915
Reporter: Subru Krishnan
Assignee: Subru Krishnan
Priority: Minor


Currently when RM HA is enabled, the webapp host is randomly picked from one of 
the ensemble RMs and relies on redirect to pick the active primary RM. This has 
a few shortcomings:
  * There's the overhead of an additional network hop.
  * Sometimes the rmId selected might be an instance that is 
inactive/decommissioned.
  * In a few of our clusters, we have redirects disabled (either on the client or 
server side) and then the invocation fails.

This JIRA proposes updating the RM webapp host that is reported as part of 
Federation membership to the current primary RM's IP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5904) Reduce the number of server threads for AMRMProxy

2016-11-17 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5904:


 Summary: Reduce the number of server threads for AMRMProxy
 Key: YARN-5904
 URL: https://issues.apache.org/jira/browse/YARN-5904
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha1, 2.8.0
Reporter: Subru Krishnan
Assignee: Subru Krishnan
Priority: Minor


The default value of the number of server threads for AMRMProxy uses the 
standard default, viz. 25. This is far too many, as the maximum number we need 
is the number of concurrently active AMs on the node. So this JIRA proposes to 
reduce the default to 3.
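
[Editor's note: a hedged example of pinning the value explicitly; the
property name is believed to be the one behind
YarnConfiguration.NM_AMRMPROXY_CLIENT_THREAD_COUNT, so verify it against
your release.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AmrmProxyThreadCount {
      public static void main(String[] args) {
        Configuration conf = new YarnConfiguration();
        // Lower the AMRMProxy server thread pool to match the handful of
        // concurrently active AMs expected on a single node.
        conf.setInt("yarn.nodemanager.amrmproxy.client.thread-count", 3);
        System.out.println(
            conf.getInt("yarn.nodemanager.amrmproxy.client.thread-count", 25));
      }
    }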



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



YARN bug bash summary

2016-10-28 Thread Subru Krishnan
Folks,

First up, big thanks to everyone who turned up and participated
enthusiastically in the YARN bug bash event. A special shout out to Sunil
for organizing a parallel event in Bangalore, that too on very short
notice.

It was great to see people from such a wide range of companies (Microsoft,
Hortonworks, Cloudera, Twitter, Huawei, LinkedIn, Yahoo! to name a few)
pull together for a dedicated common cause. The best part was to see
passionate discussions even at the fag end of a (long) day!

Some highlights from the event:

   - We had *22* contributors/committers physically attending in the Bay Area
   with more than *10* contributors/committers joining online during the
   day.
   - *Triaged 200+ tickets*:
   
https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20labels%20in%20(oct16-easy%2C%20oct16-medium%2C%20oct16-hard)
   - We managed to reduce the patch queue by more than *100*, down from 267
   to below 160:
   
https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20status%20%3D%20%22Patch%20Available%22%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC.
   I see the count has increased already today :).
   - *25 JIRAs resolved* during the event itself:
   
https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20labels%20in%20(oct16-easy%2C%20oct16-medium%2C%20oct16-hard)%20and%20status%20%3D%20Resolved
   - Review feedback and rebasing of many more patches!

 Summary from Bangalore:

              *Committed Patches*   *Added/Rebased Patches*   *Total*
 *YARN*               15                       4                19
 *HDFS*                2                       5                 7
 *HADOOP*              0                       2                 2


Next Steps:

Though it is satisfying to see a noticeable uptick in YARN issue resolution
(
https://issues.apache.org/jira/browse/YARN/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel),
we need to ensure that we sustain the momentum:

   - Bring JIRAs that were reviewed to a closure.
   - Have more such events at a regular cadence to keep the patch queue at
   manageable levels.


Thanks again to everyone for enabling a successful event through such
active participation.

Cheers,
Subru


[jira] [Created] (YARN-5711) AM cannot reconnect to RM after failover when using RequestHedgingRMFailoverProxyProvider

2016-10-05 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5711:


 Summary: AM cannot reconnect to RM after failover when using 
RequestHedgingRMFailoverProxyProvider
 Key: YARN-5711
 URL: https://issues.apache.org/jira/browse/YARN-5711
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 3.0.0-alpha1, 2.9.0
Reporter: Subru Krishnan
Priority: Critical


When the RM fails over, it does _not_ automatically re-register running apps, so 
they need to re-register when reconnecting to the new primary. This is done by 
catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and 
re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ 
propagate {{YarnException}}, as the actual invocation is done asynchronously 
using separate threads.

This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate 
any {{YarnException}} that it encounters.
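
[Editor's note: a sketch of the re-register pattern the description refers
to. It only works when the proxy provider actually surfaces the
YarnException, which is exactly the bug reported here.]

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

    public class ReRegisterOnFailover {
      static AllocateResponse allocateWithReRegister(
          AMRMClient<AMRMClient.ContainerRequest> client, float progress)
          throws Exception {
        try {
          return client.allocate(progress);
        } catch (ApplicationMasterNotRegisteredException e) {
          // The new primary RM does not know this AM yet: register again,
          // then retry the allocate call once.
          client.registerApplicationMaster("", 0, "");
          return client.allocate(progress);
        }
      }
    }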



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5702) Refactor TestPBImplRecords so that we can reuse for testing protocol records in other YARN modules

2016-10-03 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5702:


 Summary: Refactor TestPBImplRecords so that we can reuse for 
testing protocol records in other YARN modules
 Key: YARN-5702
 URL: https://issues.apache.org/jira/browse/YARN-5702
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Subru Krishnan
Assignee: Subru Krishnan


The {{TestPBImplRecords}} has generic helper methods to validate YARN API 
records. This JIRA proposes to refactor the generic helper methods into a base 
class that can then be reused by other YARN modules for testing internal API 
protocol records like in yarn-server-common for Federation (YARN-2915). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5604) Add versioning for FederationStateStore

2016-08-30 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5604:


 Summary: Add versioning for FederationStateStore
 Key: YARN-5604
 URL: https://issues.apache.org/jira/browse/YARN-5604
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


Currently we don't have versioning (null version) for the 
FederationStateStore. This JIRA proposes adding the versioning support that is 
needed for upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5603) Metrics for Federation entities like StateStore/Router/AMRMProxy

2016-08-30 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5603:


 Summary: Metrics for Federation entities like 
StateStore/Router/AMRMProxy
 Key: YARN-5603
 URL: https://issues.apache.org/jira/browse/YARN-5603
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


This JIRA proposes the addition of metrics for Federation entities like 
StateStore/Router/AMRMProxy, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5601) Make the RM epoch base value configurable

2016-08-30 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5601:


 Summary: Make the RM epoch base value configurable
 Key: YARN-5601
 URL: https://issues.apache.org/jira/browse/YARN-5601
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


Currently the epoch always starts from zero. This can cause container ids to 
conflict for an application under Federation that spans multiple RMs 
concurrently. This JIRA proposes to make the RM epoch base value configurable, 
which will allow us to avoid conflicts by setting different values for each RM.
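
[Editor's note: a hedged illustration of disjoint epoch bases per
sub-cluster RM; the property name is believed to back
YarnConfiguration.RM_EPOCH, so verify it against your release.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class EpochBasePerSubCluster {
      public static void main(String[] args) {
        // Each federated RM gets a disjoint epoch base, so the epoch
        // component of container ids cannot collide across sub-clusters.
        Configuration rm1 = new YarnConfiguration();
        rm1.setLong("yarn.resourcemanager.epoch", 0L);         // sub-cluster 1

        Configuration rm2 = new YarnConfiguration();
        rm2.setLong("yarn.resourcemanager.epoch", 1_000_000L); // sub-cluster 2
      }
    }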



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5597) YARN Federation phase 2

2016-08-30 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5597:


 Summary: YARN Federation phase 2
 Key: YARN-5597
 URL: https://issues.apache.org/jira/browse/YARN-5597
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Subru Krishnan


This umbrella JIRA tracks set of improvements over the YARN Federation MVP 
(YARN-2915)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3665) Federation subcluster membership mechanisms

2016-08-30 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-3665.
--
  Resolution: Implemented
Hadoop Flags: Reviewed

Closing this as YARN-3671 includes this too.

> Federation subcluster membership mechanisms
> ---
>
> Key: YARN-3665
> URL: https://issues.apache.org/jira/browse/YARN-3665
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>    Reporter: Subru Krishnan
>    Assignee: Subru Krishnan
>
> The member YARN RMs continuously heartbeat to the state store to keep alive 
> and publish their current capability/load information. This JIRA tracks these 
> mechanisms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

2016-08-16 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5531:


 Summary: UnmanagedAM pool manager for federating application 
across clusters
 Key: YARN-5531
 URL: https://issues.apache.org/jira/browse/YARN-5531
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Sarvesh Sakalanaga


One of the main tenets of YARN Federation is to *transparently* scale 
applications across multiple clusters. This is achieved by running UAMs on 
behalf of the application on other clusters. This JIRA tracks the addition of an 
UnmanagedAM pool manager for federating applications across clusters, which will 
be used by the FederationInterceptor (YARN-3666), itself part of the AMRMProxy 
pipeline introduced in YARN-2884.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5467) InputValidator for the FederationStateStore internal APIs

2016-08-02 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5467:


 Summary: InputValidator for the FederationStateStore internal APIs
 Key: YARN-5467
 URL: https://issues.apache.org/jira/browse/YARN-5467
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


We need to check the input params to FederationStateStore for mandatory fields, 
well-formedness (for address fields), etc. This is common across all Store 
implementations and can be used as a _fail-fast_ mechanism on the client side.
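
[Editor's note: a sketch of the fail-fast validation described above; the
class and method names are hypothetical, not the actual YARN-5467 API.]

    public final class FederationStateStoreInputValidatorSketch {

      static void validateRegisterRequest(String subClusterId, String rmAddress) {
        // Mandatory-field check.
        if (subClusterId == null || subClusterId.isEmpty()) {
          throw new IllegalArgumentException("Missing mandatory field: subClusterId");
        }
        // Well-formedness check for address fields ("host:port").
        if (rmAddress == null || !rmAddress.matches("[^:]+:\\d+")) {
          throw new IllegalArgumentException("Malformed RM address: " + rmAddress);
        }
      }
    }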



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2016-07-28 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5447:


 Summary: Consider including allocationRequestId in 
NMContainerStatus to allow recovery in case of RM failover
 Key: YARN-5447
 URL: https://issues.apache.org/jira/browse/YARN-5447
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


We have added a mapping of the allocated container to the original request 
through YARN-4887/YARN-4888. This JIRA tracks the changes required to include 
the allocationRequestId in NMContainerStatus to allow recovery in case of RM 
failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5441) Fixing minor Scheduler test case failures

2016-07-27 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5441:


 Summary: Fixing minor Scheduler test case failures
 Key: YARN-5441
 URL: https://issues.apache.org/jira/browse/YARN-5441
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-5351 added {{ExecutionTypeRequest}} to the {{ResourceRequest}} comparator, 
but the ResourceRequest objects created via utility methods in a few tests like 
*TestFifoScheduler* and *TestCapacityScheduler* do not set an 
ExecutionTypeRequest, which results in null pointer exceptions. This JIRA 
proposes a simple fix: setting ExecutionTypeRequest in the utility methods.
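
[Editor's note: a sketch of the proposed fix, assuming a release that
contains YARN-5351: the test utility always sets an ExecutionTypeRequest so
the comparator never dereferences null.]

    import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.api.records.ResourceRequest;

    public class SchedulerTestUtilSketch {
      static ResourceRequest createResourceRequest(Priority priority, String host,
          Resource capability, int numContainers) {
        ResourceRequest request =
            ResourceRequest.newInstance(priority, host, capability, numContainers);
        // Default to a GUARANTEED execution type instead of leaving it null.
        request.setExecutionTypeRequest(ExecutionTypeRequest.newInstance());
        return request;
      }
    }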



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3927) Make the NodeManager's ContainerManager pluggable

2016-07-20 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-3927.
--
Resolution: Workaround

We ended up adding AMRMProxy as a first-class service in the NM.

> Make the NodeManager's ContainerManager pluggable
> -
>
> Key: YARN-3927
> URL: https://issues.apache.org/jira/browse/YARN-3927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>    Reporter: Subru Krishnan
>    Assignee: Subru Krishnan
>
> YARN-2884 proposes proxying all AM-RM communication for:
>   * perform distributed scheduling decisions (YARN-2877)
>   * throttling mis-behaving AMs
>   * mask the access to a federation of RMs (YARN-2915)
> To enable all of the above, we are implementing the AMRMProxy as an extension 
> to NM's ContainerManagerImpl. This JIRA is for making the ContainerManager 
> pluggable so that we can dynamically swap in the AMRMProxy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5413) Create a proxy chain for ResourceManager Admin API in the Router

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5413:


 Summary: Create a proxy chain for ResourceManager Admin API in the 
Router
 Key: YARN-5413
 URL: https://issues.apache.org/jira/browse/YARN-5413
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


As detailed in the proposal in the umbrella JIRA, we are introducing a new 
component that routes client requests to the appropriate ResourceManager(s). 
This JIRA tracks the creation of a proxy for the ResourceManager Admin API in 
the Router. This provides a placeholder for:
1) throttling mis-behaving clients (YARN-1546)
3) masking the access to multiple RMs (YARN-3659)

We are planning to follow the interceptor pattern like we did in YARN-2884 to 
generalize the approach and have only dynamic coupling for Federation.
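
[Editor's note: a sketch of the interceptor pattern being referenced,
modeled loosely on the AMRMProxy chain from YARN-2884; the interface below
is hypothetical, not the actual Router API.]

    public class RouterInterceptorSketch {

      interface ClientRequestInterceptor {
        void setNextInterceptor(ClientRequestInterceptor next);
        String handle(String request); // stand-in for the real protocol records
      }

      // Each element may throttle, audit, or route the request, then
      // delegate to the next interceptor in the chain.
      static class PassThroughInterceptor implements ClientRequestInterceptor {
        private ClientRequestInterceptor next;

        public void setNextInterceptor(ClientRequestInterceptor next) {
          this.next = next;
        }

        public String handle(String request) {
          return next == null ? request : next.handle(request);
        }
      }
    }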



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5412) Create a proxy chain for ResourceManager REST API in the Router

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5412:


 Summary: Create a proxy chain for ResourceManager REST API in the 
Router
 Key: YARN-5412
 URL: https://issues.apache.org/jira/browse/YARN-5412
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


As detailed in the proposal in the umbrella JIRA, we are introducing a new 
component that routes client requests to the appropriate ResourceManager(s). 
This JIRA tracks the creation of a proxy for the ResourceManager REST API in 
the Router. This provides a placeholder for:
1) throttling mis-behaving clients (YARN-1546)
3) masking the access to multiple RMs (YARN-3659)

We are planning to follow the interceptor pattern like we did in YARN-2884 to 
generalize the approach and have only dynamic coupling for Federation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5411) Create a proxy for ApplicationClientProtocol in the Router

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5411:


 Summary: Create a proxy for ApplicationClientProtocol in the Router
 Key: YARN-5411
 URL: https://issues.apache.org/jira/browse/YARN-5411
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


As detailed in the proposal in the umbrella JIRA, we are introducing a new 
component that routes client requests to the appropriate ResourceManager(s). 
This JIRA tracks the creation of a proxy for ApplicationClientProtocol in the 
Router.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5410) Bootstrap Router module

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5410:


 Summary: Bootstrap Router module
 Key: YARN-5410
 URL: https://issues.apache.org/jira/browse/YARN-5410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


As detailed in the proposal in the umbrella JIRA, we are introducing a new 
component that routes client requests to the appropriate ResourceManager(s). 
This JIRA tracks the creation of a new sub-module for the Router.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5408) In-memory based implementation of the FederationPolicyStore

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5408:


 Summary: In-memory based implementation of the 
FederationPolicyStore
 Key: YARN-5408
 URL: https://issues.apache.org/jira/browse/YARN-5408
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Ellen Hui


YARN-3664 defines the FederationPolicyStore API. This JIRA tracks an in-memory 
based implementation which is useful for both single-box testing and for future 
unit tests that depend on the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5407) In-memory based implementation of the FederationMembershipStateStore

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5407:


 Summary: In-memory based implementation of the 
FederationMembershipStateStore
 Key: YARN-5407
 URL: https://issues.apache.org/jira/browse/YARN-5407
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Ellen Hui


YARN-5307 defines the FederationApplicationStateStore API. This JIRA tracks an 
in-memory based implementation which is useful for both single-box testing and 
for future unit tests that depend on the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5406) In-memory based implementation of the FederationMembershipStateStore

2016-07-20 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5406:


 Summary: In-memory based implementation of the 
FederationMembershipStateStore
 Key: YARN-5406
 URL: https://issues.apache.org/jira/browse/YARN-5406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Ellen Hui


YARN-3662 defines the FederationMembershipStateStore API. This JIRA tracks an 
in-memory based implementation which is useful for both single-box testing and 
for future unit tests that depend on the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5331) Extend RLESparseResourceAllocation with period for supporting recurring reservations in YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5331:


 Summary: Extend RLESparseResourceAllocation with period for 
supporting recurring reservations in YARN ReservationSystem
 Key: YARN-5331
 URL: https://issues.apache.org/jira/browse/YARN-5331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Sangeetha Abdu Jyothi


YARN-5326 proposes adding native support for recurring reservations in the YARN 
ReservationSystem. This JIRA is a sub-task to add a 
PeriodicRLESparseResourceAllocation. Please refer to the design doc in the 
parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5330) SharingPolicy enhancements required to support recurring reservations in the YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5330:


 Summary: SharingPolicy enhancements required to support recurring 
reservations in the YARN ReservationSystem
 Key: YARN-5330
 URL: https://issues.apache.org/jira/browse/YARN-5330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Sangeetha Abdu Jyothi


YARN-5326 proposes adding native support for recurring reservations in the YARN 
ReservationSystem. This JIRA is a sub-task to track the changes required in 
SharingPolicy to accomplish it. Please refer to the design doc in the parent 
JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5329) ReservationAgent enhancements required to support recurring reservations in the YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5329:


 Summary: ReservationAgent enhancements required to support 
recurring reservations in the YARN ReservationSystem
 Key: YARN-5329
 URL: https://issues.apache.org/jira/browse/YARN-5329
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Sangeetha Abdu Jyothi


YARN-5326 proposes adding native support for recurring reservations in the YARN 
ReservationSystem. This JIRA is a sub-task to track the changes required in 
ReservationAgent to accomplish it. Please refer to the design doc in the parent 
JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5328) InMemoryPlan enhancements required to support recurring reservations in the YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5328:


 Summary: InMemoryPlan enhancements required to support recurring 
reservations in the YARN ReservationSystem
 Key: YARN-5328
 URL: https://issues.apache.org/jira/browse/YARN-5328
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Sangeetha Abdu Jyothi


YARN-5326 proposes adding native support for recurring reservations in the YARN 
ReservationSystem. This JIRA is a sub-task to track the changes required in 
InMemoryPlan to accomplish it. Please refer to the design doc in the parent 
JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5327:


 Summary: API changes required to support recurring reservations in 
the YARN ReservationSystem
 Key: YARN-5327
 URL: https://issues.apache.org/jira/browse/YARN-5327
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Sangeetha Abdu Jyothi


YARN-5326 proposes adding native support for recurring reservations in the YARN 
ReservationSystem. This JIRA is a sub-task to track the changes in 
ApplicationClientProtocol to accomplish it. Please refer to the design doc in 
the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5326) Add support for recurring reservations in the YARN ReservationSystem

2016-07-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5326:


 Summary: Add support for recurring reservations in the YARN 
ReservationSystem
 Key: YARN-5326
 URL: https://issues.apache.org/jira/browse/YARN-5326
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Subru Krishnan


YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle time 
explicitly, i.e. users can now "reserve" capacity ahead of time which is 
predictably allocated to them. Most SLA jobs are recurring so they need the 
same resources periodically. With the current implementation, users will have 
to make individual reservations for each run. This is an umbrella JIRA to 
enhance the reservation system by adding native support for recurring 
reservations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5307) Federation Application State APIs

2016-07-01 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5307:


 Summary: Federation Application State APIs
 Key: YARN-5307
 URL: https://issues.apache.org/jira/browse/YARN-5307
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan


The Federation State defines the additional state that needs to be maintained 
to loosely couple multiple individual sub-clusters into a single large 
federated cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5300) Exclude generated federation protobuf sources from YARN Javadoc build

2016-06-27 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5300:


 Summary: Exclude generated federation protobuf sources from YARN 
Javadoc build
 Key: YARN-5300
 URL: https://issues.apache.org/jira/browse/YARN-5300
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan
Priority: Minor


This JIRA is the equivalent of YARN-5132 for generated federation protobuf 
sources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5225) Move to a simple clean AMRMClientImpl (v2)

2016-06-09 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5225:


 Summary: Move to a simple clean AMRMClientImpl (v2)
 Key: YARN-5225
 URL: https://issues.apache.org/jira/browse/YARN-5225
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-4879 proposes the addition of a simple delta allocate protocol. This JIRA 
is to track the changes in AMRMClient to accomplish it. The detailed proposal 
is in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5203) Return ResourceRequest JAXB object in ResourceManager Cluster Applications REST API

2016-06-06 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5203:


 Summary: Return ResourceRequest JAXB object in ResourceManager 
Cluster Applications REST API
 Key: YARN-5203
 URL: https://issues.apache.org/jira/browse/YARN-5203
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Subru Krishnan


The ResourceManager Cluster Applications REST API returns {{ResourceRequest}} 
as String rather than a JAXB object. This prevents downstream tools like 
Federation Router (YARN-3659) that depend on the REST API to unmarshall the 
{{AppInfo}}. This JIRA proposes updating {{AppInfo}} to return a JAXB version 
of the {{ResourceRequest}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5137) Make DiskChecker pluggable

2016-05-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-5137.
--
Resolution: Duplicate

I am closing this as a duplicate of YARN-4271 as that covers the pluggable 
health checker service for NM. 

> Make DiskChecker pluggable
> --
>
> Key: YARN-5137
> URL: https://issues.apache.org/jira/browse/YARN-5137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ray Chiang
>Assignee: Yufei Gu
>  Labels: supportability
>
> It would be nice to have the option for a DiskChecker that has more 
> sophisticated checking capabilities.  In order to do this, we would first 
> need DiskChecker to be pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5132) Exclude generated protobuf sources from YARN Javadoc build

2016-05-23 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5132:


 Summary: Exclude generated protobuf sources from YARN Javadoc build
 Key: YARN-5132
 URL: https://issues.apache.org/jira/browse/YARN-5132
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Subru Krishnan
Assignee: Subru Krishnan
Priority: Critical


Currently the YARN build includes Javadoc from generated protobuf sources, which 
is causing CI to fail. This JIRA proposes to exclude generated protobuf sources 
from the YARN Javadoc build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-4957) Add getNewReservation in ApplicationClientProtocol

2016-04-13 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-4957:


 Summary: Add getNewReservation in ApplicationClientProtocol
 Key: YARN-4957
 URL: https://issues.apache.org/jira/browse/YARN-4957
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: applications, client, resourcemanager
Reporter: Subru Krishnan
Assignee: Sean Po


Currently submitReservation returns a ReservationId if successful. This JIRA 
proposes adding a getNewReservation in ApplicationClientProtocol for the 
following reasons:
  * Prevent zombie reservations in the face of client and/or network failures 
post submitReservation
  * Align reservation submission with application submission
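
[Editor's note: a sketch of the intended two-step flow, mirroring
application submission. It assumes the YarnClient surface that accompanied
this change (createReservation and a ReservationId-bearing
ReservationSubmissionRequest.newInstance); verify both against your
release.]

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.protocolrecords.GetNewReservationResponse;
    import org.apache.hadoop.yarn.api.protocolrecords.ReservationSubmissionRequest;
    import org.apache.hadoop.yarn.api.records.ReservationDefinition;
    import org.apache.hadoop.yarn.api.records.ReservationId;
    import org.apache.hadoop.yarn.api.records.ReservationRequest;
    import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
    import org.apache.hadoop.yarn.api.records.ReservationRequests;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class TwoStepReservationSubmit {
      public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();

        // Step 1: get a reservation id up front; a retry after a network
        // failure re-uses the same id instead of leaving a zombie reservation.
        GetNewReservationResponse newRes = client.createReservation();
        ReservationId reservationId = newRes.getReservationId();

        // Step 2: submit with the pre-allocated id (idempotent on retry).
        long now = System.currentTimeMillis();
        ReservationRequests asks = ReservationRequests.newInstance(
            Collections.singletonList(
                ReservationRequest.newInstance(Resource.newInstance(1024, 1), 2)),
            ReservationRequestInterpreter.R_ALL);
        ReservationDefinition definition =
            ReservationDefinition.newInstance(now, now + 3600_000L, asks, "demo");
        client.submitReservation(
            ReservationSubmissionRequest.newInstance(definition, "default", reservationId));

        client.stop();
      }
    }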



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4903) Document how to pass ReservationId through the RM REST API

2016-04-08 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-4903.
--
Resolution: Duplicate

Closing as a duplicate: YARN-4625 includes updates to the documentation to 
specify the ReservationId in the RM REST API.

> Document how to pass ReservationId through the RM REST API
> --
>
> Key: YARN-4903
> URL: https://issues.apache.org/jira/browse/YARN-4903
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>    Reporter: Subru Krishnan
>    Assignee: Subru Krishnan
>
> YARN-4625 added the reservation-id field to the RM submitApplication REST 
> API. This JIRA is to add the corresponding documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

