Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Subramaniam V K
Allen, can we bump the maven surefire heap size up to the max (if it is not
already) for the branch-2 nightly build and see if it helps?

Thanks,
Subru
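
A hedged sketch of what that override could look like on the nightly job, using surefire's standard argLine user property; the 4 GB figure is only an illustrative guess, and note that setting argLine on the command line replaces whatever the module pom already puts there:

```shell
# Raise the heap of each surefire-forked test JVM for the nightly run.
# -Xmx4096m is an example value, not a vetted number; dumping the heap on
# OOM gives something concrete to analyze when a test JVM dies.
mvn test -DargLine="-Xmx4096m -XX:+HeapDumpOnOutOfMemoryError"
```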

On Tue, Oct 24, 2017 at 4:22 PM, Allen Wittenauer 
wrote:

>
> > On Oct 24, 2017, at 4:10 PM, Andrew Wang 
> wrote:
> >
> > FWIW we've been running branch-3.0 unit tests successfully internally,
> though we have separate jobs for Common, HDFS, YARN, and MR. The failures
> here are probably a property of running everything in the same JVM, which
> I've found problematic in the past due to OOMs.
>
> Last time I looked, surefire was configured to launch unit tests
> in different JVMs.  But that might only be true in trunk.  Or maybe only
> for some of the subprojects.
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 24, 2017, at 4:10 PM, Andrew Wang  wrote:
> 
> FWIW we've been running branch-3.0 unit tests successfully internally, though 
> we have separate jobs for Common, HDFS, YARN, and MR. The failures here are 
> probably a property of running everything in the same JVM, which I've found 
> problematic in the past due to OOMs.

Last time I looked, surefire was configured to launch unit tests in 
different JVMs.  But that might only be true in trunk.  Or maybe only for some 
of the subprojects.  
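
One way to check what surefire is actually configured to do per module, and to force separate JVMs explicitly; the property names are standard maven-surefire-plugin user properties, while the module path and values are just examples:

```shell
# Inspect the effective fork settings for one module (example module path).
mvn -pl hadoop-hdfs-project/hadoop-hdfs help:effective-pom | grep -A2 forkCount

# Force one fresh JVM per test class, so a leaking test can't poison later ones.
mvn test -DforkCount=1 -DreuseForks=false
```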



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Andrew Wang
FWIW we've been running branch-3.0 unit tests successfully internally,
though we have separate jobs for Common, HDFS, YARN, and MR. The failures
here are probably a property of running everything in the same JVM, which
I've found problematic in the past due to OOMs.

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer 
wrote:

>
> My plan is currently to:
>
> *  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's
> > causing memory to balloon and then we can work out the appropriate
> > remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point,
> > would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> >> In general, the "solid evidence" of a memory leak comes from analysis of
> heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude
> which piece of code is leaking memory from that analysis.
> >>
> >> Unfortunately, I cannot find any conclusion in the previous comments, and
> they don't even say which daemons/components of HDFS consume unexpectedly
> high memory. That doesn't sound like a solid bug report to me.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Junping
> >>
> >>
> >> 
> >> From: Sean Busbey 
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
> mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather than
> library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> >> Allen,
> >> Do we have any solid evidence to show that the HDFS unit tests going
> through the roof are due to a serious memory leak in HDFS? Normally, I don't
> expect memory leaks to be identified in our UTs - mostly, it (a test JVM
> going away) is just because of test or deployment issues.
> >> Unless there is concrete evidence, my concern about a serious memory
> leak in HDFS on 2.8 is relatively low, given that some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 in large production environments for
> months. Non-serious memory leaks (like forgetting to close a stream on a
> non-critical path, etc.) and other non-critical bugs always happen here
> and there, and we have to live with them.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <
> a...@effectivemachines.com> wrote:
> >>>
> >>>
> >>>
> >>> With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that's what this "feels" like.
> >>>
> >>> Someone should verify if 2.8.2 has the same issues before a release
> goes out ...
> >>
> >>
> >>FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >>Also: the node didn't die!  Looking through the workspace (so
> the next run will destroy them), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >>and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >>It looks like my hunch is correct:  RAM in the HDFS unit tests
> are going through the roof.  It's also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
> >>
> >>Anyway, at this point, branch-2.8 and higher are probably
> fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
> >>
> >>
> >>
> >> --

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

My plan is currently to:

*  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 
patch to test it out. 
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master either post-YETUS-561 or without it if 
it doesn’t work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of 
>> heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude 
>> which piece of code is leaking memory from that analysis.
>> 
>> Unfortunately, I cannot find any conclusion in the previous comments, and they 
>> don't even say which daemons/components of HDFS consume unexpectedly high 
>> memory. That doesn't sound like a solid bug report to me.
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> 
>> From: Sean Busbey 
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
>> mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping what would "solid evidence" look like? Is the 
>> supposition here that the memory leak is within HDFS test code rather than 
>> library runtime code? How would such a distinction be shown?
>> 
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
>> Allen,
>> Do we have any solid evidence to show that the HDFS unit tests going through 
>> the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
>> memory leaks to be identified in our UTs - mostly, it (a test JVM going away) 
>> is just because of test or deployment issues.
>> Unless there is concrete evidence, my concern about a serious memory leak in 
>> HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, 
>> etc.) have deployed 2.8 in large production environments for months. 
>> Non-serious memory leaks (like forgetting to close a stream on a non-critical 
>> path, etc.) and other non-critical bugs always happen here and there, and we 
>> have to live with them.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> 
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; 
>> mapreduce-...@hadoop.apache.org; 
>> yarn-dev@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
>>> <a...@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one 
>>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>>> that's what this "feels" like.
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes 
>>> out ...
>> 
>> 
>>FWIW, I ran 2.8.2 last night and it has the same problems.
>> 
>>Also: the node didn't die!  Looking through the workspace (so the 
>> next run will destroy them), two sets of logs stand out:
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>> 
>>and
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>> 
>>It looks like my hunch is correct:  RAM in the HDFS unit tests are 
>> going through the roof.  It's also interesting how MANY log files there are. 
>>  Is surefire not picking up that jobs are dying?  Maybe not if memory is 
>> getting tight.
>> 
>>Anyway, at this point, branch-2.8 and higher are probably fubar'd. 
>> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker 
>> containers can have their RAM limits set in order to prevent more nodes 
>> going catatonic.
>> 
>> 
>> 
>> 
>> 
>> 

[jira] [Created] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-24 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-7390:
--

 Summary: All reservation-related test cases fail when 
TestYarnClient runs against the Fair Scheduler.
 Key: YARN-7390
 URL: https://issues.apache.org/jira/browse/YARN-7390
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, reservation system
Affects Versions: 2.9.0, 3.0.0, 3.1.0
Reporter: Yufei Gu
Assignee: Yufei Gu


All reservation-related test cases fail when {{TestYarnClient}} runs against 
the Fair Scheduler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Eric Payne
+1 (binding)
Thanks a lot, Junping!
I built and installed the source on a 6-node pseudo cluster. I ran simple sleep and 
streaming jobs that exercised intra-queue and inter-queue preemption, and used 
user weights.
-Eric

  From: Junping Du 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org"  
 Sent: Thursday, October 19, 2017 7:43 PM
 Subject: [VOTE] Release Apache Hadoop 2.8.2 (RC1)
   
Hi folks,
    I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.

    Apache Hadoop 2.8.2 is the first stable release of the Hadoop 2.8 line and will 
be the latest stable/production release for Apache Hadoop - it includes 315 newly 
fixed issues since 2.8.1, and 69 fixes are marked as blocker/critical issues.

      More information about the 2.8.2 release plan can be found here: 
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release

      New RC is available at: 
http://home.apache.org/~junping_du/hadoop-2.8.2-RC1

      The RC tag in git is: release-2.8.2-RC1, and the latest commit id is: 
66c47f2a01ad9637879e95f80c41f798373828fb

      The maven artifacts are available via 
repository.apache.org at: 
https://repository.apache.org/content/repositories/orgapachehadoop-1064

      Please try the release and vote; the vote will run for the usual 5 days, 
ending on 10/24/2017 6pm PST time.

Thanks,

Junping


   

[jira] [Created] (YARN-7389) Make TestResourceManager Scheduler agnostic

2017-10-24 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-7389:
---

 Summary: Make TestResourceManager Scheduler agnostic
 Key: YARN-7389
 URL: https://issues.apache.org/jira/browse/YARN-7389
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.9.0, 3.0.0
Reporter: Robert Kanter
Assignee: Robert Kanter


Many of the tests in {{TestResourceManager}} override the scheduler to always 
be {{CapacityScheduler}}.  However, these tests should be made scheduler 
agnostic (they are testing the RM, not the scheduler).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Chris Douglas
Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's
causing memory to balloon and then we can work out the appropriate
remedy.

Is this reproducible outside the CI environment? To Junping's point,
would YETUS-561 provide more detailed information to aid debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> In general, the "solid evidence" of a memory leak comes from analysis of 
> heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude 
> which piece of code is leaking memory from that analysis.
>
> Unfortunately, I cannot find any conclusion in the previous comments, and they 
> don't even say which daemons/components of HDFS consume unexpectedly high 
> memory. That doesn't sound like a solid bug report to me.
>
>
>
> Thanks,
>
>
> Junping
>
>
> 
> From: Sean Busbey 
> Sent: Tuesday, October 24, 2017 2:20 PM
> To: Junping Du
> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
> mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> Just curious, Junping what would "solid evidence" look like? Is the 
> supposition here that the memory leak is within HDFS test code rather than 
> library runtime code? How would such a distinction be shown?
>
> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> Allen,
>  Do we have any solid evidence to show that the HDFS unit tests going through 
> the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
> memory leaks to be identified in our UTs - mostly, it (a test JVM going away) is 
> just because of test or deployment issues.
>  Unless there is concrete evidence, my concern about a serious memory leak in 
> HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) 
> have deployed 2.8 in large production environments for months. Non-serious 
> memory leaks (like forgetting to close a stream on a non-critical path, etc.) and 
> other non-critical bugs always happen here and there, and we have to live with 
> them.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer <a...@effectivemachines.com>
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; 
> mapreduce-...@hadoop.apache.org; 
> yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
>> <a...@effectivemachines.com> wrote:
>>
>>
>>
>> With no other information or access to go on, my current hunch is that one 
>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>> that's what this "feels" like.
>>
>> Someone should verify if 2.8.2 has the same issues before a release goes out 
>> ...
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn't die!  Looking through the workspace (so the 
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM in the HDFS unit tests are 
> going through the roof.  It's also interesting how MANY log files there are.  
> Is surefire not picking up that jobs are dying?  Maybe not if memory is 
> getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar'd. 
> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers 
> can have their RAM limits set in order to prevent more nodes going catatonic.
>
>
>
>
>
>
>
>
>
>
> --
> busbey




Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
In general, the "solid evidence" of a memory leak comes from analysis of 
heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude 
which piece of code is leaking memory from that analysis.

Unfortunately, I cannot find any conclusion in the previous comments, and they 
don't even say which daemons/components of HDFS consume unexpectedly high 
memory. That doesn't sound like a solid bug report to me.



Thanks,


Junping
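
The kind of evidence described above can be gathered with the standard JDK tools; a sketch, where the process id and output paths are illustrative:

```shell
# Identify the surefire-forked test JVMs on the build node.
jps -lv

# Capture a thread dump and a heap dump for a suspect JVM (<pid> from jps);
# the heap dump can then be analyzed with jhat, MAT, or similar.
jstack <pid> > /tmp/threads.txt
jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>

# GC logging must be enabled when the JVM is launched, e.g. via surefire
# argLine:  -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/gc.log
```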



From: Sean Busbey 
Sent: Tuesday, October 24, 2017 2:20 PM
To: Junping Du
Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org; 
yarn-dev@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

Just curious, Junping what would "solid evidence" look like? Is the supposition 
here that the memory leak is within HDFS test code rather than library runtime 
code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
Allen,
 Do we have any solid evidence to show that the HDFS unit tests going through 
the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
memory leaks to be identified in our UTs - mostly, it (a test JVM going away) is 
just because of test or deployment issues.
 Unless there is concrete evidence, my concern about a serious memory leak in 
HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) 
have deployed 2.8 in large production environments for months. Non-serious 
memory leaks (like forgetting to close a stream on a non-critical path, etc.) and 
other non-critical bugs always happen here and there, and we have to live with them.

Thanks,

Junping


From: Allen Wittenauer <a...@effectivemachines.com>
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; 
mapreduce-...@hadoop.apache.org; 
yarn-dev@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
> <a...@effectivemachines.com> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that's what 
> this "feels" like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out 
> ...


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn't die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM in the HDFS unit tests are 
going through the roof.  It's also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight.

Anyway, at this point, branch-2.8 and higher are probably fubar'd. 
Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.
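
Independent of how YETUS-561 wires it up, the cap itself is plain docker; a sketch with a placeholder image name and limit value:

```shell
# Cap the build container at 4 GB of RAM and forbid swap beyond that, so a
# runaway test JVM gets OOM-killed inside the container instead of taking
# the whole node catatonic.  "hadoop-build:latest" and 4g are placeholders.
docker run --rm --memory=4g --memory-swap=4g hadoop-build:latest \
  mvn test -pl hadoop-hdfs-project/hadoop-hdfs
```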










--
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Sean Busbey
Just curious, Junping what would "solid evidence" look like? Is the
supposition here that the memory leak is within HDFS test code rather than
library runtime code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du  wrote:

> Allen,
>  Do we have any solid evidence to show that the HDFS unit tests going
> through the roof are due to a serious memory leak in HDFS? Normally, I don't
> expect memory leaks to be identified in our UTs - mostly, it (a test JVM
> going away) is just because of test or deployment issues.
>  Unless there is concrete evidence, my concern about a serious memory
> leak in HDFS on 2.8 is relatively low, given that some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 in large production environments for
> months. Non-serious memory leaks (like forgetting to close a stream on a
> non-critical path, etc.) and other non-critical bugs always happen here
> and there, and we have to live with them.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer 
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> > On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
> wrote:
> >
> >
> >
> > With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that’s what this “feels” like.
> >
> > Someone should verify if 2.8.2 has the same issues before a release goes
> out …
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn’t die!  Looking through the workspace (so the
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM in the HDFS unit tests are
> going through the roof.  It’s also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar’d.
> Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
>
>
>
>
>
>
>
>


-- 
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
Allen,
 Do we have any solid evidence to show that the HDFS unit tests going through 
the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
memory leaks to be identified in our UTs - mostly, it (a test JVM going away) is 
just because of test or deployment issues. 
 Unless there is concrete evidence, my concern about a serious memory leak in 
HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) 
have deployed 2.8 in large production environments for months. Non-serious 
memory leaks (like forgetting to close a stream on a non-critical path, etc.) and 
other non-critical bugs always happen here and there, and we have to live with them.

Thanks,

Junping


From: Allen Wittenauer 
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM in the HDFS unit tests are 
going through the roof.  It’s also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight.

Anyway, at this point, branch-2.8 and higher are probably fubar’d. 
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.









[jira] [Created] (YARN-7388) TestAMRestart should be scheduler agnostic

2017-10-24 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7388:


 Summary: TestAMRestart should be scheduler agnostic
 Key: YARN-7388
 URL: https://issues.apache.org/jira/browse/YARN-7388
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Created] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently

2017-10-24 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7387:


 Summary: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 fails intermittently
 Key: YARN-7387
 URL: https://issues.apache.org/jira/browse/YARN-7387
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


{code}
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer)
  Time elapsed: 13.292 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3072> but was:<4096>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Created] (YARN-7386) Duplicate Strings in various places in Yarn memory

2017-10-24 Thread Misha Dmitriev (JIRA)
Misha Dmitriev created YARN-7386:


 Summary: Duplicate Strings in various places in Yarn memory
 Key: YARN-7386
 URL: https://issues.apache.org/jira/browse/YARN-7386
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Misha Dmitriev
Assignee: Misha Dmitriev


Using jxray (www.jxray.com) I've analyzed a Yarn RM heap dump obtained in a big 
cluster. The tool uncovered several sources of memory waste. One problem is 
duplicate strings:

{code}
Total strings    Unique strings    Duplicate values    Overhead
      361,506            86,672               5,928    22,886K (7.6%)
{code}

They are spread across a number of locations. The biggest source of waste is 
the following reference chain:

{code}

7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays:
↖{j.u.HashMap}.values
↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment
↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer
↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext
↖{java.util.concurrent.ConcurrentHashMap}.values
↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications
↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext
↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext
↖Java Local@3ed9ef820 
(org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor)
{code}

However, there are also many others. Mostly they are strings in proto buffer or 
proto buffer builder objects. I plan to get rid of at least the worst offenders 
by inserting String.intern() calls. String.intern() used to consume memory in 
PermGen and was not very scalable up until about the early JDK 7 versions, but 
has greatly improved since then, and I've used it many times without any issues.
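
For illustration, a minimal, self-contained sketch of the String.intern() dedup described here; the environment-style value is made up for the example, not taken from the actual heap dump:

```java
import java.util.ArrayList;
import java.util.List;

public class InternDemo {
    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // Each iteration builds an equal-but-distinct String instance,
            // the way many ContainerLaunchContextPBImpl.environment maps
            // end up holding duplicate values with duplicate backing arrays.
            String v = "JAVA_HOME=" + "/usr/lib/jvm/java".trim();
            // intern() maps equal values onto one canonical instance in the
            // JVM string pool, so all duplicates share a single object.
            values.add(v.intern());
        }
        // Every element is now the same object, not merely an equal one.
        System.out.println(values.get(0) == values.get(999)); // true
    }
}
```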



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Ravi Prakash
Thanks for all your hard work Junping!

* Checked signature.
* Ran a sleep job.
* Checked NN File browser UI works.

+1 (binding)

Cheers
Ravi

On Tue, Oct 24, 2017 at 12:26 PM, Rakesh Radhakrishnan 
wrote:

> Thanks Junping for getting this out.
>
> +1 (non-binding)
>
> * Built from source on CentOS 7.3.1611, jdk1.8.0_111
> * Deployed 3 node cluster
> * Ran some sample jobs
> * Ran balancer
> * Operate HDFS from command line: ls, put, dfsadmin etc
> * HDFS Namenode UI looks good
>
>
> Thanks,
> Rakesh
>
> On Fri, Oct 20, 2017 at 6:12 AM, Junping Du  wrote:
>
> > Hi folks,
> >  I've created our new release candidate (RC1) for Apache Hadoop
> 2.8.2.
> >
> >  Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> > and will be the latest stable/production release for Apache Hadoop - it
> > includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> > blocker/critical issues.
> >
> >   More information about the 2.8.2 release plan can be found here:
> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >   New RC is available at:
> > http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >   The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> > is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >   The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >   Please try the release and vote; the vote will run for the usual 5
> > days, ending on 10/24/2017 6pm PST time.
> >
> > Thanks,
> >
> > Junping
> >
> >
>


Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Rakesh Radhakrishnan
Thanks Junping for getting this out.

+1 (non-binding)

* Built from source on CentOS 7.3.1611, jdk1.8.0_111
* Deployed 3 node cluster
* Ran some sample jobs
* Ran balancer
* Operate HDFS from command line: ls, put, dfsadmin etc
* HDFS Namenode UI looks good


Thanks,
Rakesh

On Fri, Oct 20, 2017 at 6:12 AM, Junping Du  wrote:

> Hi folks,
>  I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
>
>  Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
>
>   More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
>
>   New RC is available at:
> http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
>
>   The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
>
>   The maven artifacts are available via repository.apache.org at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1064
>
>   Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
>
> Thanks,
>
> Junping
>
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-10-24 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/

[Oct 23, 2017 4:43:41 PM] (epayne) YARN-4163: Audit getQueueInfo and 
getApplications calls
[Oct 23, 2017 5:47:16 PM] (arp) HDFS-12683. DFSZKFailOverController re-order 
logic for logging
[Oct 23, 2017 6:12:06 PM] (naganarasimha_gr) HADOOP-14966. Handle JDK-8071638 
for hadoop-common. Contributed by Bibin
[Oct 23, 2017 8:20:46 PM] (arp) HDFS-12650. Use slf4j instead of log4j in 
LeaseManager. Contributed by
[Oct 23, 2017 8:51:19 PM] (templedf) YARN-7357. Several methods in
[Oct 23, 2017 10:24:34 PM] (weichiu) HDFS-12249. dfsadmin -metaSave to output 
maintenance mode blocks.
[Oct 23, 2017 10:32:25 PM] (jzhuge) Revert "HADOOP-14954. 
MetricsSystemImpl#init should increment refCount
[Oct 24, 2017 12:56:56 AM] (rkanter) YARN-7320. Duplicate LiteralByteStrings in
[Oct 24, 2017 2:46:09 AM] (yqlin) HDFS-12695. Add a link to HDFS router 
federation document in site.xml.




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.security.TestRaceWhenRelogin 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler 
   hadoop.yarn.server.resourcemanager.TestRMAdminService 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler 
   hadoop.yarn.server.TestContainerManagerSecurity 

Timed out junit tests :

   
org.apache.hadoop.yarn.client.api.impl.TestOpportunisticContainerAllocationE2E 
   org.apache.hadoop.mapred.pipes.TestPipeApplication 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-compile-javac-root.txt
  [284K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/whitespace-eol.txt
  [8.5M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/whitespace-tabs.txt
  [292K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [148K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [400K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [40K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [68K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [84K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org


Re: [VOTE] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-24 Thread Andrew Wang
+0

On Tue, Oct 24, 2017 at 10:56 AM, Daniel Templeton 
wrote:

> I'd like to formally start the voting process for merging the
> resource-types branch into branch-3.0.  The resource-types branch is a
> selective backport of JIRAs that were already merged into trunk in a
> previous merge vote for YARN-3926 (resource types) [1].  For a full
> explanation of the feature, benefits, and risks, see the previous DISCUSS
> thread [2].  The vote will be 7 days, ending Tuesday Oct 31 at 11:00AM PDT.
>
> In summary, resource types adds the ability to declaratively configure new
> resource types in addition to CPU and memory and request them when
> submitting resource requests.  The resource-types branch currently
> represents 32 patches from trunk drawn from the resource types umbrella
> JIRAs: YARN-3926 [3] and YARN-7069 [4].
>
> Key points:
> * If no additional resource types are configured, the user experience with
> YARN remains unchanged.
> * Performance is the primary risk. We have been closely watching the
> performance impact of adding resource types, and according to current
> measurements the impact is trivial.
> * This merge vote is for resource types excluding the resource profiles
> feature which was included in the original merge vote [1].
> * Documentation is available in trunk via YARN-7056 [5] with improvements
> pending review in YARN-7369 [6].
>
> Refreshed performance numbers on the resource-types branch are pending,
> and I'll post them to this thread as soon as they're ready.
>
> Thanks!
> Daniel
>
> [1] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201
> 708.mbox/%3CCAD++eCm6xSs4_kXP4Audf85_rGg4pZxKuOx7u2VP8tfzmY4
> p...@mail.gmail.com%3E
> [2] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201
> 710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E
> [3] https://issues.apache.org/jira/browse/YARN-3926
> [4] https://issues.apache.org/jira/browse/YARN-7069
> [5] https://issues.apache.org/jira/browse/YARN-7056
> [6] https://issues.apache.org/jira/browse/YARN-7369
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[VOTE] Merge Resource Types (YARN-3926) to branch-3.0

2017-10-24 Thread Daniel Templeton
I'd like to formally start the voting process for merging the 
resource-types branch into branch-3.0.  The resource-types branch is a 
selective backport of JIRAs that were already merged into trunk in a 
previous merge vote for YARN-3926 (resource types) [1].  For a full 
explanation of the feature, benefits, and risks, see the previous 
DISCUSS thread [2].  The vote will be 7 days, ending Tuesday Oct 31 at 
11:00AM PDT.


In summary, resource types adds the ability to declaratively configure 
new resource types in addition to CPU and memory and request them when 
submitting resource requests.  The resource-types branch currently 
represents 32 patches from trunk drawn from the resource types umbrella 
JIRAs: YARN-3926 [3] and YARN-7069 [4].


Key points:
* If no additional resource types are configured, the user experience 
with YARN remains unchanged.
* Performance is the primary risk. We have been closely watching the 
performance impact of adding resource types, and according to current 
measurements the impact is trivial.
* This merge vote is for resource types excluding the resource profiles 
feature which was included in the original merge vote [1].
* Documentation is available in trunk via YARN-7056 [5] with 
improvements pending review in YARN-7369 [6].
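For readers unfamiliar with the feature, a minimal sketch of the declarative configuration, based on the documentation referenced in [5] (property names per the trunk docs; the "gpu" resource name is just an example):

```xml
<configuration>
  <!-- Comma-separated list of additional resource types -->
  <property>
    <name>yarn.resource-types</name>
    <value>gpu</value>
  </property>
  <!-- Optional default unit for the new type (empty means a bare count) -->
  <property>
    <name>yarn.resource-types.gpu.units</name>
    <value></value>
  </property>
</configuration>
```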


Refreshed performance numbers on the resource-types branch are pending, 
and I'll post them to this thread as soon as they're ready.


Thanks!
Daniel

[1] 
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3ccad++ecm6xss4_kxp4audf85_rgg4pzxkuox7u2vp8tfzmy4...@mail.gmail.com%3E
[2] 
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E

[3] https://issues.apache.org/jira/browse/YARN-3926
[4] https://issues.apache.org/jira/browse/YARN-7069
[5] https://issues.apache.org/jira/browse/YARN-7056
[6] https://issues.apache.org/jira/browse/YARN-7369




Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Bibinchundatt
+1 (non-binding)

- Built from source with JDK 1.8.0_111
- Deployed on 3 node secure setup
- Ran few mapreduce jobs with multiple users.
- Verified basic resource localization
- Failover of Resource Manager.
- Log aggregation verification
- Sanity check of JHS

Thanks
Bibin
--
Bibin A Chundatt
M: +91-9742095715
E: bibin.chund...@huawei.com
2012 Laboratories-IT&Cloud BU Branch Dept.

> On 20/10/17, 6:12 AM, "Junping Du"  wrote:
>
> >Hi folks,
> > I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
> >
> > Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
> >
> >  More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >  New RC is available at:
> > http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >  The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >  The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >  Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
> >
> >Thanks,
> >
> >Junping
> >
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (YARN-7385) TestFairScheduler#testUpdateDemand and TestFSLeafQueue#testUpdateDemand are failing with NPE

2017-10-24 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-7385:
---

 Summary: TestFairScheduler#testUpdateDemand and 
TestFSLeafQueue#testUpdateDemand are failing with NPE
 Key: YARN-7385
 URL: https://issues.apache.org/jira/browse/YARN-7385
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.9.0, 3.0.0
Reporter: Robert Kanter
Assignee: Robert Kanter


{{TestFairScheduler#testUpdateDemand}} and {{TestFSLeafQueue#testUpdateDemand}} 
are failing with NPE:

{noformat}
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.util.resource.Resources.addTo(Resources.java:180)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.incUsedResource(FSQueue.java:494)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.addApp(FSLeafQueue.java:92)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testUpdateDemand(TestFairScheduler.java:5264)
{noformat}

{noformat}
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.util.resource.Resources.addTo(Resources.java:180)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.incUsedResource(FSQueue.java:494)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.addApp(FSLeafQueue.java:92)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue.testUpdateDemand(TestFSLeafQueue.java:92)
{noformat}







Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Eric Badger
+1 (non-binding)

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs

Thanks,

Eric

On Tue, Oct 24, 2017 at 12:59 AM, Mukul Kumar Singh 
wrote:

> Thanks Junping,
>
> +1 (non-binding)
>
> I built from source on Mac OS X 10.12.6 Java 1.8.0_111
>
> - Deployed on a single node cluster.
> - Deployed a ViewFS cluster with two hdfs mount points.
> - Performed basic sanity checks.
> - Performed basic DFS operations.
>
> Thanks,
> Mukul
>
>
>
>
>
>
> On 20/10/17, 6:12 AM, "Junping Du"  wrote:
>
> >Hi folks,
> > I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
> >
> > Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
> >
> >  More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >  New RC is available at:
> > http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >  The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >  The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >  Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
> >
> >Thanks,
> >
> >Junping
> >
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
> 
> 
> 
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
> 
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM usage in the HDFS unit tests is 
going through the roof.  It’s also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight. 

Anyway, at this point, branch-2.8 and higher are probably fubar’d. 
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.
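For reference, the surefire knobs being discussed in this thread look roughly like this in a module pom.xml (values illustrative, not Hadoop's actual settings):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- fresh JVM per fork: isolates leaks and OOMs to one suite -->
    <forkCount>1</forkCount>
    <reuseForks>false</reuseForks>
    <!-- hard per-fork heap ceiling, so one runaway suite cannot exhaust the host -->
    <argLine>-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError</argLine>
  </configuration>
</plugin>
```

With reuseForks=false, surefire launches a new JVM per test class, which matches the "launch unit tests in different JVMs" behavior described earlier in the thread.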






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/10/

[Oct 23, 2017 4:58:04 PM] (epayne) YARN-4163: Audit getQueueInfo and 
getApplications calls
[Oct 23, 2017 5:47:35 PM] (arp) HDFS-12683. DFSZKFailOverController re-order 
logic for logging
[Oct 23, 2017 6:17:33 PM] (naganarasimha_gr) HADOOP-14966. Handle JDK-8071638 
for hadoop-common. Contributed by Bibin
[Oct 23, 2017 8:52:15 PM] (templedf) YARN-7357. Several methods in
[Oct 24, 2017 2:50:39 AM] (yqlin) HDFS-12695. Add a link to HDFS router 
federation document in site.xml.


[Error replacing 'FILE' - Workspace is not accessible]


[jira] [Resolved] (YARN-7365) ResourceLocalization cache cleanup thread stuck

2017-10-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-7365.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.1.0, 3.0.0, 2.9.0

> ResourceLocalization cache cleanup thread stuck
> ---
>
> Key: YARN-7365
> URL: https://issues.apache.org/jira/browse/YARN-7365
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
>
> {code}
> "ResourceLocalizationService Cache Cleanup" #36 prio=5 os_prio=0 
> tid=0x7f943562a000 nid=0x1017 waiting on condition [0x7f9419bd7000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc21103f8> (a 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
> at 
> org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> ResourceLocalization Cache Clean Up thread waiting on {{FutureTask.get()}} 
> for infinite time after first execution
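The hang can be reproduced, and one plausible guard sketched, with a few self-contained lines (the isDone() check is a hypothetical shape of a fix, not necessarily the committed change):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicGetDemo {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        ScheduledFuture<?> periodic =
                pool.scheduleAtFixedRate(() -> { }, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(50); // several runs have completed by now

        // A periodic task is never "done" between runs, so an unconditional
        // periodic.get() here would park forever -- exactly the state in the
        // stack trace above. Guarding with isDone() avoids the hang:
        if (periodic.isDone()) {
            periodic.get(); // only reached for finished or cancelled tasks
        } else {
            System.out.println("still periodic; skipping get()");
        }
        pool.shutdownNow();
    }
}
```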



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org