[jira] [Created] (YARN-5487) Can't kill nodemanager process when running in the foreground

2016-08-08 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-5487:
--

 Summary: Can't kill nodemanager process when running in the 
foreground
 Key: YARN-5487
 URL: https://issues.apache.org/jira/browse/YARN-5487
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer


It looks like there is a misconfigured signal handler somewhere in the mix. 
Hitting Ctrl-C results in this message:

16/08/08 20:26:37 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 2: SIGINT

... which is then summarily ignored.
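For illustration, here is a minimal, hypothetical sketch (not the actual 
NodeManager code) of how a logging-only handler registered via 
sun.misc.Signal produces exactly this symptom: the JVM's default 
terminate-on-SIGINT disposition is replaced, and nothing ever exits.

{code}
import sun.misc.Signal;
import sun.misc.SignalHandler;

public class SwallowedSigint {
  public static void main(String[] args) throws InterruptedException {
    // Installing a handler replaces the JVM default, which would terminate.
    Signal.handle(new Signal("INT"), new SignalHandler() {
      @Override
      public void handle(Signal signal) {
        System.err.println("RECEIVED SIGNAL " + signal.getNumber()
            + ": SIG" + signal.getName());
        // Suspected bug shape: no delegation to the previous handler and
        // no exit, so Ctrl-C is logged and then dropped.
      }
    });
    Thread.sleep(60_000L); // the process stays alive despite Ctrl-C
  }
}
{code}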






Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-08-08 Thread Junping Du
I think incompatible APIs between 3.0.0-alpha and 3.1.0-beta are less 
confusing than incompatibilities between 2.8/2.9 and 2.98.x alphas/2.99.x betas.
Why not just follow our previous practice from the beginning of branch-2? We can 
have 3.0.0-alpha and 3.1.0-alpha/beta, but once we are finalizing our APIs, 
we should bump the trunk version to 4.x for landing new incompatible changes.

Thanks,

Junping


[jira] [Created] (YARN-5486) Update OpportunisticContainerAllocatorAMService allocate method to handle OPPORTUNISTIC container requests

2016-08-08 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-5486:
-

 Summary: Update OpportunisticContainerAllocatorAMService allocate 
method to handle OPPORTUNISTIC container requests
 Key: YARN-5486
 URL: https://issues.apache.org/jira/browse/YARN-5486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh
Assignee: Arun Suresh


YARN-5457 refactors the Distributed Scheduling framework to move the container 
allocator to yarn-server-common.

This JIRA proposes to update the allocate method in the new AM service to use 
the OpportunisticContainerAllocator to allocate opportunistic containers.
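
As a rough illustration of the intended flow (a sketch under assumptions: the 
class and method names below are illustrative, not the actual patch), the 
allocate path would first partition the AM's asks by execution type so the 
OPPORTUNISTIC ones can be handed to the OpportunisticContainerAllocator:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

/** Illustrative helper: split an AM's asks by execution type. */
public final class AskPartitioner {
  public static List<ResourceRequest> opportunisticAsks(AllocateRequest request) {
    List<ResourceRequest> opportunistic = new ArrayList<>();
    for (ResourceRequest ask : request.getAskList()) {
      // Asks without an explicit execution type default to GUARANTEED.
      if (ask.getExecutionTypeRequest() != null
          && ask.getExecutionTypeRequest().getExecutionType()
              == ExecutionType.OPPORTUNISTIC) {
        opportunistic.add(ask);
      }
    }
    return opportunistic; // the remainder goes down the regular RM path
  }
}
{code}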






Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-08-08 Thread Karthik Kambatla
I like the 3.0.0-alphaX approach primarily for simpler understanding of
compatibility guarantees. Calling 3.0.0 alpha and 3.1.0 beta is confusing
because it is not immediately clear that 3.0.0 and 3.1.0 could be
incompatible in APIs.

I am open to something like 2.98.x for alphas and 2.99.x for betas leading
to a 3.0.0 GA. I have seen other projects use this without causing much
confusion.

On Thu, Aug 4, 2016 at 6:01 PM, Konstantin Shvachko wrote:

> On Thu, Aug 4, 2016 at 11:20 AM, Andrew Wang wrote:
>
> > Hi Konst, thanks for commenting,
> >
> > On Wed, Aug 3, 2016 at 11:29 PM, Konstantin Shvachko
> > <shv.had...@gmail.com> wrote:
> >
> >> 1. I probably missed something, but I didn't get how "alpha"s made
> >> their way into release numbers again. This was discussed on several
> >> occasions, and I thought the common perception was to use just
> >> three-level numbers for release versioning and avoid branding them.
> >> It is particularly confusing to have 3.0.0-alpha1 and 3.0.0-alpha2. What
> >> is alphaX - a fourth level? I think releasing 3.0.0 and setting trunk to
> >> 3.1.0 would be perfectly in line with our current release practices.
> >>
> >
> > We discussed release numbering a while ago when discussing the release
> > plan for 3.0.0, and agreed on this scheme. "-alphaX" is essentially a
> > fourth level, as you say, but the intent is to only use it (and
> > "-betaX") in the leadup to 3.0.0.
> >
> > The goal here is clarity for end users, since most other enterprise
> > software uses an a.0.0 version to denote the GA of a new major version.
> > The same goes for a.b.0 for a new minor version, though we haven't
> > talked about that yet. The alphaX and betaX scheme is also similar to
> > the release versioning of other enterprise software.
> >
>
> As you remember, we did this (alpha, beta) for Hadoop-2, and I don't think
> it went over well with users. Release 2.0.5-alpha, say, turned out to be
> quite good even though it was still branded "alpha", while 2.2 was neither
> good nor branded. We should move a release to stable once people have run
> it and agree it is GA-worthy. Otherwise you never know.
>
>
> >
> >> 2. I do not see any confusion with releasing 2.8.0 after 3.0.0.
> >> The release number is not intended to reflect the historical release
> >> sequence, but rather the point in the source tree at which the release
> >> was branched off. So one can release 2.8, 2.9, etc. after or before 3.0.
> >>
> >
> > As described earlier in this thread, the issue here is setting the fix
> > versions such that the changelog is a useful diff from a previous
> > version, and also clear about what changes are present in each branch.
> > If we do not order a specific 2.x before 3.0, then we don't know what
> > 2.x to diff from.
> >
>
> So the problem is in determining the latest commit that was not present in
> the last release, when the last release bears a higher number than the one
> being released.
> Interesting problem. I don't have a strong opinion on that. I guess it's
> OK to have overlap in the changelogs, as long as we keep following the
> rule that commits are made to trunk first and then propagated to lower
> branches until the target branch is reached.
>
>
> >
> >> 3. I agree that the current 3.0.0 branch can be dropped and re-cut. We
> >> may think of another rule: if a release branch is not released within 3
> >> months, it should be abandoned. That applies to branch 2.8.0, and it is
> >> too much work syncing it with branch-2.
> >
> > Time-based rules are tough here. I'd prefer we continue to leave this up
> > to release managers. If you think we should recut branch-2.8, I recommend
> > pinging Vinod and discussing on a new thread.
> >
>
> Not recut, but abandon 2.8.0. And Vinod (or anybody who volunteers to RM)
> can recut from the desired point.
> People were committing to branch-2 and branch-2.8 for months, and they are
> out of sync anyway. So what's the point of the extra commit?
> Probably still a different thread.
>
> Thanks,
> --Konst
>
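
To make the ordering concrete: standard version comparators already treat 
"-alphaX"/"-betaX" as pre-release qualifiers that sort below the final 
release. A small sketch using Maven's ComparableVersion (an illustration; it 
assumes the maven-artifact library on the classpath, and Hadoop's release 
tooling is not required to use it):

{code}
import org.apache.maven.artifact.versioning.ComparableVersion;

// Maven's qualifier ordering is alpha < beta < (no qualifier), so the
// proposed 3.0.0-alphaX / 3.0.0-betaX releases sort below the 3.0.0 GA.
public class VersionOrdering {
  public static void main(String[] args) {
    ComparableVersion alpha1 = new ComparableVersion("3.0.0-alpha1");
    ComparableVersion beta1 = new ComparableVersion("3.0.0-beta1");
    ComparableVersion ga = new ComparableVersion("3.0.0");
    System.out.println(alpha1.compareTo(beta1) < 0); // true
    System.out.println(beta1.compareTo(ga) < 0);     // true
  }
}
{code}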


[jira] [Resolved] (YARN-5484) YARN client still retries many times on failover even though the RM server throws AccessControlException

2016-08-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-5484.
-
Resolution: Duplicate

Yes guys, it's a duplicate; we were discussing the solution and planned to 
post the same.

> YARN client still retries many times on failover even though the RM 
> server throws AccessControlException
> ---
>
> Key: YARN-5484
> URL: https://issues.apache.org/jira/browse/YARN-5484
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bob.zhao
>
> 1. Enable yarn.acl.enable.
> 2. Set up some queues for users on YARN, for example queue1 only for user1.
> 3. Use user1 to submit an app to queue1; it runs successfully.
> 4. Use user2 to submit an app to queue1; it will not be permitted to
> submit to queue1.
> At the RM server side, this throws an IOException
> (ClientRMService.java#submitApplication), which is the parent class of the
> AccessControlException created in
> RMAppManager.java#createAndPopulateNewRMApp. The IOException is thrown
> back to the client, causing the YARN client to fail over and retry many
> times.
> We'd better avoid this behavior: if the client gets a permission denial
> from the server, it should try once and exit; there is no need to retry.
> This issue was introduced by YARN-4522.
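
The proposed behavior amounts to a retry/failover policy that treats a 
permission denial as fatal. A minimal sketch of the idea (illustrative only; 
this is not the committed fix, and the helper class is hypothetical):

{code}
import java.io.IOException;

import org.apache.hadoop.security.AccessControlException;

/** Hypothetical helper: decide whether an RPC failure warrants a failover. */
public final class FailoverDecision {
  public static boolean shouldFailover(IOException e) {
    // AccessControlException extends IOException. An ACL rejection is
    // deterministic, so retrying against the other RM cannot succeed.
    if (e instanceof AccessControlException) {
      return false; // fail fast and surface the permission error
    }
    return true; // e.g. connection loss: the standby RM may now be active
  }
}
{code}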






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-08-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/

[Aug 7, 2016 9:29:26 PM] (shv) HDFS-10693. metaSave should print blocks, not 
LightWeightHashSet.




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests:

   hadoop.tracing.TestTracing
   hadoop.security.TestRefreshUserMappings
   hadoop.yarn.logaggregation.TestAggregatedLogFormat
   hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager
   hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
   hadoop.yarn.server.TestContainerManagerSecurity
   hadoop.yarn.client.api.impl.TestYarnClient
   hadoop.mapreduce.v2.hs.server.TestHSAdminServer

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-compile-javac-root.txt
  [172K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-checkstyle-root.txt
  [16M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-patch-pylint.txt
  [16K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-patch-shelldocs.txt
  [16K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/diff-javadoc-javadoc-root.txt
  [2.2M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [312K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
  [24K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [36K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [268K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-nativetask.txt
  [124K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/127/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org



-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

[jira] [Resolved] (YARN-5485) YARN client still retries many times on failover even though the RM server throws AccessControlException (IOException)

2016-08-08 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G resolved YARN-5485.
---
Resolution: Duplicate

Dup of YARN-5484?

> YARN client still retries many times on failover even though the RM 
> server throws AccessControlException (IOException)
> 
>
> Key: YARN-5485
> URL: https://issues.apache.org/jira/browse/YARN-5485
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bob.zhao
>
> Issue reproduction steps:
> 1. Enable yarn.acl.enable.
> 2. Set up some queues for users on YARN, for example queue1 only for user1.
> 3. Use user1 to submit an app to queue1; it runs successfully.
> 4. Use user2 to submit an app to queue1; it will not be permitted to
> submit to queue1.
> At the RM server side, this throws an IOException
> (ClientRMService.java#submitApplication), which is the parent class of the
> AccessControlException created in
> RMAppManager.java#createAndPopulateNewRMApp. The IOException is thrown
> back to the client, causing the YARN client to fail over and retry many
> times.
> We'd better avoid this behavior: if the client gets a permission denial
> from the server, it should try once and exit; there is no need to retry.
> This issue was introduced by YARN-4522.






[jira] [Created] (YARN-5485) YARN client still retries many times on failover even though the RM server throws AccessControlException

2016-08-08 Thread Bob.zhao (JIRA)
Bob.zhao created YARN-5485:
--

 Summary: YARN client still retries many times on failover even 
though the RM server throws AccessControlException
 Key: YARN-5485
 URL: https://issues.apache.org/jira/browse/YARN-5485
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Bob.zhao


1. Enable yarn.acl.enable.
2. Set up some queues for users on YARN, for example queue1 only for user1.
3. Use user1 to submit an app to queue1; it runs successfully.
4. Use user2 to submit an app to queue1; it will not be permitted to submit 
to queue1.
At the RM server side, this throws an IOException 
(ClientRMService.java#submitApplication), which is the parent class of the 
AccessControlException created in RMAppManager.java#createAndPopulateNewRMApp. 
The IOException is thrown back to the client, causing the YARN client to fail 
over and retry many times.
We'd better avoid this behavior: if the client gets a permission denial from 
the server, it should try once and exit; there is no need to retry.
This issue was introduced by YARN-4522.






[jira] [Created] (YARN-5484) YARN client still retries many times on failover even though the RM server throws AccessControlException

2016-08-08 Thread Bob.zhao (JIRA)
Bob.zhao created YARN-5484:
--

 Summary: YARN client still retries many times on failover even 
though the RM server throws AccessControlException
 Key: YARN-5484
 URL: https://issues.apache.org/jira/browse/YARN-5484
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Bob.zhao


1. Enable yarn.acl.enable.
2. Set up some queues for users on YARN, for example queue1 only for user1.
3. Use user1 to submit an app to queue1; it runs successfully.
4. Use user2 to submit an app to queue1; it will not be permitted to submit 
to queue1.
At the RM server side, this throws an IOException 
(ClientRMService.java#submitApplication), which is the parent class of the 
AccessControlException created in RMAppManager.java#createAndPopulateNewRMApp. 
The IOException is thrown back to the client, causing the YARN client to fail 
over and retry many times.
We'd better avoid this behavior: if the client gets a permission denial from 
the server, it should try once and exit; there is no need to retry.
This issue was introduced by YARN-4522.






[jira] [Created] (YARN-5483) Optimize RMAppAttempt#pullJustFinishedContainers

2016-08-08 Thread sandflee (JIRA)
sandflee created YARN-5483:
--

 Summary: Optimize RMAppAttempt#pullJustFinishedContainers
 Key: YARN-5483
 URL: https://issues.apache.org/jira/browse/YARN-5483
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: sandflee
Assignee: sandflee


With about 1,000 apps running on the cluster, JProfiler found that 
pullJustFinishedContainers costs too much CPU.
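
One plausible shape for the optimization (a sketch under assumptions; this is 
not the actual patch) is to buffer just-finished container statuses in a 
queue that each AM heartbeat drains, so the cost is proportional to new 
events rather than to a rescan of attempt state:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Illustrative pattern only: O(k) pull of k just-finished containers. */
final class FinishedContainerBuffer<T> {
  private final ConcurrentLinkedQueue<T> justFinished =
      new ConcurrentLinkedQueue<>();

  void add(T status) {         // called when a container finishes
    justFinished.add(status);
  }

  List<T> pullJustFinished() { // called from the AM heartbeat
    List<T> out = new ArrayList<>();
    for (T s; (s = justFinished.poll()) != null; ) {
      out.add(s);              // drain element by element
    }
    return out;
  }
}
{code}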






[jira] [Resolved] (YARN-5482) ContainerMetric leads to memory leaks

2016-08-08 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen resolved YARN-5482.

Resolution: Duplicate

> ContainerMetric leads to memory leaks
> 
>
> Key: YARN-5482
> URL: https://issues.apache.org/jira/browse/YARN-5482
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: oom1.png, oom2.png
>
>
> In our cluster, I often see the NodeManager OOM. I dumped the heap file and 
> found that ContainerMetric takes up a lot of memory:
> {code}
> export YARN_NODEMANAGER_OPTS="-Xmx2g -Xms2g -Xmn1g -XX:PermSize=128M 
> -XX:MaxPermSize=128M -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/data1/yarn-logs/nm_dump.log -Dcom.sun.management.jmxremote 
> -Xloggc:/data1/yarn-logs/nm_gc.log -verbose:gc -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution 
> -XX:ErrorFile=/data1/yarn-logs/nm_err_pid"
> {code}
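
For context, the failure mode is the classic unbounded-registry pattern: 
per-container metrics objects are created on container start but never 
unregistered. A toy sketch of that pattern (illustrative only; this is not 
the real ContainerMetrics code):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy illustration of the leak pattern, not the real ContainerMetrics. */
public final class LeakyMetricsRegistry {
  // One entry per container; grows without bound if entries are never
  // removed, which is how a long-lived NodeManager eventually OOMs.
  private final Map<String, long[]> metricsByContainer = new ConcurrentHashMap<>();

  public void onContainerStart(String containerId) {
    metricsByContainer.put(containerId, new long[64]); // per-container state
  }

  public void onContainerFinish(String containerId) {
    metricsByContainer.remove(containerId); // the step the bug is missing
  }
}
{code}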






[jira] [Resolved] (YARN-5482) ContainerMetric leads to memory leaks

2016-08-08 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen resolved YARN-5482.

Resolution: Fixed

> ContainerMetric leads to memory leaks
> 
>
> Key: YARN-5482
> URL: https://issues.apache.org/jira/browse/YARN-5482
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: oom1.png, oom2.png
>
>
> In our cluster, I often see the NodeManager OOM. I dumped the heap file and 
> found that ContainerMetric takes up a lot of memory:
> {code}
> export YARN_NODEMANAGER_OPTS="-Xmx2g -Xms2g -Xmn1g -XX:PermSize=128M 
> -XX:MaxPermSize=128M -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/data1/yarn-logs/nm_dump.log -Dcom.sun.management.jmxremote 
> -Xloggc:/data1/yarn-logs/nm_gc.log -verbose:gc -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution 
> -XX:ErrorFile=/data1/yarn-logs/nm_err_pid"
> {code}






[jira] [Created] (YARN-5482) ContainerMetric leads to memory leaks

2016-08-08 Thread tangshangwen (JIRA)
tangshangwen created YARN-5482:
--

 Summary: ContainerMetric leads to memory leaks
 Key: YARN-5482
 URL: https://issues.apache.org/jira/browse/YARN-5482
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: tangshangwen
Assignee: tangshangwen


In our cluster, I often see the NodeManager OOM. I dumped the heap file and 
found that ContainerMetric takes up a lot of memory:
{code}
export YARN_NODEMANAGER_OPTS="-Xmx2g -Xms2g -Xmn1g -XX:PermSize=128M 
-XX:MaxPermSize=128M -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/data1/yarn-logs/nm_dump.log -Dcom.sun.management.jmxremote 
-Xloggc:/data1/yarn-logs/nm_gc.log -verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution 
-XX:ErrorFile=/data1/yarn-logs/nm_err_pid"
{code}



