[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948192#comment-14948192
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~leftnoteasy] for the review and confirmation, and [~Naganarasimha] and 
[~sunilg] for the reviews. 

Thanks [~dian.fu] for the patch. It mostly looks good to me, except for these minor 
comments.

1. Can you update the descriptions for the new configs added in yarn-default.xml?

{code:xml}
+The class to use as the node labels fetcher by ResourceManager. It should
+extend org.apache.hadoop.yarn.server.resourcemanager.nodelabels.
+RMNodeLabelsMappingProvider.
{code}

Can you update the description like below?

'When "yarn.node-labels.configuration-type" is set to "delegated-centralized", 
administrators can configure the class for fetching node labels by the 
ResourceManager. The configured class needs to extend 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider.'

{code:xml}
+The interval to use to update node labels by ResourceManager.
{code}

Can we think of having it like 'This interval is used by the ResourceManager to 
update the node labels.'? And can we also describe here that if the value is '-1', 
no update timer task will be created?
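
For illustration, the two entries in yarn-default.xml could then read roughly as 
below. This is only a sketch of the suggested wording; the property names and the 
default value shown are my assumptions, not something confirmed by the patch:

{code:xml}
<!-- Sketch only: property names and the default value are assumed for illustration. -->
<property>
  <description>When "yarn.node-labels.configuration-type" is set to
  "delegated-centralized", administrators can configure the class for fetching
  node labels by the ResourceManager. The configured class needs to extend
  org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider.
  </description>
  <name>yarn.resourcemanager.node-labels.provider</name>
</property>

<property>
  <description>This interval is used by the ResourceManager to update the node
  labels. If the value is '-1', no update timer task is created.</description>
  <name>yarn.resourcemanager.node-labels.provider.fetch-interval-ms</name>
  <value>1800000</value>
</property>
{code}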

2. In TestRMDelegatedNodeLabelsUpdater.java, can we have an assertion in the catch 
block to check the expected exception message?

{code:java}
} catch (Exception e) {
  // expected
}
{code}

3. Can you file a Jira to update the documentation for this?


> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated YARN-3964:
--
Attachment: YARN-3964.016.patch

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4240) Add documentation for delegated-centralized node labels feature

2015-10-08 Thread Dian Fu (JIRA)
Dian Fu created YARN-4240:
-

 Summary: Add documentation for delegated-centralized node labels 
feature
 Key: YARN-4240
 URL: https://issues.apache.org/jira/browse/YARN-4240
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu


As a follow up of YARN-3964, we should add documentation for 
delegated-centralized node labels feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948280#comment-14948280
 ] 

Dian Fu commented on YARN-3964:
---

Hi [~devaraj.k],
Thanks a lot for your review. Updated the patch accordingly. Have also created 
ticket YARN-4240 for the documentation.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-10-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948327#comment-14948327
 ] 

Rohith Sharma K S commented on YARN-3771:
-

+1 for fixing the security hole. One concern about the patch is backward 
compatibility, since the String array is changed to a List. If any clients are using 
this default constant, it would cause a compilation error for them. 

I would like to hear comments from other folks on this change.
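
To make the trade-off concrete, here is a minimal sketch (illustrative names, not 
the actual YarnConfiguration constants) showing both why the final array is a hole 
and why switching the type breaks compilation for callers expecting a String[]:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathConstantsSketch {
  // Current shape: 'final' only protects the reference, not the contents.
  public static final String[] DEFAULT_CLASSPATH_ARRAY = { "a.jar", "b.jar" };

  // Proposed shape: an unmodifiable List rejects mutation at runtime, but any
  // client compiled against the String[] constant would no longer compile.
  public static final List<String> DEFAULT_CLASSPATH_LIST =
      Collections.unmodifiableList(Arrays.asList("a.jar", "b.jar"));

  public static void main(String[] args) {
    DEFAULT_CLASSPATH_ARRAY[0] = "evil.jar";        // silently succeeds today
    try {
      DEFAULT_CLASSPATH_LIST.set(0, "evil.jar");    // throws at runtime
    } catch (UnsupportedOperationException expected) {
      System.out.println("List contents cannot be reassigned");
    }
  }
}
{code}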

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> do not honor the final qualifier. The string array contents can be 
> reassigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
>     System.out.println(12 < 10);
>     String[] t1 = { "u" };
>     // t = t1; // this will show a compilation error
>     t[0] = t1[0]; // But this works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948420#comment-14948420
 ] 

Rohith Sharma K S commented on YARN-4235:
-

+1 lgtm

> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4235.001.patch
>
>
> We see NPE if empty groups are returned for a user. This causes a NPE and 
> cause RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948435#comment-14948435
 ] 

Hadoop QA commented on YARN-3964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 28s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 13s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  6s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m  2s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  8s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  51m  0s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 106m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765542/YARN-3964.016.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1107bd3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/console |


This message was automatically generated.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, 

[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948587#comment-14948587
 ] 

Hadoop QA commented on YARN-3964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 27s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  0s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 42s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  7s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  56m 52s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 107m 57s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765542/YARN-3964.016.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1107bd3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9378/console |


This message was automatically generated.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-08 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948709#comment-14948709
 ] 

MENG DING commented on YARN-1509:
-

Hi, [~bikassaha]

Thanks a lot for the valuable comments!

bq. Why are there separate methods for increase and decrease instead of a 
single method to change the container resource size? By comparing the existing 
resource allocation to a container and the new requested resource allocation, 
it should be clear whether an increase or decrease is being requested.

As discussed in the design stage, and also described in the design doc, the 
reason to separate the increase/decrease requests in the APIs and AMRM protocol 
is to make sure that users will make a conscious decision when they are making 
these requests. It is also much easier to catch any potential mistakes that the 
user could make. For example, if a user intends to increase the resource of a 
container, but for whatever reason mistakenly specifies a target resource that 
is smaller than the current resource, the RM can catch that and throw an exception.

bq. Also, for completeness, is there a need for a 
cancelContainerResourceChange()? After a container resource change request has 
been submitted, what are my options as a user other than to wait for the 
request to be satisfied by the RM?

For a container resource decrease request, there is practically no chance (and 
probably no need) to cancel the request, as it happens immediately when the 
scheduler processes the request (this is similar to the release-container 
request). For a container resource increase, the user can cancel any pending 
increase request still sitting in the RM by sending a decrease request of the same 
size as the current container size. I will improve the Javadoc description to 
make this clear.
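
To illustrate the cancellation path from the AM side, a rough sketch; the two 
method names are placeholders for the AMRMClient API being discussed here, not the 
actual signatures in the patch:

{code:java}
// Sketch only: requestContainerResourceIncrease/Decrease are placeholder names.
Resource current = container.getResource();          // the current container size
Resource target = Resource.newInstance(8192, 4);     // desired larger size

amrmClient.requestContainerResourceIncrease(container, target);

// Later, to cancel the still-pending increase, "decrease" back to the current
// size; the RM drops the pending increase request for this container.
amrmClient.requestContainerResourceDecrease(container, current);
{code}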

bq. If I release the container, then does it mean all pending change requests 
for that container should be removed? From a quick look at the patch, it does 
not look like that is being covered, unless I am missing something.

You are right that releasing a container should cancel all pending change 
requests for that container. This is missing in the current implementation; I 
will add that.

bq. What will happen if the AM restarts after submitting a change request. Does 
the AM-RM re-register protocol need an update to handle the case of 
re-synchronizing on the change requests? Whats happens if the RM restarts? If 
these are explained in a document, then please point me to the document. The 
patch did not seem to have anything around this area. So I thought I would ask

The current implementation handles RM restarts by maintaining a pendingIncrease 
and pendingDecrease map, just like the pendingRelease list. This is covered in 
the design doc.
For AM restarts, I am not sure what we need to do here. Does the AM-RM re-register 
protocol currently handle re-synchronizing outstanding new container requests 
after the AM is restarted? Could you elaborate a little bit on this?

bq. Also, why have the callback interface methods been made non-public? Would 
that be an incompatible change?

All interface methods are implicitly public and abstract. The existing public 
modifiers on these methods are redundant, so I removed them.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4005:
-
Fix Version/s: (was: 2.8.0)
   2.7.2

I pulled this in to branch-2.7 as well.

> Completed container whose app is finished is not removed from NMStateStore
> --
>
> Key: YARN-4005
> URL: https://issues.apache.org/jira/browse/YARN-4005
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.7.2
>
> Attachments: YARN-4005.01.patch
>
>
> If a container is completed and its corresponding app is finished, NM only 
> removes it from its context and does not add it to 
> 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will 
> not remove it from NMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4005:
-
Fix Version/s: 2.6.2

Also committed to branch-2.6.

> Completed container whose app is finished is not removed from NMStateStore
> --
>
> Key: YARN-4005
> URL: https://issues.apache.org/jira/browse/YARN-4005
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-4005.01.patch
>
>
> If a container is completed and its corresponding app is finished, NM only 
> removes it from its context and does not add it to 
> 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will 
> not remove it from NMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3780:
-
Fix Version/s: (was: 2.8.0)
   2.6.2
   2.7.2

I committed this to branch-2.7 and branch-2.6 as well.

> Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
> -
>
> Key: YARN-3780
> URL: https://issues.apache.org/jira/browse/YARN-3780
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3780.000.patch
>
>
> We should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
> to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
> The current code uses {{!=}} to compare the Resource totalCapability, which 
> compares the reference, not the real value in the Resource. So we should use 
> equals to compare Resource objects.
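
A minimal sketch of the difference, with illustrative values (this is not the 
actual RMNodeImpl code):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Minimal sketch: '!=' compares references, equals() compares memory/vcores values.
public class ResourceCompareSketch {
  public static void main(String[] args) {
    Resource oldCapability = Resource.newInstance(8192, 8);
    Resource newCapability = Resource.newInstance(8192, 8);

    // true even though the values match, which is what triggers the
    // unnecessary NodeResourceUpdateSchedulerEvent.
    System.out.println(oldCapability != newCapability);

    // false, so no scheduler event would be needed in this case.
    System.out.println(!oldCapability.equals(newCapability));
  }
}
{code}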



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948870#comment-14948870
 ] 

zhihai xu commented on YARN-3943:
-

The checkstyle issues and release audit warnings for the new patch 
YARN-3943.002.patch were pre-existing.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when disks go from not-full to 
> full, and the other used when disks go from full to not-full, so we can avoid 
> frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3802) Two RMNodes for the same NodeId are used in RM sometimes after NM is reconnected.

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3802:
-
Fix Version/s: (was: 2.8.0)
   2.6.2
   2.7.2

I committed this to branch-2.7 and branch-2.6 as well.

> Two RMNodes for the same NodeId are used in RM sometimes after NM is 
> reconnected.
> -
>
> Key: YARN-3802
> URL: https://issues.apache.org/jira/browse/YARN-3802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3802.000.patch, YARN-3802.001.patch
>
>
> Two RMNodes for the same NodeId are sometimes used in the RM after an NM is 
> reconnected. The scheduler and the RMContext sometimes use different RMNode 
> references for the same NodeId after an NM is reconnected, which is not 
> correct. The scheduler and the RMContext should always use the same RMNode 
> reference for the same NodeId.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948914#comment-14948914
 ] 

zhihai xu commented on YARN-4201:
-

Thanks for the patch [~hex108]! It is a good catch.
Should we use {{SchedulerNode#getNodeName}} to get the blacklisted node name?
We can add {{getSchedulerNode}} to {{YarnScheduler}}, so we can call 
{{getSchedulerNode}} to look up the SchedulerNode by NodeId in 
{{RMAppAttemptImpl}}.
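
A rough sketch of what that could look like in RMAppAttemptImpl, assuming 
getSchedulerNode gets exposed as suggested; the blacklist call at the end is a 
placeholder for however the attempt actually records the entry:

{code:java}
// Sketch only: assumes the scheduler exposes getSchedulerNode(NodeId).
SchedulerNode schedulerNode = scheduler.getSchedulerNode(nodeId);
if (schedulerNode != null) {
  // getNodeName() already honors include-port-in-node-name, returning
  // "host:port" when it is set and plain "host" otherwise.
  blacklist.addNode(schedulerNode.getNodeName());   // placeholder call
}
{code}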


> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch
>
>
> For a minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM just puts the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In 
> fact, the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948931#comment-14948931
 ] 

zhihai xu commented on YARN-4201:
-

Currently {{getSchedulerNode}} is defined in {{AbstractYarnScheduler}}. 
{{SchedulerAppUtils.isBlacklisted}} uses {{node.getNodeName()}} to check whether a 
node is blacklisted, so it would be good to use the same way to get the 
blacklisted node name. All the configuration and formatting related to the node 
name would then live only in SchedulerNode.java.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch
>
>
> For a minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM just puts the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In 
> fact, the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3194:
-
Fix Version/s: 2.6.2

I committed this to branch-2.6 as well.

> RM should handle NMContainerStatuses sent by NM while registering if NM is 
> Reconnected node
> ---
>
> Key: YARN-3194
> URL: https://issues.apache.org/jira/browse/YARN-3194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: NM restart is enabled
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 2.7.0, 2.6.2
>
> Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM 
> during registration. The registration can be treated by the RM as a new node or 
> a reconnecting node, and the RM triggers the corresponding event based on the 
> node-added or node-reconnected state. 
> # Node added event: again, 2 scenarios can occur 
> ## A new node is registering with a different ip:port – NOT A PROBLEM
> ## An old node is re-registering because of a RESYNC command from the RM when 
> the RM restarts – NOT A PROBLEM
> # Node reconnected event: 
> ## An existing node is re-registering, i.e. the RM treats it as a reconnecting 
> node when the RM has not restarted 
> ### NM RESTART NOT enabled – NOT A PROBLEM
> ### NM RESTART is enabled 
> #### Some applications are running on this node – *Problem is here*
> #### Zero applications are running on this node – NOT A PROBLEM
> Since the NMContainerStatuses are not handled, the RM never gets to know about 
> completed containers and never releases the resources held by those containers. 
> The RM will not allocate new containers for pending resource requests until the 
> completedContainer event is triggered. This results in applications waiting 
> indefinitely because their pending container requests are not served by the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3896:
-
Fix Version/s: (was: 2.8.0)
   2.6.2
   2.7.2

I committed this to branch-2.7 and branch-2.6 as well.

> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.7.2, 2.6.2
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node (10.208.132.153) reconnected with the RM. When it registered with the 
> RM, the RM set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the 
> node's heartbeat came before the RM succeeded in setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-10-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948995#comment-14948995
 ] 

Junping Du commented on YARN-3223:
--

Sorry for coming to this late, as I am just back from a long leave. Is your patch 
available for review? If so, can you click the "Submit Patch" button to 
trigger a Jenkins test against your patch? Also, please don't delete old/stale 
patches, which could cause us to lose track of the full history of 
patches/discussions. Thx!

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and ensure the used resource gets updated 
> when containers finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-08 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.002.patch

Resolved the issues with the Capacity, FIFO and SLS schedulers.

I am not sure how to approach the testing; I have written a basic unit test for 
now.

I am trying to think about how to make it more robust, and will update if I come 
up with a sturdier approach.

In the meantime, requesting some feedback on version 002 of the patch.

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.002.patch, 
> YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if the minimum is set to zero, as per YARN-789.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4236) Metric for aggregated resources allocation per queue

2015-10-08 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4236:
---
Attachment: YARN-4236.patch

> Metric for aggregated resources allocation per queue
> 
>
> Key: YARN-4236
> URL: https://issues.apache.org/jira/browse/YARN-4236
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4236.patch, YARN-4236.patch
>
>
> We currently track allocated memory and allocated vcores per queue, but we 
> don't have a good rate metric on how fast we're allocating these things. In 
> other words, a flat line in allocatedmb could equally mean that no new 
> containers are being allocated, or that a bunch of containers are being 
> allocated while we free exactly as much as we allocate each time. Adding a 
> resources-allocated-per-second metric per queue would give us better insight 
> into the rate of resource churn on a queue. Based on this aggregated resource 
> allocation per queue, we can easily build tools to measure the rate of 
> resource allocation per queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4237) Support additional queries for ATSv2 Web UI

2015-10-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4237:
---
Attachment: YARN-4237-YARN-2928.01.patch

Fields would specify whether metrics for the flowruns will be returned or not. 
For a single flowrun, metrics will be returned. Maybe we can decide whether to 
send them or not on the basis of fields query param as well.

> Support additional queries for ATSv2 Web UI
> ---
>
> Key: YARN-4237
> URL: https://issues.apache.org/jira/browse/YARN-4237
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4237-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2

2015-10-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949064#comment-14949064
 ] 

Varun Saxena commented on YARN-4238:


Yes, it's there as part of the event inside the entity, but it's not being set 
explicitly via entity.setCreatedTime(). You are correct that this was not being 
done in ATSv1 either. 
In ATSv1, though, if the start time was not sent for an entity, the LevelDB 
implementation of the timeline store would check the entity (if it did not 
already exist) and parse through all the events, and the smallest event time 
would be chosen as the start time. If that was not there either, I think the 
current system time was taken (I will have to check on that one).

Regardless, in ATSv2 we are neither setting the created time in the client nor do 
we have ATSv1-like logic in the HBase writer. So the end result is that the 
created time is never updated in the backend.
Either way this has to be handled, either on the publishing side or in the writer.
I am not sure why the approach of fetching it from events was taken in ATSv1. 
As a get followed by a put on every call can be expensive from an HBase 
perspective, I think the client can send it whenever it wants to. I do not see 
any issues around sending it from the RM, NM, etc., from where entities are 
published; otherwise we would have to check for specific events. I will check if 
there are any issues around sending it from the client when I fix this. 

Now, what if the client does not send it? This would be an issue if entities 
have to be returned sorted by created time, or if filtering on the basis of a 
created-time range has to be done. We can explicitly state to clients that if 
they do not report the created time, we cannot guarantee ordering while fetching 
multiple entities; this keeps it simple from an implementation viewpoint. If not, 
maybe we can cache it, check whether the entity has already gone into the 
backend, and set the created time based on that. But in this case the issue is 
what happens if the daemon (hosting the writer) goes down. Maybe we can store 
this info in a state store, but do we need to do that?
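
For reference, a sketch of the publishing-side option (placeholder code, not the 
actual TimelineServiceV2Publisher change; 'app' is assumed to be the RMApp being 
published):

{code:java}
// Sketch only: set the created time explicitly on the entity when publishing
// the application-created event, instead of relying on the writer to infer it.
ApplicationEntity entity = new ApplicationEntity();
entity.setId(app.getApplicationId().toString());
entity.setCreatedTime(app.getStartTime());

TimelineEvent event = new TimelineEvent();
event.setId(ApplicationMetricsConstants.CREATED_EVENT_TYPE);
event.setTimestamp(app.getStartTime());
entity.addEvent(event);
{code}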

> createdTime is not reported while publishing entities to ATSv2
> --
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> While publishing entities from the RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and for other entities in other similar classes, is not updated. We can 
> easily set the created time when sending the application-created event, and 
> likewise the modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4241) Typo in yarn-default.xml

2015-10-08 Thread Anthony Rojas (JIRA)
Anthony Rojas created YARN-4241:
---

 Summary: Typo in yarn-default.xml
 Key: YARN-4241
 URL: https://issues.apache.org/jira/browse/YARN-4241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, yarn
Reporter: Anthony Rojas
Assignee: Anthony Rojas
Priority: Trivial


Typo in description section of yarn-default.xml, under the properties:

yarn.nodemanager.disk-health-checker.min-healthy-disks
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb

The reference to yarn-nodemanager.local-dirs should be 
yarn.nodemanager.local-dirs




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4241) Typo in yarn-default.xml

2015-10-08 Thread Anthony Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Rojas updated YARN-4241:

Attachment: YARN-4241.patch

> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949200#comment-14949200
 ] 

Hadoop QA commented on YARN-4237:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 52s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 53s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 49s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  40m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765639/YARN-4237-YARN-2928.01.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 5a3db96 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9380/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9380/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9380/console |


This message was automatically generated.

> Support additional queries for ATSv2 Web UI
> ---
>
> Key: YARN-4237
> URL: https://issues.apache.org/jira/browse/YARN-4237
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4237-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-10-08 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949223#comment-14949223
 ] 

Brook Zhou commented on YARN-3223:
--

Ah okay, sorry about that, will do. 

It seems to be passing test-patch on my local trunk repo, so I will update with 
submit patch.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and ensure the used resource gets updated 
> when containers finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI

2015-10-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949267#comment-14949267
 ] 

Varun Saxena commented on YARN-4237:


Or maybe mock the row key class.

> Support additional queries for ATSv2 Web UI
> ---
>
> Key: YARN-4237
> URL: https://issues.apache.org/jira/browse/YARN-4237
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4237-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949299#comment-14949299
 ] 

Hadoop QA commented on YARN-3996:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 30s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 47s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 21s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 18s | The applied patch generated  1 
new checkstyle issues (total was 279, now 279). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 37s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 56s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  58m  3s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 107m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765634/YARN-3996.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0841940 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9379/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9379/console |


This message was automatically generated.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.002.patch, 
> YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if the minimum is set to zero, as per YARN-789.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Typo in yarn-default.xml

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949314#comment-14949314
 ] 

Hadoop QA commented on YARN-4241:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 21s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | yarn tests |   2m 11s | Tests failed in 
hadoop-yarn-common. |
| | |  42m 45s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765653/YARN-4241.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / 0841940 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9382/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9382/console |


This message was automatically generated.

> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4236) Metric for aggregated resources allocation per queue

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949322#comment-14949322
 ] 

Hadoop QA commented on YARN-4236:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 34s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  2 
new checkstyle issues (total was 52, now 54). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  57m  1s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 51s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765637/YARN-4236.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0841940 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9381/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9381/console |


This message was automatically generated.

> Metric for aggregated resources allocation per queue
> 
>
> Key: YARN-4236
> URL: https://issues.apache.org/jira/browse/YARN-4236
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4236.patch, YARN-4236.patch
>
>
> We currently track allocated memory and allocated vcores per queue but we 
> don't have a good rate metric on how fast we're allocating these things. In 
> other words, a straight line in allocatedmb could equally be one extreme of 
> no new containers are being allocated or allocating a bunch of containers 
> where we free exactly what we allocate each time. Adding a resources 
> allocated per second per queue would give us a better insight into the rate 
> of resource churn on a queue. Based on this aggregated resource allocation 
> per queue we can easily have some tools to measure the rate of resource 
> allocation per queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-08 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949338#comment-14949338
 ] 

Jonathan Eagles commented on YARN-4009:
---

[~vvasudev], I'm trying hard to find a balance here. On the one hand I want to 
support backwards compatibility; on the other hand I want configuration to be 
simple. I want to support a way to enable only RM and timeline CORS support 
while specifying the configuration just once (not once for common CORS and once 
for timeline CORS). However, I also want to keep supporting the old 
configuration parameters.

Proposals:
1) If timeline CORS is enabled, have the timeline CORS configuration override 
the common CORS configuration where it is present, and otherwise fall back to 
the common configuration (a rough sketch of this fallback follows below).
2) Create a second timeline enabled flag that only uses the new CORS classes, 
configs and behavior. This allows the old way, using the old configs with the 
timeline prefix, to keep working, while letting users migrate to the new way to 
simplify configuration.
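
For proposal 1, a rough sketch of the intended fallback (this is not the actual 
filter code, and the property prefixes below are assumptions used only for 
illustration):
{code}
// Sketch only: resolve a CORS property by preferring the timeline-specific key
// and falling back to the common key, then the supplied default.
import org.apache.hadoop.conf.Configuration;

public class CorsConfigResolver {
  // Hypothetical prefixes, for illustration only.
  private static final String COMMON_PREFIX = "hadoop.http.cross-origin.";
  private static final String TIMELINE_PREFIX =
      "yarn.timeline-service.http-cross-origin.";

  public static String resolve(Configuration conf, String suffix,
      String defaultValue) {
    String common = conf.get(COMMON_PREFIX + suffix, defaultValue);
    return conf.get(TIMELINE_PREFIX + suffix, common);
  }
}
{code}
With this shape, a deployment that only sets the common keys keeps today's 
behavior, while timeline-specific keys win whenever they are present.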

What do you think?

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch
>
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4241) Typo in yarn-default.xml

2015-10-08 Thread Anthony Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Rojas updated YARN-4241:

Attachment: YARN-4241.patch.1

Removed trailing whitespace on lines 19 and 28.


> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Typo in yarn-default.xml

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949407#comment-14949407
 ] 

Hadoop QA commented on YARN-4241:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  0s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:blue}0{color} | pre-patch |  15m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 37s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| | |  38m 57s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765674/YARN-4241.patch.1 |
| Optional Tests | javadoc javac unit |
| git revision | trunk / 0841940 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9384/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9384/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9384/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9384/console |


This message was automatically generated.

> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949483#comment-14949483
 ] 

Hadoop QA commented on YARN-3223:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   9m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 45s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 17s | The applied patch generated  
14 new checkstyle issues (total was 180, now 194). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 22  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 55s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  57m 44s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 107m 10s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764643/YARN-3223-v0.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0841940 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/whitespace.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9383/console |


This message was automatically generated.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch
>
>
> During NM graceful decommission, we should handle resource update properly, 
> include: make RMNode keep track of old resource for possible rollback, keep 
> available resource to 0 and used resource get updated when
> container finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949512#comment-14949512
 ] 

Jason Lowe commented on YARN-3943:
--

+1 lgtm.  Committing this.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
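
As a rough illustration of the two-threshold idea described above (a sketch 
only, not the committed DirectoryCollection code; the class, field and method 
names below are made up):
{code}
// Sketch: the gap between the "full" and "not full" thresholds gives the
// disk-health check hysteresis, so the state does not oscillate around a
// single cutoff value.
class DiskFullChecker {
  private final float fullUtilizationPercent;     // e.g. 95.0f: mark disk full above this
  private final float notFullUtilizationPercent;  // e.g. 90.0f: mark disk good again below this
  private boolean full;

  DiskFullChecker(float fullUtilizationPercent, float notFullUtilizationPercent) {
    this.fullUtilizationPercent = fullUtilizationPercent;
    this.notFullUtilizationPercent = notFullUtilizationPercent;
  }

  boolean update(float usedPercent) {
    if (!full && usedPercent > fullUtilizationPercent) {
      full = true;
    } else if (full && usedPercent < notFullUtilizationPercent) {
      full = false;
    }
    return full;
  }
}
{code}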



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949536#comment-14949536
 ] 

zhihai xu commented on YARN-3943:
-

Thanks [~jlowe] for the review and committing the patch, greatly appreciated!

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-10-08 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949546#comment-14949546
 ] 

Rich Haase commented on YARN-4207:
--

This looks like a pretty trivial change: adding an additional value to the 
o.a.h.yarn.records.FinalApplicationStatus enum. In a quick search I didn't see 
anything downstream within Hadoop that would be impacted by such a patch. If no 
one else is working on this JIRA and the approach I've described is acceptable, 
I will put together a patch.
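
For illustration, a sketch of what the change could look like (the new value 
name is a placeholder; the existing values are shown only for context):
{code}
// Sketch only, not a patch: add one non-judgemental terminal value to the enum.
public enum FinalApplicationStatus {
  UNDEFINED,   // application has not yet finished
  SUCCEEDED,
  FAILED,
  KILLED,
  ENDED        // proposed: finished without implying success or failure
}
{code}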




> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949554#comment-14949554
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8596 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8596/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949586#comment-14949586
 ] 

Wangda Tan commented on YARN-4140:
--

Thanks for the update, [~bibinchundatt]. The patch looks good, pending Jenkins.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-4242) add analyze command to explictly cache file metadata in HBase metastore

2015-10-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created YARN-4242:
--

 Summary: add analyze command to explictly cache file metadata in 
HBase metastore
 Key: YARN-4242
 URL: https://issues.apache.org/jira/browse/YARN-4242
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sergey Shelukhin


ANALYZE TABLE (spec as usual) CACHE METADATA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4242) add analyze command to explictly cache file metadata in HBase metastore

2015-10-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved YARN-4242.

Resolution: Invalid

Wrong project

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: YARN-4242
> URL: https://issues.apache.org/jira/browse/YARN-4242
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> ANALYZE TABLE (spec as usual) CACHE METADATA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-08 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949639#comment-14949639
 ] 

MENG DING commented on YARN-1509:
-

Had an offline discussion with [~leftnoteasy] and [~bikassaha]. Overall we 
agreed that we can combine the separate increase/decrease requests into one API 
in the client:

* Combine {{requestContainerResourceIncrease}} and 
{{requestContainerResourceDecrease}} into one API. For example:
{code}
  /**
   * Request container resource change before calling allocate.
   * Any previous pending resource change request of the same container will be
   * cancelled.
   *
   * @param container The container returned from the last successful resource
   *  allocation or resource change
   * @param capability  The target resource capability of the container
   */
  public abstract void requestContainerResourceChange(
  Container container, Resource capability);
{code}
The user must pass in a container object (instead of just a container ID) and 
the target resource capability. Because the container object contains the 
existing container Resource, the AMRMClient can compare it against the target 
resource to figure out whether this is an increase or a decrease request (see 
the sketch after this comment).

* There is *NO* need to change the AMRM protocol. 

* For the CallbackHandler methods, we can also combine 
{{onContainersResourceDecreased}} and {{onContainersResourceIncreased}} into 
one API:
{code}
public abstract void onContainersResourceChanged(
    List<Container> containers);
{code}
The user can compare the passed-in containers with the containers they have 
remembered to determine if this is an increase or decrease request. Or maybe we 
can make it even simpler by doing something like the following? Thoughts?
{code}
public abstract void onContainersResourceChanged(
    List<Container> increasedContainers, List<Container> decreasedContainers);
{code}

* We can *deprecate* the existing CallbackHandler interface and use the 
AbstractCallbackHandler instead.

[~bikassaha], [~leftnoteasy], any comments?
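
As an illustration of the comparison mentioned above, a rough sketch of how the 
client side could classify a change request (this is not the actual AMRMClient 
code; the helper class and method names are made up):
{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;

final class ResourceChangeClassifier {
  enum ChangeType { INCREASE, DECREASE, NONE }

  // Compare the container's current Resource with the target capability.
  static ChangeType classify(Container container, Resource target) {
    Resource current = container.getResource();
    boolean anyUp = target.getMemory() > current.getMemory()
        || target.getVirtualCores() > current.getVirtualCores();
    boolean anyDown = target.getMemory() < current.getMemory()
        || target.getVirtualCores() < current.getVirtualCores();
    if (anyUp && !anyDown) {
      return ChangeType.INCREASE;
    } else if (anyDown && !anyUp) {
      return ChangeType.DECREASE;
    }
    // Mixed or unchanged requests would presumably be rejected or ignored.
    return ChangeType.NONE;
  }
}
{code}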

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-10-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949644#comment-14949644
 ] 

Sergey Shelukhin commented on YARN-4207:


It's unassigned, so I gather no one is working on it. This plan sounds good to 
me (non-binding :))

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949658#comment-14949658
 ] 

Wangda Tan commented on YARN-4162:
--

[~Naganarasimha],

Thanks a lot for updating! I looked at the patch and tried it locally; some 
minor comments:

1. UserInfo#getResources -> getResourceUsageInfo

2. In CapacitySchedulerPage, can renderQueueCapacityInfo be removed? Would 
calling renderQueueCapacityInfo(ri, lqinfo.get(DEFAULT_PARTITION)) instead be 
equivalent?

3. Also,
For 
{code}
  UL ul = html.ul("#pq");
  for (CapacitySchedulerQueueInfo info : subQueues) {
float used;
float absCap;
float absMaxCap;
float absUsedCap;
//...
{code}
Is it possible to use the same PartitionQueueCapacitiesInfo instead of checking 
whether csqinfo.label == null?

4. PartitionResourceUsageInfo.amResource -> amUsed

5. Why is this isExclusiveNodeLabel check needed?
{code}
  if (!nodeLabel.equals(NodeLabel.DEFAULT_NODE_LABEL_PARTITION)
  && csqinfo.isExclusiveNodeLabel
{code}

6. Could you update {{<DEFAULT_PARTITION>}} to {{DEFAULT_PARTITION}}? The 
{{< ... >}} form could be an illegal attribute name for some XML parsers, and 
I'm not sure it is standard XML.

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949710#comment-14949710
 ] 

Hudson commented on YARN-3943:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #510 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/510/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949737#comment-14949737
 ] 

Hadoop QA commented on YARN-4140:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   9m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 40s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 21s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  63m 31s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 112m 23s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764796/0014-YARN-4140.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d22622 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9385/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9385/console |


This message was automatically generated.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.Sc

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949747#comment-14949747
 ] 

Bibin A Chundatt commented on YARN-4140:


Hi [~leftnoteasy]

Thanks for looking into it. The release audit warning is not related to the 
current patch:

{noformat}
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h
Lines that start with ? in the release audit  report indicate files that do 
not have an Apache license header.
{noformat}

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContaine

[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949751#comment-14949751
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1238 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1238/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949758#comment-14949758
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2445 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2445/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-08 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-4243:
---

 Summary: Add retry on establishing Zookeeper conenction in 
EmbeddedElectorService#serviceInit
 Key: YARN-4243
 URL: https://issues.apache.org/jira/browse/YARN-4243
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong


Right now, the RM shuts down if the ZK connection is down while the RM does its 
initialization. We need to add retry to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4243:

Attachment: YARN-4243.1.patch

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949760#comment-14949760
 ] 

Xuan Gong commented on YARN-4243:
-

Override createConnection() in EmbeddedElectorService to add some retries, and 
create a YARN configuration for the max attempts, because we share the code 
(ActiveStandbyElector) and the related configuration with the HDFS ZKFC.
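
A rough sketch of the retry idea only (it does not reproduce the real 
ActiveStandbyElector/EmbeddedElectorService signatures, and any config key for 
the max attempts would be a new, to-be-named property):
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

final class ZkConnectRetry {
  // Retry the connection-creating call a bounded number of times with a fixed
  // sleep between attempts, instead of failing RM initialization immediately.
  static <T> T connectWithRetry(Callable<T> connect, int maxAttempts, long sleepMs)
      throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return connect.call();
      } catch (Exception e) {
        last = (e instanceof IOException) ? (IOException) e : new IOException(e);
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    throw last != null ? last : new IOException("ZooKeeper connection failed");
  }
}
{code}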

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949770#comment-14949770
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #501 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/501/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
> and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full and the other used when a disk goes from full back to not-full, so 
> that we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.
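
A minimal sketch of the hysteresis described above, with hypothetical class and 
field names (the real change lives in the NodeManager's DirectoryCollection and 
its configuration keys): a disk is marked full only above the high watermark and 
marked good again only below the low watermark, so the state cannot flap.

{code:java}
// Hypothetical illustration of two-threshold (watermark) disk checking.
public class DiskUtilizationWatermarks {
  private final float fullWatermarkPercent;     // e.g. 90.0f, marks a disk full
  private final float notFullWatermarkPercent;  // e.g. 85.0f, marks it good again
  private boolean diskFull = false;

  public DiskUtilizationWatermarks(float fullWatermarkPercent,
      float notFullWatermarkPercent) {
    this.fullWatermarkPercent = fullWatermarkPercent;
    this.notFullWatermarkPercent = notFullWatermarkPercent;
  }

  /** Returns true if the disk is currently considered full. */
  public boolean update(float usedPercent) {
    if (!diskFull && usedPercent > fullWatermarkPercent) {
      diskFull = true;            // crossed the high watermark
    } else if (diskFull && usedPercent < notFullWatermarkPercent) {
      diskFull = false;           // dropped below the low watermark
    }
    return diskFull;
  }
}
{code}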



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4244) BlockPlacementPolicy related logs should contain the details about the filename and blockid

2015-10-08 Thread J.Andreina (JIRA)
J.Andreina created YARN-4244:


 Summary: BlockPlacementPolicy related logs should contain the 
details about the filename and blockid
 Key: YARN-4244
 URL: https://issues.apache.org/jira/browse/YARN-4244
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina


Currently, when a large client write operation is in progress, the user does not 
get details about which file/block the BlockPlacementPolicy was unable to find a 
replica node for.

For example, consider the failure messages below, which do not include file/block 
details and are therefore difficult to track later.
{noformat}
  final String message =
  "Failed to place enough replicas, still in need of "
  + (totalReplicasExpected - results.size()) + " to reach " + 
totalReplicasExpected
  + " (unavailableStorages=" + unavailableStorages + ", 
storagePolicy="
  + storagePolicy + ", newBlock=" + newBlock + ")";

String msg = "All required storage types are unavailable: "
+ " unavailableStorages=" + unavailableStorages
+ ", storagePolicy=" + storagePolicy.getName();
{noformat}

It is better to provide the file/block information in the logs for better 
debuggability.
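
A hedged sketch of that suggestion, using hypothetical srcFile/blockId parameters 
(the actual fix would thread the real file name and block through the existing 
warnings shown above):

{code:java}
// Hypothetical illustration only: include the file and block in the placement
// warning so the failure can be correlated with a specific write later.
public class PlacementFailureMessages {
  static String failedToPlaceReplicas(String srcFile, long blockId,
      int stillNeeded, int totalReplicasExpected, String unavailableStorages,
      String storagePolicy, boolean newBlock) {
    return "Failed to place enough replicas for " + srcFile
        + " (blockId=" + blockId + "), still in need of " + stillNeeded
        + " to reach " + totalReplicasExpected
        + " (unavailableStorages=" + unavailableStorages
        + ", storagePolicy=" + storagePolicy
        + ", newBlock=" + newBlock + ")";
  }
}
{code}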



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4240) Add documentation for delegated-centralized node labels feature

2015-10-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949848#comment-14949848
 ] 

Naganarasimha G R commented on YARN-4240:
-

Hi [~dian.fu],
Can you please wait? I am working on the Distributed Node Labels documentation 
(YARN-4100) and waiting for YARN-2729 to be checked in within a couple of days. 
Once that is done I can push this doc JIRA, and on top of it you can update the 
documentation for "Delegated-Centralized".

> Add documentation for delegated-centralized node labels feature
> ---
>
> Key: YARN-4240
> URL: https://issues.apache.org/jira/browse/YARN-4240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Dian Fu
>Assignee: Dian Fu
>
> As a follow up of YARN-3964, we should add documentation for 
> delegated-centralized node labels feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949849#comment-14949849
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #474 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/474/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
> and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full and the other used when a disk goes from full back to not-full, so 
> that we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949871#comment-14949871
 ] 

Naganarasimha G R commented on YARN-4162:
-

Hi [~wangda],
Thanks for the review comments and for helping with the local testing.
bq. Is it possible to use the same PartitionQueueCapacitiesInfo instead of 
check if csqinfo.label == null or not?
Well, I can avoid the if block, but csqinfo.label itself cannot be set to the 
default partition, as it is also used as a flag to decide whether to show the 
leaf queue in the normal way or the partition way.

bq. Why this isExclusiveNodeLabel check is needed?
isExclusiveNodeLabel is the check we had earlier in 
CapacitySchedulerInfo.getQueues, basically to avoid displaying the queues which 
are not accessible to a given node label partition.
{code}
for (CSQueue queue : parentQueue.getChildQueues()) {
  if (nodeLabel.getIsExclusive()
      && !((AbstractCSQueue) queue).accessibleToPartition(nodeLabel
          .getLabelName())) {
    // Skip displaying the hierarchy for the queues for which the exclusive
    // labels are not accessible
    continue;
  }
{code}
bq. Could you update  to DEFAULT_PARTITION?
Well, shall I update it in all the places displayed in the UI or only in REST?

I will address the other comments in the next patch.


> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4240) Add documentation for delegated-centralized node labels feature

2015-10-08 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949881#comment-14949881
 ] 

Dian Fu commented on YARN-4240:
---

Hi [~Naganarasimha],
Yes, of course. I will update the documentation for "Delegated-Centralized" on top 
of YARN-4100 after it is committed.

> Add documentation for delegated-centralized node labels feature
> ---
>
> Key: YARN-4240
> URL: https://issues.apache.org/jira/browse/YARN-4240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Dian Fu
>Assignee: Dian Fu
>
> As a follow up of YARN-3964, we should add documentation for 
> delegated-centralized node labels feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-08 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4235:

Component/s: fairscheduler

> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user. This causes an NPE and 
> causes the RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
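
A minimal sketch of the guard the fix implies, with hypothetical stand-in types 
(the real change is in QueuePlacementRule's PrimaryGroup rule): check the group 
list before indexing into it instead of calling get(0) unconditionally.

{code:java}
import java.io.IOException;
import java.util.List;

public class PrimaryGroupLookup {
  // Hypothetical illustration of guarding against an empty group list.
  public static String primaryGroupQueue(String user, List<String> groups)
      throws IOException {
    if (groups == null || groups.isEmpty()) {
      // Surface a clear error instead of an IndexOutOfBoundsException that
      // propagates up and brings down the RM event dispatcher.
      throw new IOException("No groups returned for user " + user);
    }
    return "root." + groups.get(0);
  }
}
{code}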



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3286) Cleanup RMNode#ReconnectNodeTransition

2015-10-08 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-3286.
-
Resolution: Won't Fix

As of now, this JIRA won't be fixed since it changes the existing non-HA behavior. 
Closing as Won't Fix.

> Cleanup RMNode#ReconnectNodeTransition
> --
>
> Key: YARN-3286
> URL: https://issues.apache.org/jira/browse/YARN-3286
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3286.patch, YARN-3286-test-only.patch
>
>
> RMNode#ReconnectNodeTransition is messy for every ReconnectedEvent. This 
> part of the code can be cleaned up so that we do not need to remove the node and 
> add a new node every time.
> Supporting the above point, see the comments in the YARN-3222 discussion: 
> [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
>  and 
> [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]
> The cleanup can address the following:
> # It always removes an old node and adds a new node. This is not really 
> required; instead, the old node can be updated with the new values.
> # RMNode#totalCapability holds stale capability after the NM is reconnected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3753) RM failed to come up with "java.io.IOException: Wait for ZKClient creation timed out"

2015-10-08 Thread Sumit Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949911#comment-14949911
 ] 

Sumit Nigam commented on YARN-3753:
---

I had a question. Do I need to explicitly set some yarn-site parameter to 
control runWithRetries in such a case? If so, which parameter needs to be set?

> RM failed to come up with "java.io.IOException: Wait for ZKClient creation 
> timed out"
> -
>
> Key: YARN-3753
> URL: https://issues.apache.org/jira/browse/YARN-3753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: YARN-3753.1.patch, YARN-3753.2.patch, YARN-3753.patch
>
>
> RM failed to come up with the following error while submitting an mapreduce 
> job.
> {code:title=RM log}
> 015-05-30 03:40:12,190 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(179)) - Error storing app: 
> application_1432956515242_0006
> java.io.IOException: Wait for ZKClient creation timed out
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:609)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:175)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:160)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-05-30 03:40:12,194 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:handle(750)) - Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> java.io.IOException: Wait for ZKClient creation timed out
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:609)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:175)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:160)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTrans

[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949912#comment-14949912
 ] 

Hudson commented on YARN-4235:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8599 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8599/])
YARN-4235. FairScheduler PrimaryGroup does not handle empty groups 
(rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java


> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user. This causes an NPE and 
> causes the RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-08 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4201:
---
Attachment: YARN-4201.002.patch

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist regardless of whether scheduler.include-port-in-node-name is set. 
> The RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949921#comment-14949921
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 53s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 22s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  1s | The applied patch generated  2 
new checkstyle issues (total was 211, now 212). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 52s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  19m 18s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   0m 24s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  62m 59s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 138m  8s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| Timed out tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765723/YARN-4243.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e1bf8b3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9386/console |


This message was automatically generated.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add a retry on this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-08 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949923#comment-14949923
 ] 

Jun Gong commented on YARN-4201:


Thanks [~zxu] for the review and the very valuable suggestion. The code is 
cleaner now.

Attached a new patch to address your comment.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist regardless of whether scheduler.include-port-in-node-name is set. 
> The RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949936#comment-14949936
 ] 

Hudson commented on YARN-3943:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2412 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2412/])
YARN-3943. Use separate threshold configurations for disk-full detection 
(jlowe: rev 8d226225d030253152494bda32708377ad0f7af7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch, 
> YARN-3943.002.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
> and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full and the other used when a disk goes from full back to not-full, so 
> that we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949941#comment-14949941
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~dian.fu] for the updated patch. 

Latest patch looks good to me. I will commit it tomorrow if there are no 
further comments/objections.


> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, 
> YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949949#comment-14949949
 ] 

Hudson commented on YARN-4235:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #513 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/513/])
YARN-4235. FairScheduler PrimaryGroup does not handle empty groups 
(rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user. This causes an NPE and 
> causes the RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2

2015-10-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949959#comment-14949959
 ] 

Varun Saxena commented on YARN-4238:


[~sjlee0], [~djp], thoughts on this ?

> createdTime is not reported while publishing entities to ATSv2
> --
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> While publishing entities from the RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and for entities handled by other similar classes, is not set. We can 
> easily set the created time when sending the application-created event, and 
> likewise the modification time on every write.
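
A hedged sketch of what the fix might look like when publishing the 
application-created event; the TimelineEntity class and setter names below are 
assumptions about the ATSv2 API and should be checked against the actual code.

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class CreatedTimeExample {
  // Hypothetical illustration: populate createdTime when building the entity
  // for the application-created event so readers can filter and sort by it.
  public static TimelineEntity applicationEntity(String appId,
      long createdTimeMillis) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_APPLICATION");
    entity.setId(appId);
    entity.setCreatedTime(createdTimeMillis);
    return entity;
  }
}
{code}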



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949986#comment-14949986
 ] 

Hudson commented on YARN-4235:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1240 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1240/])
YARN-4235. FairScheduler PrimaryGroup does not handle empty groups 
(rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user. This causes an NPE and 
> causes the RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX

2015-10-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949987#comment-14949987
 ] 

Sidharta Seethana commented on YARN-4017:
-

It seems to me that using a defined value of 4096 should suffice. I'll upload a 
patch shortly. 

> container-executor overuses PATH_MAX
> 
>
> Key: YARN-4017
> URL: https://issues.apache.org/jira/browse/YARN-4017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>
> Lots of places in container-executor are now using PATH_MAX, which is simply 
> too small on a lot of platforms.  We should use a larger buffer size and be 
> done with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4017) container-executor overuses PATH_MAX

2015-10-08 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reassigned YARN-4017:
---

Assignee: Sidharta Seethana

> container-executor overuses PATH_MAX
> 
>
> Key: YARN-4017
> URL: https://issues.apache.org/jira/browse/YARN-4017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Sidharta Seethana
>
> Lots of places in container-executor are now using PATH_MAX, which is simply 
> too small on a lot of platforms.  We should use a larger buffer size and be 
> done with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4017) container-executor overuses PATH_MAX

2015-10-08 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4017:

Attachment: YARN-4017.001.patch

Uploading a patch with changes to container-executor to remove the use of 
PATH_MAX.

> container-executor overuses PATH_MAX
> 
>
> Key: YARN-4017
> URL: https://issues.apache.org/jira/browse/YARN-4017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Sidharta Seethana
> Attachments: YARN-4017.001.patch
>
>
> Lots of places in container-executor are now using PATH_MAX, which is simply 
> too small on a lot of platforms.  We should use a larger buffer size and be 
> done with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949991#comment-14949991
 ] 

zhihai xu commented on YARN-4201:
-

Thanks for the new patch [~hex108]. I think it will be better to check that 
{{scheduler.getSchedulerNode(nodeId)}} is not null to avoid an NPE.
If {{scheduler.getSchedulerNode(nodeId)}} returns null, it means the blacklisted 
node has just been removed from the scheduler, and I think it is OK not to add a 
removed node to the blacklist.
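
A minimal sketch of the suggested guard, using hypothetical stand-in types rather 
than the real scheduler classes: skip nodes the scheduler has already removed 
instead of dereferencing a null result.

{code:java}
import java.util.Set;

public class AMBlacklistGuard {
  /** Hypothetical stand-in for the scheduler lookup, used for illustration. */
  public interface NodeLookup {
    // Returns the node's name, or null if the node was already removed.
    String getNodeName(String nodeId);
  }

  // Add the node's name to the blacklist only if the scheduler still knows it.
  public static void addToBlacklist(NodeLookup scheduler, String nodeId,
      Set<String> blacklist) {
    String nodeName = scheduler.getNodeName(nodeId);
    if (nodeName == null) {
      return; // node already removed from the scheduler; nothing to blacklist
    }
    blacklist.add(nodeName);
  }
}
{code}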

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist regardless of whether scheduler.include-port-in-node-name is set. 
> The RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)