[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156127#comment-14156127
 ] 

Zhijie Shen commented on YARN-2468:
---

bq. I would like to check how many log files we can upload this time. If the 
number is 0, we can skip this time. And this check also happens before 
LogKey.write(); otherwise, we will write the key but without the value.

1. I think Vinod meant that pendingUploadFiles is needed, but it doesn't need to 
be a member variable. getPendingLogFilesToUploadForThisContainer can return this 
collection, which can then be passed into LogValue.write as an additional parameter.
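To illustrate what I mean, something roughly like the following (the variable 
names and exact signatures here are my assumptions, not the actual patch):
{code}
// Inside the per-container aggregation loop:
Set<File> pendingUploadFiles =
    logValue.getPendingLogFilesToUploadForThisContainer();
if (pendingUploadFiles.isEmpty()) {
  // Nothing to upload for this container, so skip it before writing the key;
  // otherwise we would write a key without a value.
  continue;
}
logKey.write(out);
// LogValue.write takes the collection as an extra parameter instead of
// reading it from a member variable.
logValue.write(out, pendingUploadFiles);
{code}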

2. IMHO, the following code can be improved. If we use an iterator, we can delete 
the unnecessary elements on the fly.
{code}
for (File file : candidates) {
  Matcher fileMatcher = filterPattern.matcher(file.getName());
  if (fileMatcher.find()) {
    filteredFiles.add(file);
  }
}
if (!exclusion) {
  return filteredFiles;
} else {
  candidates.removeAll(filteredFiles);
  return candidates;
}
{code}
This block could be:
{code}
...
while (candidatesItr.hasNext()) {
  File candidate = candidatesItr.next();
  ...
  if ((!match && !exclusion) || (match && exclusion)) {
    candidatesItr.remove();
  }
}
{code}
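To be concrete, the whole helper could look roughly like this (the method name and 
parameter list are just for illustration, not the actual ones in the patch; it 
assumes candidates is a modifiable list such as an ArrayList):
{code}
private static List<File> filterFiles(
    List<File> candidates, Pattern filterPattern, boolean exclusion) {
  Iterator<File> candidatesItr = candidates.iterator();
  while (candidatesItr.hasNext()) {
    File candidate = candidatesItr.next();
    boolean match = filterPattern.matcher(candidate.getName()).find();
    // For an include pattern, drop non-matches; for an exclude pattern, drop matches.
    if ((!match && !exclusion) || (match && exclusion)) {
      candidatesItr.remove();
    }
  }
  return candidates;
}
{code}
This avoids building the intermediate filteredFiles collection and the removeAll pass.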

3. [~jianhe] mentioned to me before that the following condition is not always a 
reliable way to determine the AM container. Any idea? Also, it seems that we don't 
need shouldUploadLogsForRunningContainer; we can re-use shouldUploadLogs and set 
wasContainerSuccessful to true (rough sketch below). Personally, if it's not trivial 
to identify the AM container, I prefer to write a TODO comment and leave it until we 
implement the log retention API.
{code}
if (containerId.getId() == 1) {
  return true;
}
{code}
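Something like the following at the call site (the exact parameters of 
shouldUploadLogs are my assumption, not the actual method in the patch):
{code}
// TODO: containerId.getId() == 1 is not a reliable way to identify the AM
// container; revisit once the log retention API is implemented.
// Re-use the existing check, treating the running container as successful.
boolean upload = shouldUploadLogs(containerId, true /* wasContainerSuccessful */);
{code}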

bq. It seems to be, let's validate this via a test-case.

4. Is it addressed by
{code}
this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}
Would it be better to add a line of comment explaining the rationale behind this config?
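For example, something like (the wording is only a suggestion; I'm guessing the 
rationale is to keep the NM from deleting the local log dirs while the test is 
still running):
{code}
// Delay local log deletion so the aggregated logs can be verified while the
// test is still running.
this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}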

5. Can the following code
{code}
Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
for (ContainerId id : pendingContainerInThisCycle) {
  finishedContainers.add(id);
}
{code}
be simplified as
{code}
Set<ContainerId> finishedContainers =
    new HashSet<ContainerId>(pendingContainerInThisCycle);
{code}

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when application is finished, NM will start to do the log 
> aggregation. But for Long running service applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156104#comment-14156104
 ] 

zhihai xu commented on YARN-2254:
-

The release audit warning is not related to my change.
{code}
 !? 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes
Lines that start with ? in the release audit report indicate files that do 
not have an Apache license header.
{code}

> TestRMWebServicesAppsModification should run against both CS and FS
> ---
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156101#comment-14156101
 ] 

Hadoop QA commented on YARN-2562:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672497/YARN-2562.5.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5232//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5.patch

Rebased on trunk.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156079#comment-14156079
 ] 

Anubhav Dhoot commented on YARN-2624:
-

The fix addresses the scenario of moving from a NodeManager without recovery to one 
with recovery turned on. As per YARN-1338, the directories are not cleaned up on 
startup in order to preserve running containers. But uniqueNumberGenerator will not 
know about pre-existing directories, which used to be deleted on NM startup and are 
unknown to a recovery-enabled NM. In this case we still want the directory cleanup 
to happen.

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156073#comment-14156073
 ] 

Anubhav Dhoot commented on YARN-2624:
-

The failure seems unrelated to my changes and does not repro locally.

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156070#comment-14156070
 ] 

Hadoop QA commented on YARN-2562:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672494/YARN-2562.4.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5231//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.4.patch

[~jianhe], good catch. Fixed the comment.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-01 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156020#comment-14156020
 ] 

Sandy Ryza commented on YARN-1414:
--

[~jrottinghuis] I will take a look. [~l201514] mind rebasing so that the patch 
will apply?

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156019#comment-14156019
 ] 

Hadoop QA commented on YARN-2638:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672476/YARN-2638-1.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5230//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5230//console

This message is automatically generated.

> Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
> --
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2638-1.patch
>
>
> TestRM fails when using FairScheduler or FifoScheduler. The failures not 
> shown in trunk as the trunk uses the default capacity scheduler. We need to 
> let TestRM run with all types of schedulers, to make sure any new change 
> wouldn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156000#comment-14156000
 ] 

Jian He commented on YARN-2562:
---

thanks for updating, one minor thing:
- the format in the comment is written as 
container_e{epoch}_{clusterTimestamp}_{attemptId}_{appId}_{containerId}; 
it should be appId followed by attemptId
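For example, with the corrected ordering an ID would render as something like the 
following (epoch and padding values are only illustrative):
{code}
container_e17_1410901177871_0001_01_000005
{code}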


> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155992#comment-14155992
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672474/YARN-2617.6.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5229//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5229//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
> YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-01 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2638:
--
Attachment: YARN-2638-1.patch

> Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
> --
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2638-1.patch
>
>
> TestRM fails when using FairScheduler or FifoScheduler. The failures not 
> shown in trunk as the trunk uses the default capacity scheduler. We need to 
> let TestRM run with all types of schedulers, to make sure any new change 
> wouldn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155960#comment-14155960
 ] 

Jun Gong commented on YARN-2617:


It seems that there is something wrong with Jenkins. 

From the console output 
https://builds.apache.org/job/PreCommit-YARN-Build/5227//console, it seems to 
have applied the wrong patch:

Going to apply patch with: /usr/bin/patch -p0
patching file 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
> YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2617:
--
Attachment: YARN-2617.6.patch

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
> YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155954#comment-14155954
 ] 

Jian He commented on YARN-2628:
---

looks good, +1

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers . 
> Since it uses the greaterThanOrEqual function, we end up in situation where 
> greaterThanOrEqual returns true, even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155949#comment-14155949
 ] 

Tsuyoshi OZAWA commented on YARN-2562:
--

[~jianhe], [~vinodkv], could you check the latest patch?

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155948#comment-14155948
 ] 

Hadoop QA commented on YARN-2562:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672462/YARN-2562.3.patch
  against trunk revision 0708827.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5226//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5226//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155946#comment-14155946
 ] 

Hadoop QA commented on YARN-2639:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672470/YARN-2639-1.patch
  against trunk revision 9e40de6.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5228//console

This message is automatically generated.

> TestClientToAMTokens should run with all types of schedulers
> 
>
> Key: YARN-2639
> URL: https://issues.apache.org/jira/browse/YARN-2639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2639-1.patch
>
>
> TestClientToAMTokens fails with FairScheduler now. We should let 
> TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155941#comment-14155941
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672468/YARN-2617.5.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5227//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-01 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2639:
--
Attachment: YARN-2639-1.patch

> TestClientToAMTokens should run with all types of schedulers
> 
>
> Key: YARN-2639
> URL: https://issues.apache.org/jira/browse/YARN-2639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2639-1.patch
>
>
> TestClientToAMTokens fails with FairScheduler now. We should let 
> TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155930#comment-14155930
 ] 

Vinod Kumar Vavilapalli commented on YARN-2446:
---

Merged this into branch-2.6 also.

> Using TimelineNamespace to shield the entities of a user
> 
>
> Key: YARN-2446
> URL: https://issues.apache.org/jira/browse/YARN-2446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch
>
>
> Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
> entities, preventing them from being accessed or affected by other users' 
> operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-01 Thread Wei Yan (JIRA)
Wei Yan created YARN-2639:
-

 Summary: TestClientToAMTokens should run with all types of 
schedulers
 Key: YARN-2639
 URL: https://issues.apache.org/jira/browse/YARN-2639
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan


TestClientToAMTokens fails with FairScheduler now. We should let 
TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155926#comment-14155926
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6173 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6173/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java


> Using TimelineNamespace to shield the entities of a user
> 
>
> Key: YARN-2446
> URL: https://issues.apache.org/jira/browse/YARN-2446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch
>
>
> Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
> entities, preventing them from being accessed or affected by other users' 
> operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2617:
--
Attachment: YARN-2617.5.patch

same patch uploaded again 

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155913#comment-14155913
 ] 

Hadoop QA commented on YARN-2624:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672461/YARN-2624.001.patch
  against trunk revision 0708827.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5224//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5224//console

This message is automatically generated.

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155912#comment-14155912
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672458/YARN-2617.5.patch
  against trunk revision 0708827.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5225//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch
>
>
> We([~chenchun]) are testing RM work preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". NM continuously 
> reported completed containers whose Application had already finished while AM 
> had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to  clean 
> up already completed applications. But it will only remove appId from  
> 'app.context.getApplications()' when ApplicaitonImpl received evnet 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might 
> receive this event for a long time or could not receive. 
> * For NonAggregatingLogHandler, it wait for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, 
> then it will be scheduled to delete Application logs and send the event.
> * For LogAggregationService, it might fail(e.g. if user does not have HDFS 
> write permission), and it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.3.patch

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-01 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2638:
--
Description: TestRM fails when using FairScheduler or FifoScheduler. The 
failures not shown in trunk as the trunk uses the default capacity scheduler. 
We need to let TestRM run with all types of schedulers, to make sure any new 
change wouldn't break any scheduler.  (was: TestRM fails when using 
FairScheduler. The failures not shown in trunk as the trunk uses the default 
capacity scheduler. We need to let TestRM run with all types of schedulers, to 
make sure any new change wouldn't break any scheduler.)

> Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
> --
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
>
> TestRM fails when using FairScheduler or FifoScheduler. The failures not 
> shown in trunk as the trunk uses the default capacity scheduler. We need to 
> let TestRM run with all types of schedulers, to make sure any new change 
> wouldn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155898#comment-14155898
 ] 

Vinod Kumar Vavilapalli commented on YARN-2446:
---

This looks good, +1. Checking this in.

> Using TimelineNamespace to shield the entities of a user
> 
>
> Key: YARN-2446
> URL: https://issues.apache.org/jira/browse/YARN-2446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch
>
>
> Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
> entities, preventing them from being accessed or affected by other users' 
> operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155893#comment-14155893
 ] 

Hadoop QA commented on YARN-2635:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672460/YARN-2635-1.patch
  against trunk revision 0708827.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5223//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5223//console

This message is automatically generated.

> TestRMRestart fails with FairScheduler
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155880#comment-14155880
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672458/YARN-2617.5.patch
  against trunk revision 0708827.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5222//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5222//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean 
> up already completed applications. But it will only remove the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, NM might 
> not receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155868#comment-14155868
 ] 

Tsuyoshi OZAWA commented on YARN-2562:
--

My initial concern was that an old AppMaster would not work with a new RM because 
of the change to the containerId format. However, we can make the change, since 
the protocol between AM and RM has changed anyway and an old AppMaster would not 
work in any case. So it's better to use the format Vinod mentioned at first. 
Updating the patch soon.
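In case it helps readers of the thread, here is a minimal illustrative sketch of folding an epoch segment into the string while keeping the container number at the end. This is not the format committed in the patch; the method name and zero-padding widths are assumptions:
{code}
// Illustration only; the real formatting lives in ContainerId and is decided
// by the patches attached to this JIRA.
public final class ContainerIdFormatSketch {
  public static String format(long clusterTimestamp, int appId, int attemptId,
      long containerId, int epoch) {
    StringBuilder sb = new StringBuilder("container_");
    if (epoch > 0) {
      // epoch segment only shows up once the RM has restarted at least once
      sb.append("e").append(epoch).append("_");
    }
    sb.append(clusterTimestamp).append("_");
    sb.append(String.format("%04d", appId)).append("_");
    sb.append(String.format("%02d", attemptId)).append("_");
    sb.append(String.format("%06d", containerId)); // container number stays last
    return sb.toString();
  }
}
{code}
For example, format(1410901177871L, 1, 1, 17L, 5) would yield container_e5_1410901177871_0001_01_000017 under these assumptions.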

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For example, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-01 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2638:
--
Description: TestRM fails when using FairScheduler. The failures are not 
shown in trunk, as the trunk uses the default capacity scheduler. We need to let 
TestRM run with all types of schedulers, to make sure any new change wouldn't 
break any scheduler.

> Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
> --
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
>
> TestRM fails when using FairScheduler. The failures are not shown in trunk, as 
> the trunk uses the default capacity scheduler. We need to let TestRM run with 
> all types of schedulers, to make sure any new change wouldn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-01 Thread Wei Yan (JIRA)
Wei Yan created YARN-2638:
-

 Summary: Let TestRM run with all types of schedulers (FIFO, 
Capacity, Fair)
 Key: YARN-2638
 URL: https://issues.apache.org/jira/browse/YARN-2638
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2624:

Attachment: YARN-2624.001.patch

No apparent failure in the Jenkins output. Uploading it again.

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the following 
> error in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155842#comment-14155842
 ] 

Vinod Kumar Vavilapalli commented on YARN-1972:
---

BTW, the new test TestContainerExecutor from YARN-443 was originally missed in 
branch-2; I committed it here.

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 and 
> the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changing the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changing the actual container run command to use the 'createAsUser' command 
> of the winutils task instead of 'create'.
> * running the localization as a standalone process instead of an in-process 
> Java method call. This in turn relies on the winutils createAsUser feature to 
> run the localization as the job user.
>  
> When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The WCE design came from a practical trial-and-error approach. I had 
> to iron out some issues around the Windows script shell limitations (command 
> line length) to get it to work, the biggest issue being the huge CLASSPATH 
> that is commonplace in Hadoop container executions. The job 
> container itself already deals with this via a so-called 'classpath 
> jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
> as a separate process, the same issue had to be resolved and I used the same 
> 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group that the nodemanager service principal is a 
> member of (the equivalent of the LCE's 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For the WCE to work, the nodemanager must run as a service principal that is a 
> member of the local Administrators group or as LocalSystem. This is derived 
> from the need to invoke the LoadUserProfile API, which mentions these 
> requirements in its specification. This is in addition to the SE_TCB privilege 
> mentioned in YARN-1063, but this requirement automatically implies that the 
> SE_TCB privilege is held by the nodemanager. For the Linux speakers in the 
> audience, the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service
> Due to the high privilege required by the WCE, we had discussed the need to 
> isolate the high privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service on the project is 
> not trivial.
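As a side note for anyone trying the WCE out, here is a hedged sketch of the two settings from the 'Deployment Requirements' section above, expressed in Java only for brevity (in a real deployment they belong in yarn-site.xml; the group name is a placeholder, not a recommendation):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WceConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
    // A Windows security group that the NM service principal is a member of.
    conf.set("yarn.nodemanager.windows-secure-container-executor.group",
        "HadoopNMGroup");
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}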



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155840#comment-14155840
 ] 

Vinod Kumar Vavilapalli commented on YARN-443:
--

The new test TestContainerExecutor was missed in branch-2; I committed it as 
part of YARN-1972.

> allow OS scheduling priority of NM to be different than the containers it 
> launches
> --
>
> Key: YARN-443
> URL: https://issues.apache.org/jira/browse/YARN-443
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.7, 2.0.4-alpha
>
> Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
> YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
> YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, 
> YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, 
> YARN-443.patch, YARN-443.patch, YARN-443.patch
>
>
> It would be nice if we could have the nodemanager run at a different OS 
> scheduling priority than the containers, so that you can still communicate 
> with the nodemanager if the containers get out of control.  
> On Linux we could launch the nodemanager at a higher priority, but then all 
> the containers it launches would also be at that higher priority, so we need 
> a way for the container executor to launch them at a lower priority.
> I'm not sure how this applies to Windows, if at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-01 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2635:
--
Attachment: YARN-2635-1.patch

Posted a patch that lets TestRMRestart run with all types of schedulers and 
fixes the failures related to FS.

> TestRMRestart fails with FairScheduler
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155828#comment-14155828
 ] 

Jian He commented on YARN-2562:
---

bq. A number at the end for me always pointed to the container-id
I think this is a good point, and logically the epochId precedes the 
applicationId. [~ozawa], your opinion?

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For example, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1391) Lost node list should be identify by NodeId

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155821#comment-14155821
 ] 

Hadoop QA commented on YARN-1391:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672452/YARN-1391.v2.patch
  against trunk revision 8dfe54f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5220//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5220//console

This message is automatically generated.

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch
>
>
> In the case of multiple node managers on a single machine, each of them should 
> be identified by NodeId, which is more precise than just the host name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155820#comment-14155820
 ] 

Hadoop QA commented on YARN-2624:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672456/YARN-2624.001.patch
  against trunk revision 8dfe54f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5221//console

This message is automatically generated.

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the following 
> error in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2617:
--
Attachment: YARN-2617.5.patch

Uploading the same patch; not sure why Jenkins reported an eclipse failure.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean 
> up already completed applications. But it will only remove the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, NM might 
> not receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155817#comment-14155817
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6172 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6172/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java


> NMClient doesn't have retries for supporting rolling-upgrades
> -
>
> Key: YARN-2613
> URL: https://issues.apache.org/jira/browse/YARN-2613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch
>
>
> While the NM is undergoing a rolling upgrade, the client should retry the NM 
> until it comes up. This jira is to add an NMProxy (similar to RMProxy) with a 
> retry implementation to support rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155810#comment-14155810
 ] 

Jian He commented on YARN-2591:
---

Looked at the patch; maybe create a new exception type instead of matching on the 
exception message?
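A minimal sketch of that suggestion, assuming a JAX-RS-style web layer; the class name is hypothetical and not taken from the attached patch:
{code}
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Response;

// Callers throw this instead of a generic exception, so the web layer can map
// it straight to 403 rather than matching on the exception message text.
public class ForbiddenException extends WebApplicationException {
  private static final long serialVersionUID = 1L;

  public ForbiddenException(String msg) {
    super(Response.status(Response.Status.FORBIDDEN).entity(msg).build());
  }
}
{code}
The framework can then turn this into a 403 response instead of the current INTERNAL_SERVER_ERROR(500).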

> AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
> access to the history data
> ---
>
> Key: YARN-2591
> URL: https://issues.apache.org/jira/browse/YARN-2591
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2591.1.patch
>
>
> AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
> access to the history data. Currently, it is going to return 
> INTERNAL_SERVER_ERROR(500).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155806#comment-14155806
 ] 

Hadoop QA commented on YARN-2628:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672437/apache-yarn-2628.1.patch
  against trunk revision 52bbe0f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5214//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5214//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5214//console

This message is automatically generated.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.
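Not the fix in the attached patches, just a sketch of the kind of per-dimension check the description points at, assuming Resources#fitsIn (a component-wise comparison): a node with enough memory but no free vcores, or vice versa, would then not be considered schedulable.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NodeFitCheck {
  // Returns true only if every dimension (memory and vcores) of the minimum
  // allocation fits into what the node still has available.
  public static boolean canSchedule(Resource available, Resource minimumAllocation) {
    return Resources.fitsIn(minimumAllocation, available);
  }
}
{code}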



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2624:

Attachment: YARN-2624.001.patch

Attaching a patch that cleans up the local resource cache directories when the 
state store is built for the first time. That takes care of cleaning up leftover 
directories when moving from a non-work-preserving to a work-preserving restart 
in most cases. There can still be NM failures between creating the state and 
running the cleanup.
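A rough sketch of that idea (not the attached patch; the class and method names below are assumptions): when the NM state store is brand new, wipe any leftover local cache directories so a later rename cannot hit a non-empty target.
{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class LeftoverCacheCleaner {
  // Intended to be called only when the NM state store has no recovery data.
  public static void cleanLocalCacheDirs(FileContext lfs, List<String> localDirs)
      throws IOException {
    for (String dir : localDirs) {
      Path filecache = new Path(dir, "filecache");
      if (lfs.util().exists(filecache)) {
        lfs.delete(filecache, true); // recursively remove stale cache entries
      }
    }
  }
}
{code}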

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the following 
> error in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155766#comment-14155766
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch
  against trunk revision 52bbe0f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5218//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5218//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean 
> up already completed applications. But it will only remove the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, NM might 
> not receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155764#comment-14155764
 ] 

Craig Welch commented on YARN-1198:
---

[~john.jian.fang] I took a look at implementing the change with the tweaked .7 
approach per your suggestion above, and it seemed to just be trading some 
complexities for others, so I set it aside; I think the current .7 approach 
is as good as any.  I uploaded a .10 patch, which is the .7 fixed to apply 
cleanly to current trunk (.7 no longer quite does for me).  I took a look at 
incorporating [YARN-1857] into this change but chose not to, as I think they 
should be committed independently.  The .10 (.7) patch factors the change 
relevant to [YARN-1857] up into a separate method, getHeadroom(); if you replace 
it with the below:


{code}
private Resource getHeadroom(User user, Resource queueMaxCap,
    Resource clusterResource, Resource userLimit) {
  Resource headroom =
      Resources.min(resourceCalculator, clusterResource,
          Resources.subtract(
              Resources.min(resourceCalculator, clusterResource,
                  userLimit, queueMaxCap),
              user.getConsumedResources()),
          Resources.subtract(queueMaxCap, usedResources));
  return headroom;
}
{code}
  
then you should have the combined logic.  Note that the LeafQueue tests will then 
not all pass, I believe because the results changed when that patch was applied; 
I haven't tried the two in combination before, assuming we would apply one at a 
time and then address the impact on the other.

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, 
> YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this is not going to be backward compatible).
> * Also, when the admin user refreshes a queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155767#comment-14155767
 ] 

Craig Welch commented on YARN-1198:
---

The Jenkins failures do not actually seem to have anything to do with the 
patch; the output is complaining about being behind trunk...

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, 
> YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this is not going to be backward compatible).
> * Also, when the admin user refreshes a queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId

2014-10-01 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--
Attachment: YARN-1391.v2.patch

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch
>
>
> In the case of multiple node managers on a single machine, each of them should 
> be identified by NodeId, which is more precise than just the host name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155757#comment-14155757
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672450/YARN-1198.10.patch
  against trunk revision 8dfe54f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5219//console

This message is automatically generated.

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, 
> YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this is not going to be backward compatible).
> * Also, when the admin user refreshes a queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.10.patch

And again, this time with the additional files: .10 (née .9, .7).

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, 
> YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this is not going to be backward compatible).
> * Also, when the admin user refreshes a queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-10-01 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155747#comment-14155747
 ] 

Varun Vasudev commented on YARN-90:
---

The release audit warning is unrelated to the patch.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
> reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155744#comment-14155744
 ] 

Zhijie Shen commented on YARN-2527:
---

Thanks for being patient about the comment. How about doing the following?
{code}
AccessControlList applicationACLInMap = acls.get(applicationAccessType);
if (applicationACLInMap != null) {
  applicationACL = applicationACLInMap;
} else {
  if (LOG.isDebugEnabled()) {
    LOG.debug("ACL not found for access-type " + applicationAccessType
        + " for application " + applicationId + " owned by "
        + applicationOwner + ". Using default ["
        + YarnConfiguration.DEFAULT_YARN_APP_ACL + "]");
  }
  applicationACL = DEFAULT_YARN_APP_ACL;
}
{code}

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155745#comment-14155745
 ] 

Hadoop QA commented on YARN-90:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672436/apache-yarn-90.8.patch
  against trunk revision dd1b8f2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5213//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5213//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5213//console

This message is automatically generated.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs restart. This JIRA is to improve NodeManager to 
> reuse good disks(which could be bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155735#comment-14155735
 ] 

Hadoop QA commented on YARN-1714:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628368/YARN-1714.v3.patch
  against trunk revision 52bbe0f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5215//console

This message is automatically generated.

> Per user and per queue view in YARN RM
> --
>
> Key: YARN-1714
> URL: https://issues.apache.org/jira/browse/YARN-1714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-1714.v1.patch, YARN-1714.v2.patch, 
> YARN-1714.v3.patch
>
>
> ResourceManager exposes either one or all jobs via the WebUI. It would be 
> good to have a filter for user so that users see only their own jobs.
> Provide REST-style URLs to access only a user-specified queue or a user's apps. 
> For instance,
> http://hadoop-example.com:50030/cluster/user/toto 
> displays apps owned by toto
> http://hadoop-example.com:50030/cluster/user/toto,glinda  
> displays apps owned by toto and glinda
> http://hadoop-example.com:50030/cluster/queue/root.queue1 
>displays apps in root.queue1
> http://hadoop-example.com:50030/cluster/queue/root.queue1,root.queue2   
> displays apps in root.queue1 and  root.queue2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155727#comment-14155727
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672445/YARN-1198.9.patch
  against trunk revision 52bbe0f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5217//console

This message is automatically generated.

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
> YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
> YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today the headroom calculation (for an app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM should 
> be notified about the change in headroom;
> ** similarly, if a container is assigned to either app1 or app2, both AMs 
> should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-01 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.9.patch

Updated version of .7 patch to current trunk (as .7 now fails to fully apply)

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
> YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
> YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today the headroom calculation (for an app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM should 
> be notified about the change in headroom;
> ** similarly, if a container is assigned to either app1 or app2, both AMs 
> should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155715#comment-14155715
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch
  against trunk revision dd1b8f2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5212//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5212//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) were testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". The NM continuously 
> reported completed containers whose application, and the AM itself, had 
> already finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee 
> cleanup of already completed applications. But it only removes the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 sec by default) before it 
> is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have 
> HDFS write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2254) TestRMWebServicesAppsModification should run against both Capacity and FairSchedulers

2014-10-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2254:
---
Summary: TestRMWebServicesAppsModification should run against both Capacity 
and FairSchedulers  (was: change TestRMWebServicesAppsModification to support 
FairScheduler.)

> TestRMWebServicesAppsModification should run against both Capacity and 
> FairSchedulers
> -
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2254:
---
Summary: TestRMWebServicesAppsModification should run against both CS and 
FS  (was: TestRMWebServicesAppsModification should run against both Capacity 
and FairSchedulers)

> TestRMWebServicesAppsModification should run against both CS and FS
> ---
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155708#comment-14155708
 ] 

Hadoop QA commented on YARN-2312:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672397/YARN-2312.2-3.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.TestMiniMRBringup
  org.apache.hadoop.mapred.TestClusterMapReduceTestCase
  org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
  org.apache.hadoop.mapred.pipes.TestPipeApplication

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5207//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5207//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5207//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5207//console

This message is automatically generated.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch
>
>
> {{ContainerId#getId}} will only return partial value of containerId, only 
> sequence number of container id without epoch, after YARN-2229. We should 
> mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155707#comment-14155707
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672418/YARN-1879.18.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5210//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5210//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5210//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, 
> YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1391) Lost node list should be identify by NodeId

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155705#comment-14155705
 ] 

Hadoop QA commented on YARN-1391:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618147/YARN-1391.v1.patch
  against trunk revision 52bbe0f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5216//console

This message is automatically generated.

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch
>
>
> In the case of multiple node managers on a single machine, each of them 
> should be identified by NodeId, which is more specific than just the host name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM

2014-10-01 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155700#comment-14155700
 ] 

Siqi Li commented on YARN-1714:
---

I am all for making the webUI more interactive. I checked 
RMWebServices#getApps; it gives a different format of applications and 
statuses, and you can't drill into the apps to see how everything is going. 
However, some users want the same UI as in the RM, so this patch provides them 
with a simple URL that displays only their jobs or queues.

> Per user and per queue view in YARN RM
> --
>
> Key: YARN-1714
> URL: https://issues.apache.org/jira/browse/YARN-1714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-1714.v1.patch, YARN-1714.v2.patch, 
> YARN-1714.v3.patch
>
>
> ResourceManager exposes either one or all jobs via the WebUI. It would be 
> good to have a filter for user so that users see only their own jobs.
> Provide REST-style URLs to access only a user-specified queue or a user's apps. 
> For instance,
> http://hadoop-example.com:50030/cluster/user/toto 
> displays apps owned by toto
> http://hadoop-example.com:50030/cluster/user/toto,glinda  
> displays apps owned by toto and glinda
> http://hadoop-example.com:50030/cluster/queue/root.queue1 
>displays apps in root.queue1
> http://hadoop-example.com:50030/cluster/queue/root.queue1,root.queue2   
> displays apps in root.queue1 and  root.queue2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1715) Per queue view in RM is not implemented correctly

2014-10-01 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li resolved YARN-1715.
---
Resolution: Duplicate

> Per queue view in RM is not implemented correctly
> -
>
> Key: YARN-1715
> URL: https://issues.apache.org/jira/browse/YARN-1715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>
> For now, per queue view in YARN RM has not yet implemented.
> in RmController.java it only set page title for per queue page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155702#comment-14155702
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6170 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6170/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive previously failed AM container. But 
> DistributedShell logic is not expecting this extra completed container.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2628:

Attachment: apache-yarn-2628.1.patch

Uploaded a patch to address [~jianhe]'s comments.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers . 
> Since it uses the greaterThanOrEqual function, we end up in situation where 
> greaterThanOrEqual returns true, even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-01 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155679#comment-14155679
 ] 

Benoy Antony commented on YARN-2527:


Thanks for the review [~zjshen]. 
Do you mean to change it as below?
{code}
if (acls.get(applicationAccessType) != null) {
  applicationACL = acls.get(applicationAccessType);
} else {
  if (LOG.isDebugEnabled()) {
    LOG.debug("ACL not found for access-type " + applicationAccessType
        + " for application " + applicationId + " owned by "
        + applicationOwner + ". Using default ["
        + YarnConfiguration.DEFAULT_YARN_APP_ACL + "]");
  }
  applicationACL = DEFAULT_YARN_APP_ACL;
}

{code}
The only downside to the suggested approach is that it involves two lookups in 
the _acls_ _HashMap_, whereas the current approach in the above comment 
involves only one lookup.
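
For comparison, a minimal sketch of the single-lookup variant (essentially the 
approach quoted earlier in this thread), caching the map value in a local 
variable:
{code}
// One HashMap lookup; the result is reused through a local variable.
AccessControlList applicationACLInMap = acls.get(applicationAccessType);
if (applicationACLInMap != null) {
  applicationACL = applicationACLInMap;
} else {
  if (LOG.isDebugEnabled()) {
    LOG.debug("ACL not found for access-type " + applicationAccessType
        + " for application " + applicationId + " owned by "
        + applicationOwner + ". Using default ["
        + YarnConfiguration.DEFAULT_YARN_APP_ACL + "]");
  }
  applicationACL = DEFAULT_YARN_APP_ACL;
}
{code}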

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-10-01 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--
Attachment: apache-yarn-90.8.patch

Thanks for the review [~mingma]!

{quote}
1. What if a dir is transitioned from DISK_FULL state to OTHER state? 
DirectoryCollection.checkDirs doesn't seem to update errorDirs and fullDirs 
properly. We can use some state machine for each dir and make sure each 
transition is covered.
{quote}

Fixed. I've re-written the checkDir function but I haven't used a state 
machine. Can you please review?
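
The rewritten checkDirs isn't quoted here, so just to sketch the idea (the 
names below are illustrative, not from the patch): each pass re-derives the 
good/full/error sets from scratch, so a dir leaves fullDirs or errorDirs as 
soon as its check passes again.
{code}
// Illustrative only: re-check every configured dir on each pass and rebuild
// the three sets, which covers DISK_FULL -> OTHER and error -> good moves.
List<String> goodDirs = new ArrayList<String>();
List<String> fullDirs = new ArrayList<String>();
List<String> errorDirs = new ArrayList<String>();
for (String dir : allConfiguredDirs) {    // assumed field: every dir we manage
  DirState state = checkOneDir(dir);      // assumed helper: GOOD, FULL or ERROR
  if (state == DirState.FULL) {
    fullDirs.add(dir);
  } else if (state == DirState.ERROR) {
    errorDirs.add(dir);
  } else {
    goodDirs.add(dir);
  }
}
{code}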

{quote}
2. DISK_FULL state is counted toward the error disk threshold by 
LocalDirsHandlerService.areDisksHealthy; later RM could mark NM NODE_UNUSABLE. 
If we believe DISK_FULL is mostly temporary issue, should we consider disks are 
healthy if disks only stay in DISK_FULL for some short period of time?
{quote}

The issue here is that if a disk is full, we can't launch new containers on it. 
If we can't launch containers, the RM should consider the node unhealthy. Once 
the disk is cleaned up, the RM will assign containers to it again.

{quote}
3. In AppLogAggregatorImpl.java, "(Path[]) localAppLogDirs.toArray(new 
Path\[localAppLogDirs.size()]).". It seems the (Path[]) cast isn't necessary.
{quote}

Fixed.

{quote}
4. What is the intention of numFailures? Method getNumFailures isn't used.
{quote}

This is a carry-over function; it existed as part of the previous 
implementation.

{quote}
5. Nit: It is better to expand "import java.util.*;" in 
DirectoryCollection.java and LocalDirsHandlerService.java.
{quote}

Fixed.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs restart. This JIRA is to improve NodeManager to 
> reuse good disks(which could be bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155669#comment-14155669
 ] 

Hadoop QA commented on YARN-2254:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672416/YARN-2254.004.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5209//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5209//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5209//console

This message is automatically generated.

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155664#comment-14155664
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672406/YARN-913-016.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 36 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5208//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-10-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2637:
-
Summary: maximum-am-resource-percent could be violated when resource of AM 
is > minimumAllocation  (was: maximum-am-resource-percent will be violated when 
resource of AM is > minimumAllocation)

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
> for (Iterator i=pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
> LOG.info("Application " + application.getApplicationId() +
> " from user: " + application.getUser() + 
> " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example:
> If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
> number of AMs that can be launched is 200. If each AM actually uses 5M 
> (> minimum_allocation), all of those apps can still be activated, and they 
> will occupy all of the queue's resources instead of only 
> max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2637) maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation

2014-10-01 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2637:


 Summary: maximum-am-resource-percent will be violated when 
resource of AM is > minimumAllocation
 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Priority: Critical


Currently, number of AM in leaf queue will be calculated in following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when submit new application to RM, it will check if an app can be activated 
in following way:
{code}
for (Iterator i=pendingApplications.iterator(); 
 i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
break;
  }
  
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) 
{
user.activateApplication();
activeApplications.add(application);
i.remove();
LOG.info("Application " + application.getApplicationId() +
" from user: " + application.getUser() + 
" activated in queue: " + getQueueName());
  }
}
{code}

An example:
If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number 
of AMs that can be launched is 200. If each AM actually uses 5M 
(> minimum_allocation), all of those apps can still be activated, and they will 
occupy all of the queue's resources instead of only max_am_resource_percent of 
the queue.
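
The arithmetic behind this example, spelled out (illustrative numbers only):
{code}
// How the current formula admits far more AM resource than intended.
int queueCapacityMB = 1000;                                   // the "1G" queue
double maxAmPercent = 0.2;
int minAllocationMB = 1;
int actualAmSizeMB  = 5;

int maxAmResourceMB = (int) (queueCapacityMB * maxAmPercent); // 200 MB AM budget
int maxAmNumber     = maxAmResourceMB / minAllocationMB;      // 200 AMs admitted
int realAmUsageMB   = maxAmNumber * actualAmSizeMB;           // 1000 MB, i.e. the whole queue
{code}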



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155649#comment-14155649
 ] 

Hadoop QA commented on YARN-1414:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632578/YARN-1221-v2.patch
  against trunk revision dd1b8f2.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5211//console

This message is automatically generated.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155643#comment-14155643
 ] 

Zhijie Shen commented on YARN-2583:
---

Per discussion offline:

1. In the AggregatedLogDeletionService of the JHS, we delete the log files of 
completed apps, and in the AppLogAggregatorImpl of the NM, we delete the log 
files of a running LRS. We need to add a test case to verify that 
AggregatedLogDeletionService won't delete the running LRS logs. 

2. We apply the same retention policy on both sides, using time to determine 
which log files need to be deleted.

3. For scalability, let's keep the criterion of the number of logs per app, in 
case the rolling interval is small and too many log files are generated. But 
let's keep the config private to AppLogAggregatorImpl.
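
A minimal sketch of the retention rule in 2) and 3) above, with hypothetical 
names (selectLogsToDelete, cutoffMillis, maxLogFilesPerApp); an illustration 
only, not the patch:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

// Keep at most maxLogFilesPerApp of the newest rolled logs and drop anything
// older than the time-based cut-off; whatever is returned is eligible for
// deletion.
static List<FileStatus> selectLogsToDelete(List<FileStatus> appLogFiles,
    long cutoffMillis, int maxLogFilesPerApp) {
  Collections.sort(appLogFiles, new Comparator<FileStatus>() {
    @Override
    public int compare(FileStatus a, FileStatus b) {
      // newest first
      return Long.compare(b.getModificationTime(), a.getModificationTime());
    }
  });
  List<FileStatus> toDelete = new ArrayList<FileStatus>();
  for (int i = 0; i < appLogFiles.size(); i++) {
    FileStatus status = appLogFiles.get(i);
    if (i >= maxLogFilesPerApp || status.getModificationTime() < cutoffMillis) {
      toDelete.add(status);
    }
  }
  return toDelete;
}
{code}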

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time, and if all logs for an application are older than that 
> cut-off time, the app-log-dir is deleted from HDFS. This will not work for 
> LRS, since we expect an LRS application to keep running for a long time. 
> Two different scenarios: 
> 1) If we configure rollingIntervalSeconds, new log files are continuously 
> uploaded to HDFS. The number of log files for the application keeps growing, 
> and no log files are ever deleted.
> 2) If we do not configure rollingIntervalSeconds, the log files can only be 
> uploaded to HDFS after the application finishes. It is very possible that the 
> logs are uploaded after the cut-off time, which causes problems because by 
> then the app-log-dir for the application has already been deleted from HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-01 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155638#comment-14155638
 ] 

Joep Rottinghuis commented on YARN-1414:


@sandyr could we get some love on this JIRA? We're essentially running with a 
forked FairScheduler and would like to reduce the tech debt we carry each time 
we uprev to a newer version.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155624#comment-14155624
 ] 

Steve Loughran commented on YARN-2616:
--

Features of the 003 patch:
# registry instance created via the factory
# uses the configuration instance built up on the command line (though it also 
creates a {{YarnConfiguration()}} around that)
# pulls all exception-to-error-text mapping out into a single method
# covers the current set of errors
# also logs @ debug if enabled.
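
As an illustration of that single mapping method (hypothetical method name and 
exit codes, not the actual patch; assumes java.io.FileNotFoundException, 
java.io.IOException and java.io.PrintStream are imported):
{code}
// Map exceptions from registry operations to one error message + exit code,
// so every CLI action reports failures the same way.
private int reportError(Throwable e, PrintStream err, boolean debugEnabled) {
  String message;
  int exitCode;
  if (e instanceof FileNotFoundException) {
    message = "Entry not found: " + e.getMessage();
    exitCode = 44;          // illustrative "not found" exit code
  } else if (e instanceof IOException) {
    message = "I/O problem talking to the registry: " + e.getMessage();
    exitCode = 56;          // illustrative "connectivity" exit code
  } else {
    message = "Unexpected error: " + e;
    exitCode = 1;
  }
  err.println(message);
  if (debugEnabled) {
    e.printStackTrace(err); // full stack trace only when debug logging is on
  }
  return exitCode;
}
{code}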


> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, 
> yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155584#comment-14155584
 ] 

Jian He commented on YARN-2628:
---

Looks good; one minor comment on the test case:
- the following assertion depends on timing: since the allocation happens 
asynchronously, it might fail. Could you use a loop that checks whether the 
container has been allocated, and times out otherwise?
{code}
Thread.sleep(1000);
allocResponse = am1.schedule();
Assert.assertEquals(1, allocResponse.getAllocatedContainers().size());
{code}
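
A minimal sketch of such a loop, reusing the test's {{am1}} handle (the timeout 
value is arbitrary, and List/ArrayList/AllocateResponse/Container/Assert are 
assumed to be imported by the test already):
{code}
// Poll the scheduler until the container shows up instead of relying on one
// fixed sleep; the assertion fails if the deadline passes first.
List<Container> allocated = new ArrayList<Container>();
long deadline = System.currentTimeMillis() + 10 * 1000;
while (allocated.isEmpty() && System.currentTimeMillis() < deadline) {
  AllocateResponse allocResponse = am1.schedule();
  allocated.addAll(allocResponse.getAllocatedContainers());
  Thread.sleep(100);
}
Assert.assertEquals(1, allocated.size());
{code}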

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers . 
> Since it uses the greaterThanOrEqual function, we end up in situation where 
> greaterThanOrEqual returns true, even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155571#comment-14155571
 ] 

Karthik Kambatla commented on YARN-2254:


+1, pending Jenkins. 

I'll commit this later today. 

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155565#comment-14155565
 ] 

Hadoop QA commented on YARN-2630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5204//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5204//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5204//console

This message is automatically generated.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive previously failed AM container. But 
> DistributedShell logic is not expecting this extra completed container.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.18.patch

Rebased on trunk.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, 
> YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1418#comment-1418
 ] 

zhihai xu commented on YARN-2254:
-

Hi [~kasha], good suggestion. I uploaded a new patch, YARN-2254.004.patch, to 
address the comments.
Thanks.

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not the 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both the CapacityScheduler 
> and the FairScheduler.
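
A hedged sketch of the general approach (not the attached patch): parameterize the test class over the scheduler implementation and feed the chosen class into the RM configuration via {{YarnConfiguration.RM_SCHEDULER}}.
{code}
// Hedged sketch only; the real patch may wire the schedulers differently.
import java.util.Arrays;
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestRMWebServicesAppsModification {

  @Parameterized.Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { CapacityScheduler.class.getName() },
        { FairScheduler.class.getName() }
    });
  }

  private final String schedulerClassName;

  public TestRMWebServicesAppsModification(String schedulerClassName) {
    this.schedulerClassName = schedulerClassName;
  }

  private Configuration createSchedulerConf() {
    // handed to the mock RM used by the web-services tests
    Configuration conf = new Configuration();
    conf.set(YarnConfiguration.RM_SCHEDULER, schedulerClassName);
    return conf;
  }
}
{code}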



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2254:

Attachment: YARN-2254.004.patch

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not the 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both the CapacityScheduler 
> and the FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2616:
-
Attachment: YARN-2616-003.patch

> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, 
> yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155530#comment-14155530
 ] 

Steve Loughran commented on YARN-2616:
--

The patch I just posted doesn't {{stop()}} the registry service, so it will leak 
a Curator instance and its threads.
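
A hedged sketch of the follow-up fix, assuming the CLI wraps a Hadoop {{Service}}-based registry client; the {{registry}} and {{runAction}} names are illustrative, not the patch's actual identifiers.
{code}
// Hedged sketch: always stop the registry service so the Curator client and
// its threads are released, even when the CLI action throws.
try {
  registry.init(conf);
  registry.start();
  exitCode = runAction(args);               // hypothetical ls/resolve/mknode action
} finally {
  ServiceOperations.stopQuietly(registry);  // org.apache.hadoop.service
}
{code}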

> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-016.patch

Patch -016 includes the registry CLI patch (-002) of YARN-2616.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to, 
> and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.
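
For illustration only, a hedged sketch of how a service instance might publish itself through such a registry. The class and method names ({{RegistryOperations}}, {{ServiceRecord}}, {{bind}}, {{BindFlags.OVERWRITE}}) are taken from the registry patches as I understand them and may differ from what is finally committed; client creation and endpoint population are omitted.
{code}
// Hedged, illustrative sketch: publish a record for one service instance,
// assuming "registry" is an already started RegistryOperations client.
ServiceRecord record = new ServiceRecord();
record.description = "sample long-lived service";
// endpoints (web UI, IPC, ...) would be added to the record here

// parent path creation elided; OVERWRITE lets a restarted instance re-register
String path = "/users/alice/services/org-example-myapp/instance-0001";
registry.bind(path, record, BindFlags.OVERWRITE);
{code}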



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155481#comment-14155481
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672394/YARN-1879.17.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5206//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155477#comment-14155477
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
  org.apache.hadoop.yarn.server.nodemanager.TestEventFlow
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5205//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5205//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5205//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) were testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished.
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee 
> cleanup of already completed applications. But it only removes the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> and only then is it scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have 
> HDFS write permission), and then it will not send the event.
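
A hedged sketch of the idea behind the patch (names are taken from the NM code, but the snippet is illustrative, not the committed change): when the NodeStatusUpdater builds the heartbeat, skip finished containers whose application is no longer tracked as running on the NM.
{code}
// Hedged sketch: only report a completed container if its application is
// still present in context.getApplications(); otherwise the RM has already
// forgotten the app and will just log "Null container completed".
for (Container container : this.context.getContainers().values()) {
  ContainerStatus status = container.cloneAndGetContainerStatus();
  ApplicationId appId =
      status.getContainerId().getApplicationAttemptId().getApplicationId();
  if (status.getState() == ContainerState.COMPLETE
      && !this.context.getApplications().containsKey(appId)) {
    continue;
  }
  containerStatuses.add(status);
}
{code}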



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155473#comment-14155473
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

I cannot reproduce the findbugs warning. Let me check the reason on Jenkins.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch
>
>
> After YARN-2229, {{ContainerId#getId}} only returns a partial value of the 
> container id: the sequence number without the epoch. We should mark 
> {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
> instead.
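
A hedged sketch of the proposed API shape (illustrative only; see YARN-2229 for the actual id layout):
{code}
// Hedged sketch, not the committed signatures.
public abstract class ContainerId implements Comparable<ContainerId> {

  @Deprecated
  public abstract int getId();           // sequence number only, epoch is lost

  public abstract long getContainerId(); // full 64-bit id: epoch + sequence number
}
{code}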



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155467#comment-14155467
 ] 

Karthik Kambatla commented on YARN-2254:


Patch looks mostly good. One nit: can we rename ALLOC_FILE to FS_ALLOC_FILE and 
"test-queues.xml" to "test-fs-queues.xml" to clarify that the files are used 
only for the FairScheduler?

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not the 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both the CapacityScheduler 
> and the FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.2-3.patch

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch
>
>
> After YARN-2229, {{ContainerId#getId}} only returns a partial value of the 
> container id: the sequence number without the epoch. We should mark 
> {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.17.patch

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155407#comment-14155407
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

{quote}
>APIs that added trigger flag.
APIs that added Idempotent/AtOnce annotation?
{quote}

I think ">APIs that are added trigger flag." is correct, so updating it.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155400#comment-14155400
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

The release audit warning is also unrelated to this patch.

{quote}
 !? 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes
Lines that start with ? in the release audit report indicate files that do 
not have an Apache license header
{quote}

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155398#comment-14155398
 ] 

Varun Vasudev commented on YARN-2628:
-

The release audit error is from an HDFS file and is unrelated.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.
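
A hedged sketch of one possible direction (not necessarily the attached patch): compare each resource dimension instead of the dominant share, e.g. with {{Resources.fitsIn}}, so a node that is short on vcores (or memory) is not treated as schedulable.
{code}
// Hedged sketch: per-dimension check before trying to schedule on the node.
if (node.getReservedContainer() == null) {
  if (Resources.fitsIn(minimumAllocation, node.getAvailableResource())) {
    root.assignContainers(clusterResource, node, false);
  }
} else {
  // node is reserved; skip it, as in the existing code
}
{code}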



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155399#comment-14155399
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

Sorry for the delay, and thanks for updating the patch, [~adhoot]. The test 
failure looks unrelated to the patch. Let me attach a patch that includes your 
comment changes.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2617:
--
Attachment: YARN-2617.5.patch

Just added one more log statement myself; pending Jenkins.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) were testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished.
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee 
> cleanup of already completed applications. But it only removes the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> and only then is it scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have 
> HDFS write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155393#comment-14155393
 ] 

Hadoop QA commented on YARN-2628:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672381/apache-yarn-2628.0.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5202//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5202//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5202//console

This message is automatically generated.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

