[jira] [Created] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
Varun Vasudev created YARN-2767: --- Summary: RM web services - add test case to ensure the http static user can kill or submit apps in secure mode Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2767: Attachment: apache-yarn-2767.0.patch Uploaded patch with new test case. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2761) potential race condition in SchedulingPolicy
[ https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188126#comment-14188126 ] Hadoop QA commented on YARN-2761: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677844/YARN-2761.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5622//console This message is automatically generated. potential race condition in SchedulingPolicy Key: YARN-2761 URL: https://issues.apache.org/jira/browse/YARN-2761 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2761.patch Reported by findbugs. In SchedulingPolicy.getInstance, ConcurrentHashMap.get and ConcurrentHashMap.put are called. These two operations together should be atomic, but calling them separately on a ConcurrentHashMap doesn't guarantee this. {code}
public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
  SchedulingPolicy policy = instances.get(clazz);
  if (policy == null) {
    policy = ReflectionUtils.newInstance(clazz, null);
    instances.put(clazz, policy);
  }
  return policy;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
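The check-then-insert gap above can be closed without an explicit lock by letting the map itself perform the atomic step. A minimal stand-alone sketch of that idea (not the actual YARN-2761 patch; PolicyCache and the nested stand-in types here are hypothetical, and computeIfAbsent requires Java 8):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PolicyCache {
    // Stand-ins for YARN's SchedulingPolicy hierarchy, for illustration only.
    public interface SchedulingPolicy {}
    public static class FairSharePolicy implements SchedulingPolicy {}

    private static final ConcurrentMap<Class<? extends SchedulingPolicy>, SchedulingPolicy>
        instances = new ConcurrentHashMap<>();

    // computeIfAbsent makes the check-and-insert atomic, so two threads racing
    // on the same class always observe the same cached instance.
    public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
        return instances.computeIfAbsent(clazz, c -> {
            try {
                return c.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + c, e);
            }
        });
    }

    public static void main(String[] args) {
        SchedulingPolicy a = getInstance(FairSharePolicy.class);
        SchedulingPolicy b = getInstance(FairSharePolicy.class);
        if (a != b) throw new AssertionError("expected the same cached instance");
        System.out.println("singleton preserved under concurrent access semantics");
    }
}
```

With the original get/put pair, two racing threads could each see null and create separate instances; whether that is harmful depends on whether the policies hold state, which is presumably why the issue is filed as Minor.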
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188125#comment-14188125 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677821/YARN-2698-20141028-3.patch against trunk revision 3c5f5af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5619//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMRTimelineEventHandling org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5619//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5619//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188127#comment-14188127 ] Hadoop QA commented on YARN-2767: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677839/apache-yarn-2767.0.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5621//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5621//console This message is automatically generated. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2767: Attachment: apache-yarn-2767.1.patch Uploaded a new patch with some variable names fixed. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
Hong Zhiguo created YARN-2768: - Summary: optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up 85% (19.2 / 22.6) of the CPU time of FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code} The code of Resources.multiply: {code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: profiling_FairScheduler_update.png optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Description: See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up 85% (19.2 / 22.6) of the CPU time of FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code} The code of Resources.multiply: {code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code} The clone could be skipped by directly updating the value of this.demand. (The previous description differed only in formatting.) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188146#comment-14188146 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677853/profiling_FairScheduler_update.png against trunk revision ec63a3f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5624//console This message is automatically generated. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: YARN-2768.patch Avoid the clone by adding a three-argument helper, Resources.multiplyAndAddTo. After this optimization, the average time cost of FairScheduler.update (in a test case with 10k apps) is reduced by 40%. I'm not sure whether it's better to have such test cases also submitted. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
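The multiplyAndAddTo helper mentioned in this comment is not shown in the thread. As a rough sketch of the idea only, with a hypothetical MiniResource standing in for YARN's Resource (the real patch operates on Resource and lives in Resources), multiply-then-add collapses into one in-place accumulation, so no temporary object is created per ResourceRequest:

```java
public class DemandMath {
    // Toy stand-in for org.apache.hadoop.yarn.api.records.Resource.
    public static class MiniResource {
        public long memory;
        public int vcores;
        public MiniResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }
    }

    // Old path: multiply allocates a fresh result object, i.e. one piece of
    // garbage per outstanding request in updateDemand's inner loop.
    public static MiniResource multiply(MiniResource lhs, double by) {
        return new MiniResource((long) (lhs.memory * by), (int) (lhs.vcores * by));
    }

    // New path: accumulate lhs += rhs * by directly, no clone needed.
    public static void multiplyAndAddTo(MiniResource lhs, MiniResource rhs, double by) {
        lhs.memory += (long) (rhs.memory * by);
        lhs.vcores += (int) (rhs.vcores * by);
    }
}
```

In updateDemand the caller would then write multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers()) instead of building a total via multiply and adding it, which matches the "directly update the value of this.demand" suggestion in the description.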
[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188182#comment-14188182 ] Hadoop QA commented on YARN-2767: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677848/apache-yarn-2767.1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5623//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5623//console This message is automatically generated. 
RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188188#comment-14188188 ] Yogesh Sobale commented on YARN-1902: - Can someone please update? Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers are requested in both scenarios, but that only in the second scenario is the correct behavior observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
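The fix described in the report, initializing a fresh ResourceRequest once a request has been successfully sent, can be illustrated with a toy bookkeeping model. This is not AMRMClientImpl's actual data structure (the real remoteRequestsTable is keyed on priority, location, and capability); it only shows the delta-versus-running-total distinction behind the (z+1) symptom:

```java
import java.util.HashMap;
import java.util.Map;

public class RequestTable {
    // capability -> containers requested but not yet sent to the RM
    private final Map<String, Integer> pending = new HashMap<>();

    // addContainerRequest: record one more outstanding container.
    public void add(String capability) {
        pending.merge(capability, 1, Integer::sum);
    }

    // allocate: send only the unsent delta, then reset it -- mirroring the
    // "new ResourceRequest after a successful send" fix. Without the reset,
    // a later add() would re-send the whole running total (z+1 containers)
    // instead of just the one new container.
    public int allocate(String capability) {
        int toSend = pending.getOrDefault(capability, 0);
        pending.put(capability, 0);
        return toSend;
    }

    public static void main(String[] args) {
        RequestTable t = new RequestTable();
        t.add("1024mb,1core");
        t.add("1024mb,1core");
        t.add("1024mb,1core");
        System.out.println(t.allocate("1024mb,1core")); // first allocate sends 3
        t.add("1024mb,1core");
        System.out.println(t.allocate("1024mb,1core")); // delta only: 1, not 4
    }
}
```

The "release the excess containers" workaround mentioned in the report instead accepts the over-allocation and hands the surplus back, which wastes a scheduling round trip.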
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188190#comment-14188190 ] Yogesh Sobale commented on YARN-1902: - Can someone please update? Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188222#comment-14188222 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677855/YARN-2768.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5625//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5625//console This message is automatically generated. 
optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188266#comment-14188266 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188270#comment-14188270 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188275#comment-14188275 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to 
use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
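[Editor's note] The steps to reproduce above come down to the NM web app deciding whether a requested log file lies under one of the configured `yarn.nodemanager.log-dirs`. This is not the YARN-2741 patch to ContainerLogsUtils; it is a hypothetical string-level sketch of why a naive prefix check must normalize separators and drive-letter case before comparing.

```java
// Hypothetical illustration: match a requested log path against the
// configured log dirs in a separator- and case-insensitive way, so
// "D:\nmlogs" and "d:/nmlogs" are treated as the same directory.
public class LogDirCheckSketch {
    static String normalize(String p) {
        return p.replace('\\', '/').toLowerCase();
    }

    static boolean isUnderLogDir(String file, String[] logDirs) {
        String f = normalize(file);
        for (String dir : logDirs) {
            String d = normalize(dir);
            if (!d.endsWith("/")) d += "/";
            if (f.startsWith(d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] dirs = { "D:\\nmlogs" };  // as in the repro steps
        System.out.println(
            isUnderLogDir("d:/nmlogs/app_1/container_1/stdout", dirs));
        System.out.println(
            isUnderLogDir("C:/other/stdout", dirs));
    }
}
```

A check that instead hard-codes the NodeManager's own drive (C: in the repro) rejects every log file on D: or E:, producing the "No Logs available for Container" message described above.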
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188277#comment-14188277 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188265#comment-14188265 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188336#comment-14188336 ] Hudson commented on YARN-2747: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188331#comment-14188331 ] Hudson commented on YARN-2503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188332#comment-14188332 ] Hudson commented on YARN-2758: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188341#comment-14188341 ] Hudson commented on YARN-2741: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED 
BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188343#comment-14188343 ] Hudson commented on YARN-2760: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188374#comment-14188374 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188386#comment-14188386 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188384#comment-14188384 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able 
to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188379#comment-14188379 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188375#comment-14188375 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
Varun Vasudev created YARN-2769: --- Summary: TestDistributedShell#testDSShell fails on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188465#comment-14188465 ] Varun Vasudev commented on YARN-2769: - Attached fix. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Attachment: apache-yarn-2769.0.patch TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188464#comment-14188464 ] Varun Vasudev commented on YARN-2769: - Since we use shell_command in the test, {noformat} if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) { {noformat} is false on Windows (but true on Linux). Just moving the domain id setting out of this if-condition fixes the bug. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
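[Editor's note] The restructuring described in that comment can be sketched as follows. This is not the actual apache-yarn-2769.0.patch; the class and method names are illustrative stand-ins for the distributed-shell ApplicationMaster's env handling.

```java
import java.util.HashMap;
import java.util.Map;

public class DomainSetupSketch {
    static final String SCRIPT_LOCATION = "DISTRIBUTEDSHELLSCRIPTLOCATION";
    static final String TIMELINE_DOMAIN = "DISTRIBUTEDSHELLTIMELINEDOMAIN";

    // Before: the domain id is only read inside the script-location
    // branch, which never runs when -shell_command is used (the Windows
    // path in the test), so the domain silently stays at DEFAULT.
    static String domainBefore(Map<String, String> envs) {
        String domain = "DEFAULT";
        if (envs.containsKey(SCRIPT_LOCATION)) {
            if (envs.containsKey(TIMELINE_DOMAIN)) {
                domain = envs.get(TIMELINE_DOMAIN);
            }
        }
        return domain;
    }

    // After: the domain check is hoisted out of the script-location
    // branch, as the comment above describes.
    static String domainAfter(Map<String, String> envs) {
        String domain = "DEFAULT";
        if (envs.containsKey(TIMELINE_DOMAIN)) {
            domain = envs.get(TIMELINE_DOMAIN);
        }
        return domain;
    }

    public static void main(String[] args) {
        Map<String, String> envs = new HashMap<>();
        envs.put(TIMELINE_DOMAIN, "TEST_DOMAIN");
        // No SCRIPT_LOCATION entry, as with -shell_command on Windows.
        System.out.println(domainBefore(envs)); // DEFAULT
        System.out.println(domainAfter(envs));  // TEST_DOMAIN
    }
}
```

The `expected:[TEST_DOMAIN] but was:[DEFAULT]` ComparisonFailure in the report is exactly the before/after difference this sketch reproduces.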
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188472#comment-14188472 ] Junping Du commented on YARN-2711: -- Thanks [~vvasudev] for the patch and [~cwelch] for review! Patch looks good to me. Will commit it shortly. TestDefaultContainerExecutor#testContainerLaunchError fails on Windows -- Key: YARN-2711 URL: https://issues.apache.org/jira/browse/YARN-2711 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2711.0.patch The testContainerLaunchError test fails on Windows with the following error - {noformat} java.io.FileNotFoundException: File file:/bin/echo does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514) at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145) at org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2765: - Attachment: YARN-2765v2.patch Thanks for the review, Tsuyoshi! bq. How about adding helper methods like getKeyPrefix/getNodePath for getting key prefix and node path? Sure, added some helper methods to compute leveldb keys for various things. bq. I found that the patch includes lots hard-coded /. I think it's better to have private field SEPARATOR = /. IMHO this makes the code less readable, similar to a code style like {{final int ONE = 1}}. But I don't care too strongly about it and changed all occurrences to SEPARATOR. For Zhijie's comments: bq. One drawback I can think of is that while LeveldbRMStateStore is lightweight for single RM restarting, multiple RMs of HA are not able to share this single-host database. This should work if the leveldb database is on a network store like a filer. Leveldb uses locks to prevent multiple processes from trying to access the database simultaneously, so there's a little bit of help for the fencing scenarios. However the fencing script actions would have to do some extra work to force a poorly-behaving resourcemanager to let go of the locks so a standby RM can open the store and become active. bq. Did you have a chance to think of an enhanced k/v db: rocksdb? I briefly considered using rocksdb for this but decided against it for a couple of reasons: * leveldb is already used by the timeline server and nodemanager, and I would rather avoid adding yet another new dependency for this * leveldb supports win32/win64, but it doesn't appear that the standard rocksdbjni distribution has support for Windows. 
Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
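The helper methods discussed in the review above (key prefixes, node paths, a SEPARATOR field) might look roughly like this. The names and the root key are assumptions following the review comments, not necessarily what the committed patch uses:

```java
public class LeveldbKeys {
    // The separator the review asked to factor out of hard-coded "/" uses.
    private static final String SEPARATOR = "/";
    // Hypothetical root prefix for all RM state entries.
    private static final String ROOT = "RMStateRoot";

    // Prefix shared by all keys in one category, useful for range scans.
    static String getKeyPrefix(String storeCategory) {
        return ROOT + SEPARATOR + storeCategory + SEPARATOR;
    }

    // Full leveldb key for one stored node.
    static String getNodePath(String storeCategory, String nodeName) {
        return getKeyPrefix(storeCategory) + nodeName;
    }

    public static void main(String[] args) {
        System.out.println(getNodePath("RMAppRoot", "application_1_0001"));
    }
}
```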
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188516#comment-14188516 ] Bikas Saha commented on YARN-1902: -- bq. Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Firstly, I am not sure if the same ContainerRequest object can be passed multiple times in addContainerRequest. It should be different objects each time (even if they point to the same resource). This might have something to do with the internal book-keeping done for matching requests. Secondly, after z requests are made and 1 allocation is received then z-1 requests remain. If you are using AMRMClientImpl then it's your (the user's) responsibility to call removeContainerRequest() for the request that was matched to this container. The AMRMClient does not know which of your z requests could be assigned to this container. So in the general case, it cannot automatically remove a request from the internal table because it does not know which request to remove. If the javadocs don't clarify these semantics then we can improve the javadocs. Thirdly, the protocol between the AMRMClient and the RM has an inherent race. So if the client had earlier asked for z containers and in the next heartbeat reduces that to z-1, the RM may actually return z containers to it because it had already allocated them to this client before the client updated the RM with the new value. Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. 
Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers are requested in both scenarios, but that only in the second scenario, the correct behavior is observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
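The client-side bookkeeping Bikas describes (remove one matching request per received allocation, or the next heartbeat re-asks for it) can be modeled with a minimal sketch. This is not the real AMRMClient API surface, just an illustration of why removeContainerRequest() is the caller's responsibility:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AskBookkeeping {
    // Outstanding container requests, keyed here by an opaque tag.
    private final Deque<String> outstanding = new ArrayDeque<>();

    void addContainerRequest(String req) {
        outstanding.add(req);
    }

    // The client library cannot know which of the caller's equivalent
    // requests a received container satisfied, so the caller must remove
    // the matched request itself; otherwise the ask count stays at z and
    // a later add brings it to z+1, over-allocating.
    void removeContainerRequest(String req) {
        outstanding.remove(req);
    }

    int pending() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        AskBookkeeping client = new AskBookkeeping();
        for (int i = 0; i < 3; i++) {
            client.addContainerRequest("req" + i);
        }
        // One container arrives and is matched to req0:
        client.removeContainerRequest("req0");
        System.out.println(client.pending()); // 2 requests remain
    }
}
```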
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188526#comment-14188526 ] Hadoop QA commented on YARN-2769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677908/apache-yarn-2769.0.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5627//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5627//console This message is automatically generated. 
TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188530#comment-14188530 ] Varun Vasudev commented on YARN-2769: - I haven't included any test since this is a fix for a test failing on Windows. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-1.patch Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Summary: Timeline server domain not set correctly when using shell_command on Windows (was: TestDistributedShell#testDSShell fails on Windows) Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Issue Type: Bug (was: Test) Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188589#comment-14188589 ] Hadoop QA commented on YARN-2765: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677911/YARN-2765v2.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5626//console This message is automatically generated. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595 ] Zhijie Shen commented on YARN-2765: --- bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
Zhijie Shen created YARN-2770: - Summary: Timeline delegation tokens need to be automatically renewed by the RM Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. Now the timeline DT renew is still dummy. If an app is running for more than 24h (default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since RM will cache the credentials of each app, and renew the DTs for the running app. We should provider renew hooks similar to what HDFS DT has for RM, and set RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595 ] Zhijie Shen edited comment on YARN-2765 at 10/29/14 5:22 PM: - bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni It should be the same org whose leveldbjni is currently used by us. was (Author: zjshen): bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. 
I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188614#comment-14188614 ] Karthik Kambatla commented on YARN-2742: +1 FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2742: --- Summary: FairSchedulerConfiguration should allow extra spaces between value and unit (was: FairSchedulerConfiguration fails to parse if there is extra space between value and unit) FairSchedulerConfiguration should allow extra spaces between value and unit --- Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188639#comment-14188639 ] Hudson commented on YARN-2742: -- FAILURE: Integrated in Hadoop-trunk-Commit #6382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6382/]) YARN-2742. FairSchedulerConfiguration should allow extra spaces between value and unit. (Wei Yan via kasha) (kasha: rev 782971ae7a0247bcf5920e10b434b7e0954dd868) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java FairSchedulerConfiguration should allow extra spaces between value and unit --- Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Fix For: 2.7.0 Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
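The committed fix relaxes the whitespace handling in FairSchedulerConfiguration's resource parsing. A minimal sketch of the idea, with hypothetical names and a deliberately simplified grammar (the real findResource handles more units and error cases):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientResourceParser {
    // "\\s*" instead of "\\s?" tolerates any number of spaces between
    // the numeric value and its unit, so "4096  mb" parses as well as "4096 mb".
    private static final Pattern RESOURCE =
        Pattern.compile("(\\d+)\\s*(mb|vcores)");

    // Returns the numeric value for the requested unit, or -1 if the
    // unit does not appear in the config string.
    public static int find(String config, String unit) {
        Matcher m = RESOURCE.matcher(config.toLowerCase());
        while (m.find()) {
            if (m.group(2).equals(unit)) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }
}
```

With this pattern, the "4096  mb, 2 vcores" value from the bug report resolves to 4096 mb and 2 vcores instead of throwing "Missing resource: mb".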
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188728#comment-14188728 ] Siqi Li commented on YARN-2755: --- Hi [~jlowe], can you take a look at this? NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188739#comment-14188739 ] Karthik Kambatla commented on YARN-2690: Looks mostly good. Can we look into the javadoc warnings? A few minor comments: # Rename ReservationSchedulerConfiguration to ReservationConfiguration? Not sure the Scheduler in there is adding much information. # Make ReservationConfiguration an abstract class that extends Configuration instead of an interface, so it can implement some of the getters, at least those for which it carries defaults. # Nit: The time defaults should be a product of numbers instead of the result, e.g. {{24 * 60 * 60 * 1000}} instead of 86400000L. Make ReservationSystem and its dependent classes independent of Scheduler type Key: YARN-2690 URL: https://issues.apache.org/jira/browse/YARN-2690 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2690.001.patch, YARN-2690.002.patch, YARN-2690.002.patch, YARN-2690.003.patch A lot of common reservation classes depend on CapacityScheduler and specifically its configuration. This JIRA is to make them ready for other schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
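Karthik's third point, spelling defaults as a product, can be sketched with a tiny example (the constant name here is illustrative, not the one used in the patch):

```java
public class ReservationDefaults {
    // One day in milliseconds, written as a product so a reviewer can
    // verify the unit math at a glance instead of decoding a
    // pre-multiplied literal like 86400000L.
    public static final long DEFAULT_PLANNING_WINDOW_MS = 24L * 60 * 60 * 1000;
}
```

The `24L` suffix on the first factor keeps the whole product in long arithmetic, which matters once constants exceed the int range.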
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188755#comment-14188755 ] Jason Lowe commented on YARN-2765: -- I agree that the timeline server seems like a worthy candidate for rocksdb. IIUC rocksdb's main use-case over leveldb is better performance when the database is larger than the node's RAM, which is likely in the case of the timeline server. bq. And one other merit I've heard about rocksdb is that it can ride on HDFS. This is news to me. I knew rocksdb could be used as a cache of data that came from HDFS or could be backed-up to HDFS, but I didn't think it could read/write directly to it as part of normal operations. bq. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni Awesome, thanks for finding that. I was looking at the standard org.rocksdb package. Only concern with the fusesource option would be if it starts to diverge significantly from the standard one. The API is already slightly different between the two, and the fusesource one hasn't been touched in a year while the org.rocksdb package was updated just last week. Probably best to continue this conversation in a separate JIRA proposing we consider rocksdb for the timeline server. If it works well there it should be very straightforward to provide store backends for the RM, NM, and JHS if it makes sense for them as well. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2771) DistributedShell's DSConstants are badly named
Vinod Kumar Vavilapalli created YARN-2771: - Summary: DistributedShell's DSConstants are badly named Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old-one and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188787#comment-14188787 ] Karthik Kambatla commented on YARN-2738: Do we want to make it configurable per-queue from the beginning? How about just starting with global settings for all queues, and adding per-queue configs depending on usecases and user feedback? Comments on the patch itself: # FairReservationSystem: The TODO is not clear to me. IAC, we should avoid orphan TODOs - can we file a follow-up JIRA and add a reference at the TODO. # Spurious import changes in a couple of files. Add FairReservationSystem for FairScheduler --- Key: YARN-2738 URL: https://issues.apache.org/jira/browse/YARN-2738 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2738.001.patch Need to create a FairReservationSystem that will implement ReservationSystem for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188796#comment-14188796 ] Zhijie Shen commented on YARN-2769: --- +1. The fix makes sense, and we have the test to cover the code path on windows. Will commit the patch. Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188830#comment-14188830 ] Hudson commented on YARN-2769: -- FAILURE: Integrated in Hadoop-trunk-Commit #6385 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6385/]) YARN-2769. Fixed the problem that timeline domain is not set in distributed shell AM when using shell_command on Windows. Contributed by Varun Vasudev. (zjshen: rev a8c120222047280234c3411ce1c1c9b17f08c851) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188891#comment-14188891 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677927/YARN-2698-20141029-1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMRTimelineEventHandling {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5628//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5628//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-2.patch Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20141030-1.patch Hi [~wangda], I am uploading a patch with all the review comments fixed and with test cases, but I need to rebase it on the latest trunk code, which I will do tomorrow morning. You can review this patch, and if it looks fine I will submit it after rebasing tomorrow. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml or using a script suggested by [~aw]) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188925#comment-14188925 ] Wangda Tan commented on YARN-2698: -- Hi [~vinodkv], bq. YarnClient usually has simpler APIs (like returning a map) instead of directly exposing the response objects, let’s do that. Addressed bq. bin/yarn needs to be updated to use the new CLI Addressed bq. Overall, I didn’t realize we already have a node CLI: Let’s just move the node to labels mappings to that CLI. We could keep the all-nodes mapping though. The node CLI mainly gets labels from NodeReport, which only covers running NMs. I suggest keeping the node-to-labels mapping in the node-labels CLI (as its name suggests); in the future we can add a labels field to NodeReport and the node CLI. bq. “will return all labels in the cluster” - “will return all accessible labels in the cluster” I changed it to be “.. return all node labels” to make it consistent with the Java API names; please let me know if you disagree. bq. CLI for “node-labels -list” should drop the prefix “Node-labels=” Addressed bq. CLI for “node-labels -list -nodeId all”: Say Node instead of Host? And then simply make it “Node:nm:5432 - label1, label2” Addressed bq. Move the node-cli tests into their own TestNodeLabelsCLI Addressed bq. Validate the help message for the new CLI. Addressed Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2771: -- Component/s: applications/distributed-shell DistributedShell's DSConstants are badly named -- Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old-one and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2772) DistributedShell's timeline related options are not clear
Vinod Kumar Vavilapalli created YARN-2772: - Summary: DistributedShell's timeline related options are not clear Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188969#comment-14188969 ] Vinod Kumar Vavilapalli commented on YARN-2772: --- I propose the following: - Rename the domain and create options to be timeline_domain_id and should_create_timeline_domain respectively. - Modify the option description of view_acls and modify_acls to say that they are only needed if should_create_timeline_domain is true - Modify the description of {{timeline_domain_id}} to say that it is optional and that it will use the DEFAULT timeline-domain by default - If {{should_create_timeline_domain}} is off, we should validate on the client whether the domain really exists and fail the submission if not, with a message saying The passed timeline-domain doesn't exist. Either pass an existing timeline_domain_id or set should_create_timeline_domain to true. - If {{should_create_timeline_domain}} is on, and the user passes an existing timeline-domain-id, we should fail the submission and say The passed timeline-domain already exists. Either pass a new timeline_domain_id or set should_create_timeline_domain to false DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
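The last two bullets of the proposal amount to a simple client-side check on the (create-flag, domain-exists) combination. A hypothetical sketch — the method and class names are illustrative of the proposal, not existing DistributedShell code:

```java
public class TimelineDomainCheck {
    // Rejects the two invalid combinations Vinod describes: asking to
    // reuse a domain that does not exist, or asking to create a domain
    // that already exists. The valid combinations pass through silently.
    public static void validate(boolean shouldCreateTimelineDomain,
                                boolean domainExists) {
        if (!shouldCreateTimelineDomain && !domainExists) {
            throw new IllegalArgumentException(
                "The passed timeline-domain doesn't exist. Either pass an "
                + "existing timeline_domain_id or set "
                + "should_create_timeline_domain to true.");
        }
        if (shouldCreateTimelineDomain && domainExists) {
            throw new IllegalArgumentException(
                "The passed timeline-domain already exists. Either pass a "
                + "new timeline_domain_id or set "
                + "should_create_timeline_domain to false.");
        }
    }
}
```

The client would call this before submission, after querying the timeline server for the domain's existence.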
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch That makes sense. I wasn't able to trace the code back to ApplicationHistoryManager, but I did find where the lists are created, so I put the sorting calls there. [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
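The underlying fix pattern — never assert on HashMap iteration order — can be sketched as sorting both sides before comparing (an illustrative helper, not the patch's actual code):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class OrderIndependentAssert {
    // Compares two collections as multisets by sorting copies first, so
    // a test passes regardless of HashMap iteration order, which is not
    // guaranteed and differs between Java 7 and Java 8.
    public static boolean sameElements(Collection<String> expected,
                                       Collection<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }
}
```

Placing the sorting where the lists are created (as the patch does) achieves the same effect: the assertions then see a deterministic order.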
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189015#comment-14189015 ] Jason Lowe commented on YARN-2755: -- Thanks for the patch, Siqi. userDirStatus can be null if userDirPath is not a directory, so we should avoid the potential NPE and check for {{userDirStatus != null && userDirStatus.hasNext()}} NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
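Jason's point is the standard guard for APIs that return null instead of an empty iterator. A minimal self-contained sketch under assumed names (listIfDirectory stands in for the real directory-listing call, which returns null for a non-directory path):

```java
import java.util.Iterator;
import java.util.List;

public class NullSafeIteration {
    // Stand-in for a listing call that returns null when the path is
    // not a directory (hypothetical; mirrors the behavior described).
    static Iterator<String> listIfDirectory(List<String> maybeDir) {
        return (maybeDir == null) ? null : maybeDir.iterator();
    }

    // Short-circuit && evaluates the null check first, so hasNext() is
    // never invoked on a null iterator and no NPE can occur.
    static boolean hasEntries(List<String> maybeDir) {
        Iterator<String> it = listIfDirectory(maybeDir);
        return it != null && it.hasNext();
    }
}
```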
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v4.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189026#comment-14189026 ] Siqi Li commented on YARN-2755: --- Thanks for your feedback, [~jlowe]. I have updated the patch with the proper fix. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem
Anubhav Dhoot created YARN-2773: --- Summary: ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem Key: YARN-2773 URL: https://issues.apache.org/jira/browse/YARN-2773 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Priority: Minor The reservation system requires the ReservationDefinition to use a queue name to choose which reservation queue is being used. CapacityScheduler does not allow duplicate leaf queue names, so we can refer to a unique leaf queue by simply using its name rather than its full path (which includes parentName + "."). FairScheduler allows duplicate leaf queue names, so one needs the full queue name to identify a queue uniquely. This is inconsistent in the implementation of AbstractReservationSystem, where one implementation of getQueuePath does the conversion (CapacityReservationSystem) while FairReservationSystem returns the same value back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
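The inconsistency can be seen in how each implementation would resolve a reservation queue. A hypothetical sketch (the real getQueuePath lives in the scheduler-specific ReservationSystem subclasses; the parent-lookup map here is purely illustrative):

```java
import java.util.Map;

public class QueuePathResolution {
    // Capacity-style: leaf names are unique cluster-wide, so a short
    // name can be expanded to its full dotted path via a parent lookup.
    static String capacityQueuePath(String leafName,
                                    Map<String, String> leafToParent) {
        String parent = leafToParent.get(leafName);
        return (parent == null) ? leafName : parent + "." + leafName;
    }

    // Fair-style: leaf names may repeat under different parents, so
    // callers must already pass the full path; it is returned unchanged.
    static String fairQueuePath(String fullQueueName) {
        return fullQueueName;
    }
}
```

The JIRA's point is that callers of getQueuePath get a converted value in one case and an identity mapping in the other, so the two ReservationSystem implementations interpret the same ReservationDefinition queue field differently.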
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189088#comment-14189088 ] Hadoop QA commented on YARN-2766: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678001/YARN-2766.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5629//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5629//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5629//console This message is automatically generated. 
[JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
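The fix described above — never asserting on HashMap iteration order — can be sketched like this. The names below are illustrative, not the actual YARN test code: the returned collection is copied and sorted by a stable key before any order-sensitive assertion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedAssertionDemo {
    public static void main(String[] args) {
        Map<String, Integer> reports = new HashMap<>();
        reports.put("container_2", 2);
        reports.put("container_1", 1);
        // Iteration order of values() is unspecified and may differ
        // between JDK 7 and JDK 8 (different HashMap internals).
        List<Integer> ids = new ArrayList<>(reports.values());
        ids.sort(Comparator.naturalOrder()); // make the order deterministic
        // Now order-sensitive assertions are safe:
        System.out.println(ids); // prints [1, 2] on any JDK
    }
}
```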
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189101#comment-14189101 ] Hadoop QA commented on YARN-2755: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678008/YARN-2755.v4.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5630//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5630//console This message is automatically generated. 
NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: yarn2556.patch I have cleaned up my patch; reviews are welcome. I used this application to test the timeline server throughput in local mode by launching 4 mappers, each of which puts an entity larger than 100 KB and iterates 1000 times. Here is my measurement: on my local machine, the timeline server provides about a 10 MB/s I/O rate for writes. There is some deviation from the write throughput for leveldb. People are welcome to try this tool and comment on it. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
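A quick back-of-the-envelope check of the reported measurement, assuming the stated setup (4 mappers × 1000 iterations × ~100 KB per entity) and the reported ~10 MB/s write rate; the figures here are only derived from the comment, not independently measured.

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double entityKB = 100;      // ~100 KB per put, per the comment
        long puts = 4L * 1000;      // 4 mappers x 1000 iterations
        double totalMB = puts * entityKB / 1024;  // ~390 MB written in total
        double rateMBs = 10;        // reported write rate (assumption)
        System.out.printf("total=%.0f MB, ~%.0f s at %.0f MB/s%n",
                totalMB, totalMB / rateMBs, rateMBs);
    }
}
```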
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189117#comment-14189117 ] Allen Wittenauer commented on YARN-2701: OK, this compiled without incident, so I'm +1 now. Thanks! Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch When LinuxContainerExecutor performs startLocalizer, it uses the native code in container-executor.c. {code} if (stat(npath, sb) != 0) { if (mkdir(npath, perm) != 0) { {code} We use a check-then-create approach to create the appDir under /usercache, but if two containers try to do this at the same time, a race condition may occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
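The check-then-create race above can be avoided by attempting the create directly and treating "already exists" as success. A Java analogue of that pattern is sketched below (the actual fix lives in the native container-executor.c, where the equivalent is checking `errno == EEXIST` after `mkdir`); the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AtomicMkdirDemo {
    // Returns true if the directory exists afterwards, whether or not this
    // caller created it — safe when two localizers race on the same appDir.
    static boolean ensureDir(Path dir) throws IOException {
        try {
            Files.createDirectory(dir); // atomic at the filesystem level
            return true;
        } catch (FileAlreadyExistsException e) {
            // The other racer won; that is fine as long as it is a directory.
            return Files.isDirectory(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("usercache").resolve("appdir");
        // Calling twice mimics two containers racing; both succeed.
        System.out.println(ensureDir(dir) && ensureDir(dir)); // prints true
    }
}
```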
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189143#comment-14189143 ] Hadoop QA commented on YARN-2556: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678020/yarn2556.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5631//console This message is automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch New patch fixes findbugs warnings [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2770: -- Attachment: YARN-2770.1.patch Created a patch: * Add two timeline client APIs - renew/cancel delegation token * Make TimelineDelegationTokenIdentifier.Renewer extend TokenRenewer and implement renew and cancel logic by using the timeline client APIs * Change YarnClientImpl to set the renewer of the timeline DT to the user of the RM daemon. * Add test cases to validate the renew/cancel APIs * Have done an end-to-end test to verify that the automatic DT renewal works in a secure cluster. Timeline delegation tokens need to be automatically renewed by the RM - Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Attachments: YARN-2770.1.patch YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. The timeline DT renewal is currently still a dummy. If an app runs for more than 24h (the default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since the RM caches the credentials of each app and renews the DTs for running apps, we should provide renew hooks for the RM similar to what the HDFS DT has, and set the RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189194#comment-14189194 ] Zhijie Shen commented on YARN-2766: --- I think we need to change ApplicationContext - ApplicationHistoryManager - ApplicationHistoryManagerOnTimelineStore. Modifying the protobuf message will not help the web services. [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Bug (was: Sub-task) Parent: (was: YARN-1530) [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189195#comment-14189195 ] Karthik Kambatla commented on YARN-2579: Thanks, [~rohithsharma]. Looking at the tests and your explanation, I think I see what you are saying. However, looking into the code, I am not convinced it is the draining of events that is causing this issue. {{rmDispatcher}} is an {{AsyncDispatcher}}, with {{drainEventsOnStop}} always false. So, {{rmDispatcher.stop()}} shouldn't lead to any draining of events. I noticed a couple of other issues in the AsyncDispatcher code: # {{eventHandlerThread.join}} in serviceStop should take a timeout as well # {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block With the current patch, I wonder if there are any unexpected side-effects. Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
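The two AsyncDispatcher hardening points raised in the comment — a bounded join in serviceStop and a try-catch around dispatch so one bad event cannot kill the handler thread — can be sketched as below. This is a heavily simplified stand-in, not the real AsyncDispatcher class.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class DispatcherSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;

    private final Thread handler = new Thread(() -> {
        while (!stopped) {
            try {
                Runnable event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event == null) continue;
                try {
                    event.run(); // the dispatch(event) call
                } catch (Throwable t) {
                    // One bad event must not kill the handler thread.
                    System.err.println("Error in dispatcher: " + t);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    });

    public void start() { handler.start(); }

    public void dispatch(Runnable event) { queue.add(event); }

    public void stop() throws InterruptedException {
        stopped = true;
        handler.interrupt();
        handler.join(5000); // bounded join: never hang serviceStop forever
    }

    public static void main(String[] args) throws InterruptedException {
        DispatcherSketch d = new DispatcherSketch();
        d.start();
        d.dispatch(() -> { throw new RuntimeException("bad event"); });
        d.dispatch(() -> System.out.println("still dispatching"));
        Thread.sleep(300); // let both events drain
        d.stop();
    }
}
```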
[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2579: --- Priority: Blocker (was: Major) Target Version/s: 2.6.0 Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Sub-task (was: Bug) Parent: YARN-321 [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Summary: ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers (was: [JDK 8] TestApplicationHistoryClientService fails) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers -- Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189247#comment-14189247 ] Hadoop QA commented on YARN-2766: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678034/YARN-2766.patch against trunk revision 3ae84e1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5632//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5632//console This message is automatically generated. 
ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers -- Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2771: -- Attachment: YARN-2771.1.patch While I was aware of the bad naming, I decided to follow the pattern of the existing constants in DSConstants to be consistent. Anyway, I've uploaded a patch to fix all these constants. DS is not a serious computation framework, and the env var name change is transparent to the CLI user, hence it should not break anything. DistributedShell's DSConstants are badly named -- Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2771.1.patch I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old ones and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189307#comment-14189307 ] Karthik Kambatla commented on YARN-2186: Thanks Sangjin. Looks mostly good, but for some minor comments: # How about renaming NMUploaderSerivceSCMProtocol to SharedCacheUploader (after ResourceTracker) or SharedCacheUploaderProtocol? Accordingly, rename all other related classes and proto files? # Instead of {{yarn.sharedcache.nodemanager.}}, we should probably call it {{yarn.sharedcache.uploader}} to avoid confusion? # As per our offline discussions, it would be nice to add a way for the NM to ask the SCM whether it should upload a resource to the shared-cache or not. For now, this could be always yes. In the future, we can add a pluggable policy that the SCM would consult to answer the NM. # NMCacheUploaderSCMProtocolPBClientImpl#close should set {{this.proxy}} to null after calling stopProxy. # NMCacheUploaderSCMProtocolService: ## TODOs should have an associated follow-up JIRA and reference in the code so we don't forget ## serviceStop should set {{this.server}} to null after calling {{this.server.stop()}} Node Manager uploader service for cache manager --- Key: YARN-2186 URL: https://issues.apache.org/jira/browse/YARN-2186 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch Implement the node manager uploader service for the cache manager. This service is responsible for communicating with the node manager when it uploads resources to the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189344#comment-14189344 ] Karthik Kambatla commented on YARN-2588: Thanks Jian for pointing me to this. The patch fixes an important issue, but I would like for us to call transitionToStandby in the catch-block instead of explicitly inlining the contents of transitionToStandby. I'll fix this up in YARN-2010. Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception. -- Key: YARN-2588 URL: https://issues.apache.org/jira/browse/YARN-2588 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.6.0, 2.5.1 Reporter: Rohith Assignee: Rohith Fix For: 2.6.0 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch Consider a scenario where a standby RM fails to transition to Active because of a ZK exception (ConnectionLoss or SessionExpired). Then any further transition to Active for the same RM does not move the RM to the Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189384#comment-14189384 ] Zhijie Shen commented on YARN-2772: --- [~vinodkv], thanks for your proposal. 1. I prefer create_timeline_domain over should_create_timeline_domain, as it is an option without an arg, so there will be no true/false for it. 2. I'd like to enforce the validation logic (see the existing code comment). However, as we're lacking timeline client query APIs, it will involve more steps to send HTTP requests and parse the JSON response. I prefer to do it after YARN-2423. {code} try { //TODO: we need to check and combine the existing timeline domain ACLs, //but let's do it once we have client java library to query domains. TimelineDomain domain = new TimelineDomain(); {code} Otherwise, I've addressed the other comments and made a patch. DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2772: -- Attachment: YARN-2772.1.patch DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2772.1.patch The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2774) shared cache uploader service should authorize notify calls properly
Sangjin Lee created YARN-2774: - Summary: shared cache uploader service should authorize notify calls properly Key: YARN-2774 URL: https://issues.apache.org/jira/browse/YARN-2774 Project: Hadoop YARN Issue Type: Task Reporter: Sangjin Lee The shared cache manager (SCM) uploader service (done in YARN-2186) currently does not authorize calls notifying the SCM of a newly uploaded resource. Proper security/authorization needs to be done for this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2774) shared cache uploader service should authorize notify calls properly
[ https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2774: -- Issue Type: Sub-task (was: Task) Parent: YARN-1492 shared cache uploader service should authorize notify calls properly Key: YARN-2774 URL: https://issues.apache.org/jira/browse/YARN-2774 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee The shared cache manager (SCM) uploader service (done in YARN-2186) currently does not authorize calls notifying the SCM of a newly uploaded resource. Proper security/authorization needs to be done for this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-2604: --- Assignee: Robert Kanter (was: Karthik Kambatla) Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189488#comment-14189488 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677981/YARN-2698-20141029-2.patch against trunk revision 6f5f604. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.TestLargeSort org.apache.hadoop.mapred.TestClusterMRNotification The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5634//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5634//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5634//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should expose write APIs only; the read APIs should be located in the YARN CLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189583#comment-14189583 ] Rohith commented on YARN-2579: -- Thanks Karthik!! bq. (Service)Dispatcher.stop() wait for draining out RMFatalEventDispatcher event I meant that the drained event, i.e. RMFatalEvent, is waited on to finish at {{rmDispatcher.stop()}} in {{eventHandlerThread.join}}. bq. {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block The {{dispatch(event)}} method catches Throwable and exits the JVM. But I see that if handlers are not registered, then we must have a try-catch block. Did you mean that scenario? bq. {{eventHandlerThread.join}} in serviceStop should take a timeout as well +1 for this approach too; it also fixes the hang problem. The attached patch likewise does not leave the RM hanging in a kind of deadlock. bq. With the current patch, I wonder if there are any unexpected side-effects I have verified many switching scenarios, as mentioned in my previous comment, and more deployed in a real cluster. It works fine with work-preserving restart too. Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
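The "eventHandlerThread.join in serviceStop should take a timeout" suggestion above can be sketched in isolation. This is a minimal, hypothetical illustration of the idea (a bounded join instead of an unbounded one), not the actual AsyncDispatcher patch; the class and method names are invented for the sketch:

```java
// Sketch: bound the wait on the event-handling thread during shutdown with a
// timeout, so a stuck handler cannot hang the stop path forever. Illustrative
// only; the real fix lives in AsyncDispatcher#serviceStop.
public class TimedStopSketch {

  /** Interrupts the thread, then joins for at most timeoutMs; returns true if it exited. */
  static boolean stopWithTimeout(Thread eventHandlerThread, long timeoutMs)
      throws InterruptedException {
    eventHandlerThread.interrupt();       // ask the handler loop to stop
    eventHandlerThread.join(timeoutMs);   // bounded wait instead of join()
    return !eventHandlerThread.isAlive(); // false would mean the thread is stuck
  }

  public static void main(String[] args) throws Exception {
    // A handler thread that exits promptly once interrupted.
    Thread handler = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ignored) {
        // interrupt is treated as the stop signal
      }
    });
    handler.start();
    System.out.println("stopped=" + stopWithTimeout(handler, 5000));
  }
}
```

With an unbounded {{join()}}, the same stuck-handler situation would block serviceStop indefinitely, which is exactly the hang discussed in the comment.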
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189600#comment-14189600 ] Rohith commented on YARN-2588: -- bq. but I would like for us to call transitionToStandby in the catch-block instead of explicitly calling the contents of transitionToStandby As I understand the comment, is the expected change like the one below? Correct me if I am wrong. If yes, transitionToStandby returns in its initial state check itself, and we end up without creating active services and without resetting the dispatcher!
{code}
try {
  startActiveServices();
  return null;
} catch (Exception e) {
  transitionToStandby(true);
  throw e;
}
{code}
Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception. -- Key: YARN-2588 URL: https://issues.apache.org/jira/browse/YARN-2588 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.6.0, 2.5.1 Reporter: Rohith Assignee: Rohith Fix For: 2.6.0 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch Consider a scenario where the standby RM fails to transition to Active because of a ZK exception (ConnectionLoss or SessionExpired). Then any further transition to Active for the same RM does not move the RM to the Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189652#comment-14189652 ] Hadoop QA commented on YARN-2772: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678074/YARN-2772.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5636//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5636//console This message is automatically generated. DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2772.1.patch The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: (was: YARN-2753.005.patch) Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager: when a Node is created, Node.labels can be null, so nm.labels may be null; we need to check that originalLabels is not null before using it (originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.005.patch Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager: when a Node is created, Node.labels can be null, so nm.labels may be null; we need to check that originalLabels is not null before using it (originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
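The first YARN-2753 item (don't clobber an existing labelCollections entry) can be illustrated with a self-contained sketch. The {{Label}} class and {{addLabel}} method here are stand-ins invented for the example, not the real *NodeLabelsManager code; the point is the {{putIfAbsent}}-style guard:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: guard insertion with putIfAbsent so re-adding an existing key
// cannot replace the stored value and reset its accumulated resource.
// "Label" is a stand-in for the real node-label bookkeeping class.
public class LabelMapSketch {

  static final class Label {
    int resource; // accumulated resource; must survive a repeated add
  }

  static final ConcurrentMap<String, Label> labelCollections =
      new ConcurrentHashMap<>();

  /** Adds a label only if absent; an existing entry (and its resource) is kept. */
  static Label addLabel(String name) {
    Label fresh = new Label();
    Label prior = labelCollections.putIfAbsent(name, fresh);
    return prior != null ? prior : fresh; // never replaces an existing value
  }

  public static void main(String[] args) {
    addLabel("gpu").resource = 5;   // simulate accumulated resource
    // a second add must return the same object, resource intact
    System.out.println("resource=" + addLabel("gpu").resource);
  }
}
```

An unconditional {{put}} in the same spot would install the fresh zero-resource object on the second call, which is exactly the resource-reset bug described above.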
[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189657#comment-14189657 ] Hadoop QA commented on YARN-2770: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678042/YARN-2770.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5635//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5635//console This message is automatically generated. 
Timeline delegation tokens need to be automatically renewed by the RM - Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Attachments: YARN-2770.1.patch YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. Right now, timeline DT renewal is still a dummy. If an app runs for more than 24h (the default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since the RM caches the credentials of each app and renews the DTs for running apps, we should provide renew hooks similar to what the HDFS DT has for the RM, and set the RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)