[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.7.patch

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798793#comment-13798793
 ] 

Hadoop QA commented on YARN-1172:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609091/YARN-1172.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 21 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2217//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2217//console

This message is automatically generated.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.6.patch

Fixed to pass tests.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798727#comment-13798727
 ] 

Hadoop QA commented on YARN-1185:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609080/YARN-1185.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2216//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2216//console

This message is automatically generated.

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.
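
For illustration only, a minimal sketch of the write-to-a-temporary-file-then-rename 
approach described above, using the standard Hadoop FileSystem API; the class, method, 
and variable names here are made up for this example and are not taken from the 
attached patches.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AtomicWriteSketch {
  // Write the full record to a side file first, then rename it into place,
  // so a crash mid-write can never leave a partial destination file behind.
  static void writeFileAtomically(FileSystem fs, Path dstFile, byte[] data)
      throws IOException {
    Path tmpFile = new Path(dstFile.getParent(), dstFile.getName() + ".tmp");
    FSDataOutputStream out = fs.create(tmpFile, true);
    try {
      out.write(data);
    } finally {
      out.close();
    }
    if (!fs.rename(tmpFile, dstFile)) {
      throw new IOException("Failed to rename " + tmpFile + " to " + dstFile);
    }
  }
}
{code}

On HDFS the rename is atomic, which is what makes this scheme robust against a crash 
in the middle of the write.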



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException

2013-10-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798714#comment-13798714
 ] 

Bikas Saha commented on YARN-1305:
--

Showing the invalid value will help with debugging.
{noformat}
2013-10-16 17:44:40,467 INFO org.apache.hadoop.service.AbstractService: Service 
RMHAProtocolService failed in state INITED; cause: 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! 
Invalid value of yarn.resourcemanager.ha.id
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! 
Invalid value of yarn.resourcemanager.ha.id
at 
org.apache.hadoop.yarn.conf.HAUtil.throwBadConfigurationException(HAUtil.java:48)
{noformat}

Should it be an error if HA is enabled but RM_HA_IDS is not set to multiple 
RM IDs?
{code}
+// simulate the case YarnConfiguration.RM_HA_IDS is not set
+conf.clear();
+conf.set(YarnConfiguration.RM_HA_ID, RM1_NODE_ID);
+for (String confKey : HAUtil.RPC_ADDRESS_CONF_KEYS) {
+  conf.set(HAUtil.addSuffix(confKey, RM1_NODE_ID), RM1_ADDRESS);
+}
+try {
+  HAUtil.setAllRpcAddresses(conf);
+} catch (Exception e) {
+  fail("Should not throw any exception" +
+"even if YarnConfiguration.RM_HA_IDS is not set");
{code}
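
A purely illustrative sketch of the kind of check being discussed here; 
verifyAndSetConfValue is a hypothetical helper, not HAUtil's actual code, and is only 
meant to show how the offending key and its value could be surfaced.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

public class HAConfigCheckSketch {
  // Report both the key and its (invalid) value instead of letting
  // Configuration.set throw a bare IllegalArgumentException.
  static void verifyAndSetConfValue(Configuration conf, String key) {
    String value = conf.get(key);
    if (value == null || value.trim().isEmpty()) {
      throw new YarnRuntimeException("Invalid configuration! Invalid value of "
          + key + ": " + value);
    }
    conf.set(key, value.trim());
  }
}
{code}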

> RMHAProtocolService#serviceInit should handle HAUtil's 
> IllegalArgumentException
> ---
>
> Key: YARN-1305
> URL: https://issues.apache.org/jira/browse/YARN-1305
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.1
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>  Labels: ha
> Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch
>
>
> When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit 
> calls HAUtil.setAllRpcAddresses. If the configuration values are null, it 
> just throws IllegalArgumentException.
> It's messy to analyse which keys are null, so we should handle it and log the 
> name of keys which are null.
> A current log dump is as follows:
> {code}
> 2013-10-15 06:24:53,431 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered 
> UNIX signal handlers for [TERM, HUP, INT]
> 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: 
> Service RMHAProtocolService failed in state INITED; cause: 
> java.lang.IllegalArgumentException: Property value must not be null
> java.lang.IllegalArgumentException: Property value must not be null
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:816)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:798)
> at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100)
> at 
> org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798711#comment-13798711
 ] 

Omkar Vinit Joshi commented on YARN-1185:
-

Thanks [~vinodkv] and [~jianhe].

bq. Can you please rip apart TestRMStateStore into two tests (files) - 
TestFileSystemRMStateStore and TestZKRMStateStore but use common code?
Done.
bq. Also, to indicate corruption, instead of a .tmp file, we can try a 
state-store write with a partial record and try to recover from that.
I am already doing this.
bq. It may also be better for the test case to assert in the end that the 
corrupted application/attempt is not loaded back into RMState and doesn't 
exist in the FileSystem.
Done.

Attaching a new patch.

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1185:


Attachment: YARN-1185.2.patch

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798705#comment-13798705
 ] 

Hadoop QA commented on YARN-1172:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609072/YARN-1172.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
  org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
  org.apache.hadoop.yarn.server.TestContainerManagerSecurity
  org.apache.hadoop.yarn.server.TestRMNMSecretKeys
  org.apache.hadoop.yarn.server.TestDiskFailures

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2215//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2215//console

This message is automatically generated.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch, YARN-1172.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798694#comment-13798694
 ] 

Tsuyoshi OZAWA commented on YARN-1305:
--

I'd appreciate it if someone could review this JIRA.

> RMHAProtocolService#serviceInit should handle HAUtil's 
> IllegalArgumentException
> ---
>
> Key: YARN-1305
> URL: https://issues.apache.org/jira/browse/YARN-1305
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.1
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>  Labels: ha
> Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch
>
>
> When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit 
> calls HAUtil.setAllRpcAddresses. If the configuration values are null, it 
> just throws IllegalArgumentException.
> It's messy to analyse which keys are null, so we should handle it and log the 
> name of keys which are null.
> A current log dump is as follows:
> {code}
> 2013-10-15 06:24:53,431 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered 
> UNIX signal handlers for [TERM, HUP, INT]
> 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: 
> Service RMHAProtocolService failed in state INITED; cause: 
> java.lang.IllegalArgumentException: Property value must not be null
> java.lang.IllegalArgumentException: Property value must not be null
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:816)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:798)
> at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100)
> at 
> org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798690#comment-13798690
 ] 

Tsuyoshi OZAWA commented on YARN-1139:
--

Thank you for sharing the knowledge, [~ste...@apache.org]! I created a patch on 
YARN-1172 based on the design you mentioned - overriding 
serviceInit()/serviceStart()/serviceStop(). I'll also take that approach on this 
JIRA, for the reasons you mentioned (e.g. easier error handling).
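
As a rough sketch of that approach (the class name and bodies are illustrative, not 
taken from any patch), a converted component would extend AbstractService and override 
the lifecycle hooks rather than exposing ad-hoc start()/stop() methods:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class ExampleRMComponent extends AbstractService {

  public ExampleRMComponent() {
    super(ExampleRMComponent.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // read configuration and set up internal state
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start threads, timers, etc.
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // release resources; failures propagate through the service state machine
    super.serviceStop();
  }
}
{code}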

> [Umbrella] Convert all RM components to Services
> 
>
> Key: YARN-1139
> URL: https://issues.apache.org/jira/browse/YARN-1139
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>
> Some of the RM components - state store, scheduler etc. are not services. 
> Converting them to services goes well with the "Always On" and "Active" 
> service separation proposed on YARN-1098.
> Given that some of them already have start(), stop() methods, it should not 
> be too hard to convert them to services.
> That would also be a cleaner way of addressing YARN-1125.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.5.patch

Updated the patch so that it compiles.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch, YARN-1172.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798679#comment-13798679
 ] 

Hadoop QA commented on YARN-415:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609068/YARN-415--n8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2213//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2213//console

This message is automatically generated.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415--n8.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
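
To make the formula above concrete, a small self-contained sketch of the 
memory-seconds computation (not taken from the patch; names are illustrative):

{code}
public class MemorySecondsSketch {

  // Sum of (reserved MB of container i) * (lifetime in seconds of container i).
  static long memorySeconds(long[] reservedMb, long[] lifetimeSeconds) {
    long total = 0;
    for (int i = 0; i < reservedMb.length; i++) {
      total += reservedMb[i] * lifetimeSeconds[i];
    }
    return total;
  }

  public static void main(String[] args) {
    // e.g. two containers: 2048 MB for 600 s and 1024 MB for 300 s
    long[] mb = {2048, 1024};
    long[] secs = {600, 300};
    System.out.println(memorySeconds(mb, secs) + " MB-seconds"); // 1536000
  }
}
{code}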



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798676#comment-13798676
 ] 

Hadoop QA commented on YARN-1172:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609070/YARN-1172.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2214//console

This message is automatically generated.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798673#comment-13798673
 ] 

Tsuyoshi OZAWA commented on YARN-1172:
--

I noticed that I forgot to add some test-related files.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.4.patch

Updated diff format.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, 
> YARN-1172.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated YARN-415:
-

Attachment: YARN-415--n8.patch

Adding changes in REST API docs to the patch.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415--n8.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798652#comment-13798652
 ] 

Hadoop QA commented on YARN-891:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609062/YARN-891.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2211//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2211//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798640#comment-13798640
 ] 

Hadoop QA commented on YARN-1172:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609064/YARN-1172.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2212//console

This message is automatically generated.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.3.patch

Updated the design: the *SecretManagers are now initialized/started/stopped via 
serviceInit/serviceStart/serviceStop by registering them with addService.

In this patch, code duplication is reduced by introducing SecretManagerService and 
DelegationTokenSecretManagerService, base classes used to convert the 
*SecretManagers into services. They accept a ServiceHandler instance from child 
classes and call back into it from serviceInit/serviceStart/serviceStop.
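
A rough sketch of the callback pattern described above; the interface and method 
bodies here are illustrative and may not match the attached patch exactly.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public abstract class SecretManagerService extends AbstractService {

  /** Callback supplied by the concrete *SecretManager subclass. */
  public interface ServiceHandler {
    void handleInit(Configuration conf) throws Exception;
    void handleStart() throws Exception;
    void handleStop() throws Exception;
  }

  private final ServiceHandler handler;

  protected SecretManagerService(String name, ServiceHandler handler) {
    super(name);
    this.handler = handler;
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    handler.handleInit(conf);   // delegate to the child-supplied handler
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    handler.handleStart();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    handler.handleStop();
    super.serviceStop();
  }
}
{code}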

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798621#comment-13798621
 ] 

Hadoop QA commented on YARN-1288:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609055/YARN-1288-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2210//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2210//console

This message is automatically generated.

> Make Fair Scheduler ACLs more user friendly
> ---
>
> Key: YARN-1288
> URL: https://issues.apache.org/jira/browse/YARN-1288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288.patch
>
>
> The Fair Scheduler currently defaults the root queue's acl to empty and all 
> other queues' acl to "*".  Now that YARN-1258 enables configuring the root 
> queue, we should reverse this.  This will also bring the Fair Scheduler in 
> line with the Capacity Scheduler.
> We should also not trim the acl strings, which makes it impossible to only 
> specify groups in an acl.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.patch

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798620#comment-13798620
 ] 

Omkar Vinit Joshi commented on YARN-1210:
-

Summarizing the current patch.
* After the RMAppAttempts are recovered, all of the attempts are moved into the 
LAUNCHED state. After YARN-891 we will know the state of the earlier finished 
application attempts, so based on that we can decide which state the current app 
attempt should transition to on the RECOVER event (a rough sketch of this decision 
follows below):
** It will move to the LAUNCHED state if it was the last running app attempt.
** Otherwise it will move to FAILED / KILLED / another terminal 
application-attempt state.
* When the NM RESYNCs, its containers will be killed and the NM will re-register 
with the RM, passing along the already-running containers. On the RM side, if any 
of these containers turns out to be an earlier AM container, we will fail that app 
attempt and immediately start a new one. However, if we don't receive the AM's 
finished containerId during a future NM registration, the AMLivelinessMonitor will 
expire after some time, fail the running app attempt, and start a new one.
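
Purely illustrative pseudo-logic for the recovery decision referenced above; the 
helper name and parameters are stand-ins, not code from the patch.

{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

public class RecoveryDecisionSketch {
  static RMAppAttemptState targetStateOnRecover(boolean wasLastRunningAttempt,
      RMAppAttemptState savedFinalState) {
    if (wasLastRunningAttempt) {
      // the last attempt stays LAUNCHED and waits for the AM to resync or expire
      return RMAppAttemptState.LAUNCHED;
    }
    // earlier attempts move straight to their recorded terminal state
    return savedFinalState; // e.g. FAILED, KILLED, FINISHED
  }
}
{code}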


> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1210.1.patch
>
>
> When RM recovers, it can wait for existing AMs to contact RM back and then 
> kill them forcefully before even starting a new AM. Worst case, RM will start 
> a new AppAttempt after waiting for 10 mins ( the expiry interval). This way 
> we'll minimize multiple AMs racing with each other. This can help issues with 
> downstream components like Pig, Hive and Oozie during RM restart.
> In the mean while, new apps will proceed as usual as existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with RM can continue to run and those that 
> don't are guaranteed to be killed before starting a new attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly

2013-10-17 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798601#comment-13798601
 ] 

Alejandro Abdelnur commented on YARN-1288:
--

+1. Would it be possible to make it more explicit in the docs that the behavior is 
an OR of all the ACLs from the root to the leaf queue?
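
A hypothetical sketch of those semantics (the Queue interface here is a stand-in, 
not Fair Scheduler code): access is granted if any queue from the leaf up to the 
root allows the user.

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public class QueueAclSketch {

  interface Queue {
    Queue getParent();                 // null for the root queue
    AccessControlList getSubmitAcl();
  }

  // OR of the ACLs along the path from the leaf queue to the root.
  static boolean canSubmit(Queue leaf, UserGroupInformation user) {
    for (Queue q = leaf; q != null; q = q.getParent()) {
      if (q.getSubmitAcl().isUserAllowed(user)) {
        return true;   // being allowed on the leaf or any ancestor is enough
      }
    }
    return false;
  }
}
{code}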

> Make Fair Scheduler ACLs more user friendly
> ---
>
> Key: YARN-1288
> URL: https://issues.apache.org/jira/browse/YARN-1288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288.patch
>
>
> The Fair Scheduler currently defaults the root queue's acl to empty and all 
> other queues' acl to "*".  Now that YARN-1258 enables configuring the root 
> queue, we should reverse this.  This will also bring the Fair Scheduler in 
> line with the Capacity Scheduler.
> We should also not trim the acl strings, which makes it impossible to only 
> specify groups in an acl.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly

2013-10-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798586#comment-13798586
 ] 

Sandy Ryza commented on YARN-1288:
--

For the record, the old default behavior was that the root queue would have 
ACLs that defaulted to nobody and all other queues would default to everybody.  
The patch switches this, so now the root queue defaults to everybody and all 
other queues default to nobody.

> Make Fair Scheduler ACLs more user friendly
> ---
>
> Key: YARN-1288
> URL: https://issues.apache.org/jira/browse/YARN-1288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288.patch
>
>
> The Fair Scheduler currently defaults the root queue's acl to empty and all 
> other queues' acl to "*".  Now that YARN-1258 enables configuring the root 
> queue, we should reverse this.  This will also bring the Fair Scheduler in 
> line with the Capacity Scheduler.
> We should also not trim the acl strings, which makes it impossible to only 
> specify groups in an acl.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1288) Make Fair Scheduler ACLs more user friendly

2013-10-17 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1288:
-

Attachment: YARN-1288-2.patch

> Make Fair Scheduler ACLs more user friendly
> ---
>
> Key: YARN-1288
> URL: https://issues.apache.org/jira/browse/YARN-1288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288.patch
>
>
> The Fair Scheduler currently defaults the root queue's acl to empty and all 
> other queues' acl to "*".  Now that YARN-1258 enables configuring the root 
> queue, we should reverse this.  This will also bring the Fair Scheduler in 
> line with the Capacity Scheduler.
> We should also not trim the acl strings, which makes it impossible to only 
> specify groups in an acl.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1210:


Attachment: YARN-1210.1.patch

> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1210.1.patch
>
>
> When RM recovers, it can wait for existing AMs to contact RM back and then 
> kill them forcefully before even starting a new AM. Worst case, RM will start 
> a new AppAttempt after waiting for 10 mins ( the expiry interval). This way 
> we'll minimize multiple AMs racing with each other. This can help issues with 
> downstream components like Pig, Hive and Oozie during RM restart.
> In the mean while, new apps will proceed as usual as existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with RM can continue to run and those that 
> don't are guaranteed to be killed before starting a new attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798535#comment-13798535
 ] 

Jason Lowe commented on YARN-415:
-

The REST API doc change is a good catch, Kendall.  The source for that doc is 
at 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
 and should be updated as part of this patch.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798538#comment-13798538
 ] 

Hadoop QA commented on YARN-891:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609033/YARN-891.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2209//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2209//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798532#comment-13798532
 ] 

Hadoop QA commented on YARN-415:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609031/YARN-415--n7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2208//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2208//console

This message is automatically generated.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798526#comment-13798526
 ] 

Andrey Klochkov commented on YARN-415:
--

Kendall, sure, I can update the Wiki as soon as the patch is committed. By the way, 
how can I get write access to the Wiki?

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798511#comment-13798511
 ] 

Kendall Thrapp commented on YARN-415:
-

Thanks Andrey for implementing this.  I'm looking forward to being able to use 
it.  Just a reminder to also update the REST API docs 
(http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API).

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.patch

The new patch adds one more test case: if an attempt fails while the app is still running, the attempt's final state should be saved, but the app's final state should not be saved.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798485#comment-13798485
 ] 

Omkar Vinit Joshi commented on YARN-1210:
-

taking it over.

> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
>
> When RM recovers, it can wait for existing AMs to contact RM back and then 
> kill them forcefully before even starting a new AM. Worst case, RM will start 
> a new AppAttempt after waiting for 10 mins ( the expiry interval). This way 
> we'll minimize multiple AMs racing with each other. This can help issues with 
> downstream components like Pig, Hive and Oozie during RM restart.
> In the mean while, new apps will proceed as usual as existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with RM can continue to run and those that 
> don't are guaranteed to be killed before starting a new attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned YARN-1210:
---

Assignee: Omkar Vinit Joshi  (was: Jian He)

> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
>
> When RM recovers, it can wait for existing AMs to contact RM back and then 
> kill them forcefully before even starting a new AM. Worst case, RM will start 
> a new AppAttempt after waiting for 10 mins ( the expiry interval). This way 
> we'll minimize multiple AMs racing with each other. This can help issues with 
> downstream components like Pig, Hive and Oozie during RM restart.
> In the mean while, new apps will proceed as usual as existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with RM can continue to run and those that 
> don't are guaranteed to be killed before starting a new attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated YARN-415:
-

Attachment: YARN-415--n7.patch

Thanks Jason. Attaching a fixed patch.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, 
> YARN-415--n7.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly

2013-10-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798460#comment-13798460
 ] 

Sandy Ryza commented on YARN-1288:
--

bq. Would this be an incompatible change? If so, can the configuration be set 
to have the previous behavior? If so, that should be the default setting.
Yes, it is an incompatible change.  The change concerns the default settings, so we can't leave the previous behavior, which is incorrect, as the default.

bq. Documentation is missing.
Will add some doc

> Make Fair Scheduler ACLs more user friendly
> ---
>
> Key: YARN-1288
> URL: https://issues.apache.org/jira/browse/YARN-1288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1288-1.patch, YARN-1288.patch
>
>
> The Fair Scheduler currently defaults the root queue's acl to empty and all 
> other queues' acl to "*".  Now that YARN-1258 enables configuring the root 
> queue, we should reverse this.  This will also bring the Fair Scheduler in 
> line with the Capacity Scheduler.
> We should also not trim the acl strings, which makes it impossible to only 
> specify groups in an acl.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798432#comment-13798432
 ] 

Jason Lowe commented on YARN-415:
-

Thanks for the update Andrey.  This change should resolve my concerns about the 
running container leaks, but there are some points that need to be addressed 
with respect to logging in cleanupRunningContainers:

* If an application with many containers in flight simply unregisters and exits, expecting the RM to clean up the mess, or the application simply crashes, we're going to log a lot of messages for all those containers.  Currently the RM already kills all current containers of an application, so we're talking about being incorrect on the order of a few milliseconds for a sane RM.  I think this should be an INFO rather than a WARN.  Also, we probably want to log a single message per application stating how many containers were affected, rather than listing specific ones, since we don't currently expose container-specific metrics anyway.
* There's a "new memSec" log message that appears to be a debugging artifact left in the patch.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN

2013-10-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798397#comment-13798397
 ] 

Sandy Ryza commented on YARN-1317:
--

To add, YARN-1052 covers handling queue ACLs centrally.

> Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
> ---
>
> Key: YARN-1317
> URL: https://issues.apache.org/jira/browse/YARN-1317
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Today, we are duplicating the exact same code in all the schedulers. Queue is 
> a top class concept - clientService, web-services etc already recognize queue 
> as a top level concept.
> We need to move Queue, QueueMetrics and QueueACLs to be top level.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798399#comment-13798399
 ] 

Hadoop QA commented on YARN-1222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609006/yarn-1222-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2207//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2207//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2207//console

This message is automatically generated.

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798372#comment-13798372
 ] 

Karthik Kambatla commented on YARN-1222:


Thanks [~bikassaha]. The patch is exactly along the lines of your suggestion. 
{{#doMultiWithRetries}} handles the creation and deletion of the fencing node, and the 
logical store operations are oblivious to this. The patch names the 
{{#takeOwnership()}} you propose {{fence()}}; it is called while creating the 
root-dirs, before {{#loadState()}}.

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798367#comment-13798367
 ] 

Karthik Kambatla commented on YARN-1068:


Thanks [~vinodkv]. I do agree that we should avoid creating new RPC servers and 
adding more ports to the configuration.

Created YARN-1318 to make AdminService an Always-On service. Unfortunately, 
that requires significant changes to ResourceManager and RMContext as well. 
https://issues.apache.org/jira/browse/YARN-1318?focusedCommentId=13798358&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13798358
 provides a gist of the required changes. 

[~vinodkv], [~bikassaha], could you please take a look at YARN-1318 as well and 
provide your thoughts?

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
> yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
> yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
> yarn-1068-9.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798363#comment-13798363
 ] 

Bikas Saha commented on YARN-1222:
--

I haven't seen the patch. I would expect that logical state store operations 
like storeApplication, storeToken, removeToken etc. don't have to worry about 
these details. They just create a list of ZK ops, which then get funneled 
through a common executor. This executor adds the fencing op, combines all ops 
into a single zk-multi-operation and executes it. Is that how it works?
Also, we would need a takeOwnership feature that would be called before 
loadState(); it would perform the ACL change operation to take ownership and also 
set up the fencing node for subsequent operations.
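As a purely hypothetical sketch of that takeOwnership step (method and variable names are assumed, not taken from any patch), the ACL change could look roughly like this:

{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

class TakeOwnershipSketch {
  // Rewrite the root znode ACLs so only this RM keeps CREATE/DELETE perms;
  // everyone else keeps read/write/admin but cannot touch the fencing node.
  void takeOwnership(ZooKeeper zk, String rootPath, Id thisRm, Id everyoneElse)
      throws KeeperException, InterruptedException {
    List<ACL> fencedAcls = Arrays.asList(
        new ACL(Perms.ALL, thisRm),
        new ACL(Perms.READ | Perms.WRITE | Perms.ADMIN, everyoneElse));
    zk.setACL(rootPath, fencedAcls, -1);  // -1 = skip the ACL version check
  }
}
{code}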

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798358#comment-13798358
 ] 

Karthik Kambatla commented on YARN-1318:


This is more complicated than just moving the AdminService to the Always-On 
services list. (It has been a while since I worked on the first patch of 
YARN-1068, and I had forgotten the complexity of this alternative approach.)

AdminService constructor declaration looks like
{code}
  public AdminService(Configuration conf, ResourceScheduler scheduler, 
  RMContext rmContext, NodesListManager nodesListManager, 
  ClientRMService clientRMService, 
  ApplicationMasterService applicationMasterService,
  ResourceTrackerService resourceTrackerService) {
{code}

Even if we move ClientRMService, ApplicationMasterService and 
ResourceTrackerService to Always-On, we still need the ResourceScheduler and 
NodesListManager, which are ActiveServices. I think the best way forward is for 
the constructor to take only the RMContext and pick everything else up from there.
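For illustration only, a hypothetical sketch of such a slimmed-down constructor (the getters on RMContext are assumed; this is not the actual patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;

// Hypothetical sketch: the service takes just the Configuration and RMContext
// and looks its collaborators up from the context at call time.
class AdminServiceSketch {
  private final Configuration conf;
  private final RMContext rmContext;

  public AdminServiceSketch(Configuration conf, RMContext rmContext) {
    this.conf = conf;
    this.rmContext = rmContext;
    // No ResourceScheduler, NodesListManager, ClientRMService, ... parameters;
    // e.g. an assumed rmContext.getClientRMService() would be fetched only
    // after the Active services have populated the context.
  }
}
{code}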

RMContext constructor currently looks like:
{code}
  public RMContextImpl(Dispatcher rmDispatcher,
  RMStateStore store,
  ContainerAllocationExpirer containerAllocationExpirer,
  AMLivelinessMonitor amLivelinessMonitor,
  AMLivelinessMonitor amFinishingMonitor,
  DelegationTokenRenewer delegationTokenRenewer,
  AMRMTokenSecretManager amRMTokenSecretManager,
  RMContainerTokenSecretManager containerTokenSecretManager,
  NMTokenSecretManagerInRM nmTokenSecretManager,
  ClientToAMTokenSecretManagerInRM clientToAMTokenSecretManager) {
{code}
RMContext should be initialized before any of the other ActiveServices are, 
which requires switching over completely to creating the RMContext first and 
setting each field when the corresponding RM service/component is created. 

Does this seem like a reasonable approach?

> Promote AdminService to an Always-On service
> 
>
> Key: YARN-1318
> URL: https://issues.apache.org/jira/browse/YARN-1318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
>
> Per discussion in YARN-1068, we want AdminService to handle HA-admin 
> operations in addition to the regular non-HA admin operations. To facilitate 
> this, we need to make AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1303) Allow multiple commands separating with ";" in distributed-shell

2013-10-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1303:
--

Summary: Allow multiple commands separating with ";" in distributed-shell  
(was: Allow multiple commands separating with ;)

> Allow multiple commands separating with ";" in distributed-shell
> 
>
> Key: YARN-1303
> URL: https://issues.apache.org/jira/browse/YARN-1303
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
> YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
> YARN-1303.6.patch
>
>
> In shell, we can do "ls; ls" to run 2 commands at once. 
> In distributed shell, this is not working. We should improve to allow this to 
> occur. There are practical use cases that I know of to run multiple commands 
> or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798325#comment-13798325
 ] 

Karthik Kambatla commented on YARN-1222:


I have tested this manually on a cluster and verified that an RM kicks out any other 
RMs using the store. Kicking Jenkins to see if it shows any particular issues 
with the patch while I work on the unit tests. 

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1318) Promote AdminService to an Always-On service

2013-10-17 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1318:
--

 Summary: Promote AdminService to an Always-On service
 Key: YARN-1318
 URL: https://issues.apache.org/jira/browse/YARN-1318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Per discussion in YARN-1068, we want AdminService to handle HA-admin operations 
in addition to the regular non-HA admin operations. To facilitate this, we need 
to make AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1222:
---

Attachment: yarn-1222-2.patch

Here is an updated patch against recent trunk. I have yet to add unit tests for the 
patch; posting it for a high-level design review.

High-level approach:
# Every write operation is padded between the creation and deletion of a fencing 
node (a rough sketch follows below).
# The root node is governed by special ACLs that give the RM exclusive 
create-delete access, and shared read-write-admin.
# The client can set ACLs for the root node. If the client doesn't explicitly set 
them, the default fencing mode kicks in - we generate the root-node ACLs from the 
store-ACLs by stripping the create-delete perms of all users but the RM. The RM 
identifies itself through the RM_ADDRESS and clusterTimeStamp.
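A minimal, hypothetical sketch of the fencing in point 1 above; the paths and names here are assumed and this is not the actual patch, it just illustrates padding a logical store write between create/delete of a fencing znode inside one ZooKeeper multi():

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class FencedWriteSketch {
  void fencedSetData(ZooKeeper zk, String rootPath, String nodePath, byte[] data)
      throws KeeperException, InterruptedException {
    String fencingPath = rootPath + "/RM_FENCING_LOCK";
    List<Op> ops = new ArrayList<Op>();
    // Only the RM holding create-delete perms on the root can execute this
    // pair, so the multi() fails atomically for any fenced-out RM.
    ops.add(Op.create(fencingPath, new byte[0],
        ZooDefs.Ids.CREATOR_ALL_ACL, CreateMode.PERSISTENT));
    ops.add(Op.setData(nodePath, data, -1));   // the logical store operation
    ops.add(Op.delete(fencingPath, -1));
    zk.multi(ops);                             // all-or-nothing
  }
}
{code}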

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798329#comment-13798329
 ] 

Hadoop QA commented on YARN-1303:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609003/YARN-1303.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2206//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2206//console

This message is automatically generated.

> Allow multiple commands separating with ;
> -
>
> Key: YARN-1303
> URL: https://issues.apache.org/jira/browse/YARN-1303
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
> YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
> YARN-1303.6.patch
>
>
> In shell, we can do "ls; ls" to run 2 commands at once. 
> In distributed shell, this is not working. We should improve to allow this to 
> occur. There are practical use cases that I know of to run multiple commands 
> or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1303) Allow multiple commands separating with ;

2013-10-17 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1303:


Attachment: YARN-1303.6.patch

> Allow multiple commands separating with ;
> -
>
> Key: YARN-1303
> URL: https://issues.apache.org/jira/browse/YARN-1303
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
> YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
> YARN-1303.6.patch
>
>
> In shell, we can do "ls; ls" to run 2 commands at once. 
> In distributed shell, this is not working. We should improve to allow this to 
> occur. There are practical use cases that I know of to run multiple commands 
> or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798303#comment-13798303
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608998/YARN-891.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2205//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2205//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798276#comment-13798276
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608993/YARN-891.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2204//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2204//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798273#comment-13798273
 ] 

Tsuyoshi OZAWA commented on YARN-1307:
--

OK, the design by Bikas looks good to me. I'll resume writing a patch for this 
JIRA once we decide on the sequencing of YARN-1222 and YARN-1307.

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, 
> YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in 
> YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798270#comment-13798270
 ] 

Tsuyoshi OZAWA commented on YARN-1222:
--

In YARN-1307, [~kkambatl] mentioned that the patch for this JIRA simplifies 
some of the ZK operations. If so, YARN-1307 should wait for this ticket. Thoughts?

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.patch

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798267#comment-13798267
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608991/YARN-891.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2203//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.patch

Uploaded a patch; details:

- Add a new field finalState in ApplicationState and ApplicationAttemptState in 
RMStateStore.
- All app transitions go through RMAppFinalStateSavingTransition, waiting for the 
final state to be stored, and then do the corresponding final-state transition in 
RMAppFinalStateSavedTransition.
- Similarly, all attempt transitions go through 
AttemptFinalStateSavingTransition, waiting for the final attempt state to be stored, 
and then go through AttemptFinalStateSavedTransition.
- Changed the recovery logic to not start terminated apps/attempts and to populate 
the final state.
- Corresponding PB changes.
- TestRMAppTransitions / TestRMAppAttemptTransitions test the transition logic.
- TestRMRestart#testRMRestart tests the saving and loading of finished apps in the 
RM restart logic. 
- Tested on a single-node cluster with the HDFS store.

To do:
- state store clean up thread.
- test with zk store.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.patch

New patch with minor fix

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Add information like exit status etc for the completed attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798174#comment-13798174
 ] 

Hadoop QA commented on YARN-956:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608973/YARN-956.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2202//console

This message is automatically generated.

> [YARN-321] Add a testable in-memory HistoryStorage 
> ---
>
> Key: YARN-956
> URL: https://issues.apache.org/jira/browse/YARN-956
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Fix For: YARN-321
>
> Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, 
> YARN-956.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN

2013-10-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798175#comment-13798175
 ] 

Vinod Kumar Vavilapalli commented on YARN-1317:
---

Here are the specific details:
 - Today we force each scheduler to update QueueMetrics. Schedulers only deal 
with AppAttempts today (YARN-1311), so QueueMetrics sometimes aren't updated 
correctly (YARN-1166).
 - Every scheduler does the same ACL checks when we don't need to. There are 
inconsistencies in the way QueueACLs are handled (YARN-1288) which can be 
completely done away with once we have ACLs checked by a top level 
QueueACLsManager.
 - Queue Configuration itself is wildly scheduler specific. We reconciled that 
during 0.21 via MAPREDUCE-861.
  -- This is a much larger effort with compatibility implications, but it is 
something we may need to think about separately.
  -- One other side benefit of MAPREDUCE-861 is that hierarchical queues are so 
much easier to configure in a hierarchical conf file.
 - Managing apps by queues (YARN-807). Every scheduler essentially manages the 
same detail. A top-level view manipulated by the individual schedulers eases this 
pain.

> Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
> ---
>
> Key: YARN-1317
> URL: https://issues.apache.org/jira/browse/YARN-1317
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Today, we are duplicating the exact same code in all the schedulers. Queue is 
> a top class concept - clientService, web-services etc already recognize queue 
> as a top level concept.
> We need to move Queue, QueueMetrics and QueueACLs to be top level.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798157#comment-13798157
 ] 

Hadoop QA commented on YARN-947:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608975/YARN-947.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2201//console

This message is automatically generated.

> Defining the history data classes for the implementation of the 
> reading/writing interface
> -
>
> Key: YARN-947
> URL: https://issues.apache.org/jira/browse/YARN-947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: YARN-321
>
> Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch, 
> YARN-947.4.patch
>
>
> We need to define the history data classes that have the exact fields to be 
> stored. Therefore, all the implementations don't need to have the duplicate 
> logic to extract the required information from RMApp, RMAppAttempt and 
> RMContainer.
> We use protobuf to define these classes, such that they can be ser/des 
> to/from bytes, which is easier for persistence.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN

2013-10-17 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1317:
-

 Summary: Make Queue, QueueACLs and QueueMetrics first class 
citizens in YARN
 Key: YARN-1317
 URL: https://issues.apache.org/jira/browse/YARN-1317
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Today, we are duplicating the exact same code in all the schedulers. Queue is a 
top class concept - clientService, web-services etc already recognize queue as 
a top level concept.

We need to move Queue, QueueMetrics and QueueACLs to be top level.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2013-10-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-947:
-

Attachment: YARN-947.4.patch

Updated the patch again to fix the bug in newInstance of 
ApplicationAttemptFinishData.

bq. But I'll hold off on commit till we have all the pieces working together 
and we have tests showing the on-disk and in-memory foot prints of the the 
completed jobs.

Sure

> Defining the history data classes for the implementation of the 
> reading/writing interface
> -
>
> Key: YARN-947
> URL: https://issues.apache.org/jira/browse/YARN-947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: YARN-321
>
> Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch, 
> YARN-947.4.patch
>
>
> We need to define the history data classes that have the exact fields to be 
> stored. Therefore, all the implementations don't need to have the duplicate 
> logic to extract the required information from RMApp, RMAppAttempt and 
> RMContainer.
> We use protobuf to define these classes, such that they can be ser/des 
> to/from bytes, which is easier for persistence.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2013-10-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798154#comment-13798154
 ] 

Vinod Kumar Vavilapalli commented on YARN-807:
--

Filed YARN-1317 for the Queue stuff.

> When querying apps by queue, iterating over all apps is inefficient and 
> limiting 
> -
>
> Key: YARN-807
> URL: https://issues.apache.org/jira/browse/YARN-807
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-807.patch
>
>
> The question "which apps are in queue x" can be asked via the RM REST APIs, 
> through the ClientRMService, and through the command line.  In all these 
> cases, the question is answered by scanning through every RMApp and filtering 
> by the app's queue name.
> All schedulers maintain a mapping of queues to applications.  I think it 
> would make more sense to ask the schedulers which applications are in a given 
> queue. This is what was done in MR1. This would also have the advantage of 
> allowing a parent queue to return all the applications on leaf queues under 
> it, and allow queue name aliases, as in the way that "root.default" and 
> "default" refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage

2013-10-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-956:
-

Attachment: YARN-956.4.patch

I uploaded a new patch, which completely rewrites the 
MemoryApplicationHistoryStore. The major changes are listed below:

1. Make MemoryApplicationHistoryStore extend AbstractService, and make the 
constructor public, so that the RM can uniformly use Java reflection to construct 
any implementation of ApplicationHistoryStore.

2. Make the in-memory store use static objects, so that it is not erased when a 
MemoryApplicationHistoryStore instance is destroyed. The RM can create one instance 
and use the writer interface to add history data, while the AHS can create another 
instance and use the reader interface to read it. The static references are the 
common place that all instances can access, like the root dir in the file system 
for FileSystemApplicationHistoryStore.

3. Make containerData have a two-level lookup index as well, because multiple 
containers may belong to one application attempt. This speeds up the reader 
interface: Map getContainers(ApplicationAttemptId appAttemptId) (a rough sketch 
follows below).

4. Keep storing HistoryData instead of storing StartData and 
FinishData separately. When StartData is to be written, a HistoryData is 
created and the fields in StartData are copied into it. When FinishData is to be 
written, its fields are merged into the existing HistoryData. This helps to 
simplify the implementation of the reader interface and improve its performance.

5. Set the rule of not overriding existing history data, because each record is 
supposed to be unique and written just once. An IOException will be thrown if this 
happens. In addition, StartData is supposed to be written before FinishData; 
otherwise, an IOException will be thrown as well.

6. Rewrote the test cases to verify both the correct read/write procedures and 
the incorrect writing operations.
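For point 3, a minimal hypothetical sketch of such a two-level index (the value type Object stands in for the real container history record; nothing here is taken from the patch):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical sketch: two-level container index, attempt -> (container -> data).
class ContainerIndexSketch {
  // Static, so the data outlives any single store instance (see point 2).
  private static final ConcurrentMap<ApplicationAttemptId,
      ConcurrentMap<ContainerId, Object>> containerData =
          new ConcurrentHashMap<ApplicationAttemptId, ConcurrentMap<ContainerId, Object>>();

  static void putContainer(ApplicationAttemptId attemptId, ContainerId id, Object data) {
    containerData.putIfAbsent(attemptId, new ConcurrentHashMap<ContainerId, Object>());
    containerData.get(attemptId).put(id, data);
  }

  // One lookup by attempt returns all of its containers at once.
  static ConcurrentMap<ContainerId, Object> getContainers(ApplicationAttemptId attemptId) {
    ConcurrentMap<ContainerId, Object> containers = containerData.get(attemptId);
    return containers != null ? containers : new ConcurrentHashMap<ContainerId, Object>();
  }
}
{code}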

> [YARN-321] Add a testable in-memory HistoryStorage 
> ---
>
> Key: YARN-956
> URL: https://issues.apache.org/jira/browse/YARN-956
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Fix For: YARN-321
>
> Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, 
> YARN-956.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798122#comment-13798122
 ] 

Bikas Saha commented on YARN-1307:
--

Jira messed up the tree in previous comment. Here it is again.

{noformat}
ROOT_DIR_PATH
 |--- VERSION_INFO
 |--- RM_APP_ROOT
 | |- (#ApplicationId1)
 | ||- (#ApplicationAttemptIds)
 | | 
 | |- (#ApplicationId2)
 | |- (#ApplicationAttemptIds)
 |  
 |--- RM_DT_SECRET_MANAGER_ROOT
   |- RMDTSequenceNumber
   |- RMDelegationToken
{noformat}

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, 
> YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in 
> YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798120#comment-13798120
 ] 

Bikas Saha commented on YARN-1307:
--

I don't see much value in creating an extra hierarchy for attempt IDs. We need a 
version info field at the top for upgrades. We should avoid encoding sequence 
numbers in the znode names, since the listing of paths is less protected in ZK 
and the info may be visible to those who don't have other access. Name encoding 
was used for HDFS since the over-write operation is expensive there. In ZK, 
over-write is easy, so the data can be stored in the znode instead of its name 
wherever possible.

ROOT_DIR_PATH
 |--- VERSION_INFO
 |--- RM_APP_ROOT
 | |- (#ApplicationId1)
 | | |- (#ApplicationAttemptIds)
 | | 
 | |- (#ApplicationId2)
 |  |- (#ApplicationAttemptIds)
 |  
 |--- RM_DT_SECRET_MANAGER_ROOT
   |- RMDTSequenceNumber
   |- RMDelegationToken

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, 
> YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in 
> YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798067#comment-13798067
 ] 

Tsuyoshi OZAWA commented on YARN-1307:
--

Thanks for your review, Vinod and Jian!

> RMDelegationToken is not application specific, user can also explicitly say 
> getDelegationToken, should not be stored along with app info.

I overlooked this. Then the following znode structure is the correct one:

{code}
ROOT_DIR_PATH
 |--- RM_APP_ROOT
 | |- (#ApplicationId1)
 | ||- ATTEMPT_IDS
 | | |- (#ApplicationAttemptIds)
 | | 
 | |- (#ApplicationId2)
 |  |- ATTEMPT_IDS
 |  |- (#ApplicationAttemptIds)
 |  
 |--- RM_DT_SECRET_MANAGER_ROOT
   |- RMDTSequenceNumber_(SequenceNumber)
   |- RMDelegationToken_(#SequenceNumber)
{code}

As you mentioned, ZK does NOT support 'directory' removal. We can implement it 
with ZK's getChildren and multi-delete APIs (a rough sketch follows below).
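A rough, hypothetical sketch of that removal for an app znode whose children are its attempt znodes (paths assumed, and the subtree is treated as only one level deep for simplicity):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

class RemoveAppSketch {
  void removeApp(ZooKeeper zk, String appPath)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<Op>();
    for (String child : zk.getChildren(appPath, false)) {
      ops.add(Op.delete(appPath + "/" + child, -1)); // attempt znodes
    }
    ops.add(Op.delete(appPath, -1));                 // the app znode itself
    zk.multi(ops);                                   // single atomic removal
  }
}
{code}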

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, 
> YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in 
> YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798070#comment-13798070
 ] 

Tsuyoshi OZAWA commented on YARN-1307:
--

[~kkambatl], sure, no problem.

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethinking the znode structure for RM HA has been proposed in several JIRAs 
> (YARN-659, YARN-1222). The motivation for this JIRA is quoted from Bikas' 
> comment in YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798062#comment-13798062
 ] 

Karthik Kambatla commented on YARN-1222:


Yep. It is better to work on them sequentially. Just pinged [~ozawa] on the 
other ticket to see if he has already started working on it. Otherwise, we can 
follow up here quickly with patches. 

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch
>
>
> Use multi-operations for every ZK interaction. 
> In every operation, automatically create/delete a lock znode that is a child 
> of the root znode. This achieves fencing by modifying the create/delete 
> permissions on the root znode (a sketch follows below).
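
A hedged sketch of that fencing idea (not the YARN-1222 patch itself; the lock 
path, ACLs and method name below are illustrative assumptions): every write is 
wrapped in one multi that also creates and then deletes a lock znode under the 
root, so an RM that has lost create/delete permission on the root fails the 
whole operation atomically.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative only: a fenced setData built from ZooKeeper multi operations. */
public class FencedWrite {
  // Assumed root/lock paths; the fence itself is the ACL on the root znode,
  // which only the active RM is allowed to create/delete under (not shown).
  private static final String FENCING_LOCK = "/rmstore/FENCING_LOCK";

  public static void fencedSetData(ZooKeeper zk, String path, byte[] data)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<Op>();
    ops.add(Op.create(FENCING_LOCK, new byte[0],
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
    ops.add(Op.setData(path, data, -1)); // the actual state-store update
    ops.add(Op.delete(FENCING_LOCK, -1)); // lock removed in the same multi
    zk.multi(ops); // fails as a unit if this RM has been fenced out
  }
}
{code}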



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-10-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798061#comment-13798061
 ] 

Karthik Kambatla commented on YARN-1307:


[~ozawa], if you haven't already started coding this up, do you mind waiting 
until YARN-1222 is checked in? The patch there simplifies some of the ZK 
operations. 

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> Rethinking the znode structure for RM HA has been proposed in several JIRAs 
> (YARN-659, YARN-1222). The motivation for this JIRA is quoted from Bikas' 
> comment in YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798043#comment-13798043
 ] 

Vinod Kumar Vavilapalli commented on YARN-1068:
---

bq. IIUC, the suggestion is to use the RPC server from AdminService. 
AdminService currently is an Active service and not an Always-On service, so 
doesn't start until the RM transitions to Active. Moving the AdminService to 
Always-On requires defining the semantics when the RM is Standby.
I'd do this instead of adding a new service. For now as well as in the long 
term, we want to deny all the existing AdminService operations on Standby. 
Whether we do that by not stopping the server or by explicitly rejecting the 
requests is an implementation detail, not a big change in semantics.

Repeating what I said, we originally added AdminService separately from 
client-service only for prioritizing admin operations. No need for a new server 
for this.

bq. To do this, we need to have RMAdminCLI extend HAAdmin, and augment the 
run() method to call super.run() when applicable, and the usage needs to be 
augmented to include the HAAdmin usage.
Yes. I guess there is no argument here other than stating the obvious.

bq. YARN expects the actual PB/PBImpl files to be at a particular location, and 
can't find the corresponding files when using HAServiceProtocol from common. 
Hence, had to use PB interfaces.
HAServiceProtocolPB is the PB interface, and there seems to be no PBImpl, as 
Common/HDFS follow a different pattern from YARN's (the last I heard, they 
liked YARN's PB impl approach). In any case, +1 to skip using YARNRPC for that 
reason.

bq. The patch primarily adds command line support for HA transitions. Have 
tested this manually several times on a real cluster. 
We need JUnit tests for everything. We can fall back to manual tests only for 
hard-to-test race conditions or security features that cannot be covered 
otherwise. Manual testing is not a substitute for JUnit tests.

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
> yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
> yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
> yarn-1068-9.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services

2013-10-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797804#comment-13797804
 ] 

Steve Loughran commented on YARN-1139:
--

# You don't need to convert any exceptions now, because the inner 
{{serviceStart()/serviceStop()}} methods throw exceptions. Just pass them up. 
The only reason the existing services didn't have their exception catch/wrap 
logic changed as part of YARN-117 is that I didn't want to add extra changes.

# AbstractService catches a failure and relays it to noteFailure(), which saves 
away the first exception caught; {{getFailureCause()}} and 
{{getFailureState()}} return that exception and the state in which it happened.
# When an exception is caught during a state change, it triggers a 
{{Service.stop()}} action -which is why stop() is required to be a best-effort 
operation and do its best even when trying to stop a partially inited or 
started service.
# It then calls {{ServiceStateException.convert(e)}} to convert the exception 
into a RuntimeException; if it already is one, it is left alone, otherwise it 
is wrapped in a ServiceStateException.

# The composite service runs through its children, starting each one in turn. 
The first one that fails by throwing a runtime exception will trigger the 
noteFailure operation on the parent, then the composite service's stop() 
operation -which then walks back through all inited services (but not the 
UNINITED ones -things failed when we tried that), stopping them in turn.

What that means is that if a child service fails, the composite should pick 
that up and save it as its own failure cause (a small sketch follows).
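
A minimal sketch of that failure path (illustrative class names, not YARN code; 
it only relies on the org.apache.hadoop.service API described above): a child 
that throws from serviceStart() makes the parent CompositeService stop its 
already-started children and record the fault for the caller to read back.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

public class FailingChildDemo {

  /** A child whose start fails; no wrapping needed, the framework converts it. */
  static class FailingService extends AbstractService {
    FailingService() { super("FailingService"); }
    @Override
    protected void serviceStart() throws Exception {
      throw new Exception("simulated start failure");
    }
  }

  /** A parent that registers the failing child; children are started in order. */
  static class ParentService extends CompositeService {
    ParentService() { super("ParentService"); }
    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      addService(new FailingService());
      super.serviceInit(conf);
    }
  }

  public static void main(String[] args) {
    ParentService parent = new ParentService();
    parent.init(new Configuration());
    try {
      parent.start(); // the child failure comes back as a RuntimeException
    } catch (RuntimeException e) {
      System.out.println("failure cause: " + parent.getFailureCause());
      System.out.println("failed in state: " + parent.getFailureState());
    }
  }
}
{code}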

I've actually done a couple more child-holding services for my own work, which 
I'd happily push back into trunk/2.3 

[https://github.com/hortonworks/hoya/tree/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service]

* The 
[SequenceService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/SequenceService.java]
 runs its children in sequence, failing when one fails
* The 
[CompoundService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/CompoundService.java]
 stops as soon as any one of its children fails, again propagating any faults up.

These both implement a [Parent interface| Parent.java] so that they can be 
treated uniformly -and allow other bits of the code to add children.

Alongside that:
* [EventNotifyingService| 
https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/EventNotifyingService.java]
 : sleeps, notifies a callback, stops
* 
[ForkedProcessService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/ForkedProcessService.java]:
 forks off a native process, stops when the process stops, kills the process 
when it itself is stopped, and forwards up exceptions on a process failure

These let me build up more complex workflows like this one [to start 
accumulo|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/providers/accumulo/AccumuloProviderService.java#L331],
 which runs a sequence of "accumulo init" (if needed), followed by, in 
parallel, "accumulo start" and a delayed event callback. That callback will, if 
accumulo start hasn't failed in the meantime, trigger the request for 
containers for whatever other accumulo roles have been added.

Anyway, the services will catch, record, wrap and relay exceptions; the parents 
just need to be able to handle the fact that what comes back will be a 
RuntimeException, and there is no need to catch and wrap it again if you want 
to pass it upstream.

> [Umbrella] Convert all RM components to Services
> 
>
> Key: YARN-1139
> URL: https://issues.apache.org/jira/browse/YARN-1139
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>
> Some of the RM components - state store, scheduler etc. are not services. 
> Converting them to services goes well with the "Always On" and "Active" 
> service separation proposed on YARN-1098.
> Given that some of them already have start(), stop() methods, it should not 
> be too hard to convert them to services.
> That would also be a cleaner way of addressing YARN-1125.



--
This message was sent by Atlassian JIRA
(v6.1#6144)