[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583064#comment-14583064
 ] 

Hadoop QA commented on YARN-3768:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 57s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 55s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in hadoop-yarn-common. |
| | |  40m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12739173/YARN-3768.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e8110 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8241/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8241/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8241/console |


This message was automatically generated.

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.5.0
>Reporter: Joe Ferner
>Assignee: zhihai xu
> Attachments: YARN-3768.000.patch
>
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because Java's split method will not return trailing 
> empty strings. Similar to this: 
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
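
As a standalone illustration of the {{String.split}} behavior described above (a 
sketch only, not the actual {{Apps}} code or the attached patch):

{code}
import java.util.Arrays;

// Minimal demo: String.split drops trailing empty strings by default, so an
// entry like "FOO=" yields a one-element array and parts[1] would throw
// ArrayIndexOutOfBoundsException.
public class SplitDemo {
  public static void main(String[] args) {
    System.out.println(Arrays.toString("FOO=bar".split("=")));  // [FOO, bar]
    System.out.println(Arrays.toString("FOO=".split("=")));     // [FOO]

    // One defensive way to parse such entries: a limit of 2 keeps the
    // trailing empty value.
    String[] parts = "FOO=".split("=", 2);
    String name = parts[0];
    String value = parts.length > 1 ? parts[1] : "";
    System.out.println(name + " -> '" + value + "'");           // FOO -> ''
  }
}
{code}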



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3750) yarn.log.server.url is not documented in yarn-default.xml

2015-06-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-3750:
--

Assignee: Bibin A Chundatt

> yarn.log.server.url is not documented in yarn-default.xml
> -
>
> Key: YARN-3750
> URL: https://issues.apache.org/jira/browse/YARN-3750
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Dmitry Sivachenko
>Assignee: Bibin A Chundatt
>Priority: Minor
>
> From 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3cd18c9931.52700%25xg...@hortonworks.com%3e
> I learned about the yarn.log.server.url setting, but it is not mentioned in 
> the yarn-default.xml file.
> I propose to add this variable there with a short description.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-06-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583093#comment-14583093
 ] 

Naganarasimha G R commented on YARN-3644:
-

Hi [~raju.bairishetti],
IIUC, the intention of this jira is only to make the NM wait infinitely for the 
RM, and hence we don't want to set {{yarn.resourcemanager.connect.max-wait.ms}} 
to a FOREVER retry policy, which might affect other clients connecting to the 
RM, right?
If so, I feel the overall approach is fine except for the cosmetic comments below:
# {{NM_SHUTSDWON_ON_RM_CONNECTION_FAILURES}}  typo,  SHUTSDWON => SHUTDOWN
# If we agree on the earlier point, then 
{{DEFAULT_NM_SHUTSDOWN_ON_RM_CONNECTION_FAILURES}} => 
{{DEFAULT_NM_SHUTDOWN_ON_RM_CONNECTION_FAILURES}} 
# The configuration could be renamed from 
{{yarn.nodemanager.shutdown.on.connection.failures}} to 
{{yarn.nodemanager.shutdown.on.RM.connection.failures}}; correct the name and 
description in yarn-default.xml as well.
# The test case introduces a new {{MyNodeStatusUpdater6}} whose only change is 
to return a new ResourceTracker for the test case. This is becoming more and 
more duplicated code, since most of the other overridden NodeStatusUpdaters do 
the same thing. Can we bring in a common NodeStatusUpdater class that accepts a 
ResourceTracker as a constructor parameter (see the sketch after this list)? 
Refactoring the other classes can be taken up in another jira if required.
# {{MyResourceTracker8}} could extend {{MyResourceTracker5}} and just override 
the required methods. I would also appreciate some documentation above these 
classes so that they are easier to reuse in the future.
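
A rough sketch of the common test updater suggested in comment 4 (the class 
name is hypothetical, and it assumes the {{NodeStatusUpdaterImpl}} constructor 
and the {{getRMClient()}} override pattern used by the existing 
{{MyNodeStatusUpdater}} classes in {{TestNodeStatusUpdater}}):

{code}
// Hypothetical sketch: one reusable test updater that takes the ResourceTracker
// as a constructor argument instead of adding yet another MyNodeStatusUpdaterN
// subclass per test.
private static class TrackerBackedNodeStatusUpdater extends NodeStatusUpdaterImpl {
  private final ResourceTracker resourceTracker;

  TrackerBackedNodeStatusUpdater(Context context, Dispatcher dispatcher,
      NodeHealthCheckerService healthChecker, NodeManagerMetrics metrics,
      ResourceTracker resourceTracker) {
    super(context, dispatcher, healthChecker, metrics);
    this.resourceTracker = resourceTracker;
  }

  @Override
  protected ResourceTracker getRMClient() {
    return resourceTracker;
  }
}
{code}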

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Raju Bairishetti
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if RM is down for maintenance for longer period, all the 
> NMs shuts themselves down, requiring additional work to bring up the NMs.
> Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non connection failures are being retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3743) Allow admin specify labels from RM with node labels provider

2015-06-12 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned YARN-3743:
-

Assignee: Dian Fu

> Allow admin specify labels from RM with node labels provider
> 
>
> Key: YARN-3743
> URL: https://issues.apache.org/jira/browse/YARN-3743
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3743.1.patch
>
>
> As discussed in YARN-3557, providing a node label configuration mechanism 
> similar to YARN-2495 at RM side would ease the use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-12 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583116#comment-14583116
 ] 

Sidharta Seethana commented on YARN-2194:
-

[~ywskycn] ,  you'll need to change {{PrivilegedOperationExecutor}} as well 

{code}
  if (noneArgsOnly == false) {
//We have already appended at least one tasks file.
finalOpArg.append(",");
finalOpArg.append(tasksFile);
  } else {
finalOpArg.append(tasksFile);
noneArgsOnly = false;
  }
{code}


The tests appear to pass in {{TestLinuxContainerExecutorWithMocks}}, but it is 
not clear why. One example that should have caused a test failure: 

{code}
StringUtils.join(",", dirsHandler.getLocalDirs()),
StringUtils.join(",", dirsHandler.getLogDirs()), "cgroups=none"),
{code}

It appears to me that this construction is done in enough places that it would 
make sense to create a static constant for the separator used when constructing 
an argument for the container-executor binary. A good candidate location for 
such a constant would be the {{PrivilegedOperation}} class. In addition, you 
could also 'hide' the join functionality by adding a static function there. 
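
For illustration, such a constant and helper might look roughly like the sketch 
below (names are illustrative, not the committed API; it assumes Hadoop's 
{{StringUtils.join(CharSequence, Iterable)}}):

{code}
import java.util.List;

import org.apache.hadoop.util.StringUtils;

// Illustrative sketch of additions to PrivilegedOperation (not the actual
// class): one place that owns the separator used when building arguments for
// the container-executor binary, plus a helper that hides the join so callers
// don't hard-code "," everywhere.
public class PrivilegedOperation {

  public static final String LINUX_FILE_PATH_SEPARATOR = ",";

  public static String join(List<String> args) {
    return StringUtils.join(LINUX_FILE_PATH_SEPARATOR, args);
  }
}
{code}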



> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3794:

Hadoop Flags: Reviewed

+1, Looks good to me, will commit it shortly.

YARN-3790 exists to track the TestWorkPreservingRMRestart failure.

> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}
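
For illustration, the disambiguation proposed above would look like this inside 
{{becomeActive()}} (a sketch of the idea, not necessarily the exact committed 
change):

{code}
// Qualifying the field removes the ambiguity between MockRM.LOG and the
// enclosing test's LOG field.
TestRMEmbeddedElector.LOG.info("Callback called. Sleeping now");
Thread.sleep(delayMs);
TestRMEmbeddedElector.LOG.info("Sleep done");
{code}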



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583140#comment-14583140
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8009 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8009/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3792) Test case failures in TestDistributedShell after changes for subjira's of YARN-2928

2015-06-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3792:

Description: 
# Encountered [testcase 
failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
which were happening even without the patch modifications in YARN-3044:
TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
# While testing locally, TestDistributedShell intermittently fails on the 
vmem-pmem ratio check, hence we need to increase it.



  was:
encountered [testcase 
failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
which was happening even without the patch modifications in YARN-3044

TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression


> Test case failures in TestDistributedShell after changes for subjira's of 
> YARN-2928
> ---
>
> Key: YARN-3792
> URL: https://issues.apache.org/jira/browse/YARN-3792
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> # Encountered [testcase 
> failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
> which were happening even without the patch modifications in YARN-3044:
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
> TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
> # While testing locally, TestDistributedShell intermittently fails on the 
> vmem-pmem ratio check, hence we need to increase it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583188#comment-14583188
 ] 

Chengbing Liu commented on YARN-3794:
-

Thanks [~devaraj.k] for committing!

> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-06-12 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583189#comment-14583189
 ] 

Raju Bairishetti commented on YARN-3644:


[~amareshwari] [~Naganarasimha] Thanks for the review and comments.

[~Naganarasimha] Yes,  this jira is only to make NM wait for RM.

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Raju Bairishetti
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if RM is down for maintenance for longer period, all the 
> NMs shuts themselves down, requiring additional work to bring up the NMs.
> Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non connection failures are being retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3798:
--

 Summary: RM shutdown with NoNode exception while updating 
appAttempt on zk
 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt


RM goes down with a NoNode exception while creating the znode for an appattempt.

*Please find the exception logs*

{code}
2015-06-09 10:09:44,732 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2015-06-09 10:09:44,732 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2015-06-09 10:09:44,886 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 10:09:44,887 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
out ZK retries. Giving up!
2015-06-09 10:09:44,887 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating appAttempt: appattempt_1433764310492_7152_01
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937

[jira] [Assigned] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3798:
--

Assignee: Varun Saxena

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery

[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583237#comment-14583237
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #226 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/226/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583246#comment-14583246
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #956 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/956/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state

2015-06-12 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583378#comment-14583378
 ] 

Masatake Iwasaki commented on YARN-3705:


YARN-3790 is addressing the failure of TestWorkPreservingRMRestart.

> forcemanual transitionToStandby in RM-HA automatic-failover mode should 
> change elector state
> 
>
> Key: YARN-3705
> URL: https://issues.apache.org/jira/browse/YARN-3705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
> Attachments: YARN-3705.001.patch
>
>
> Executing {{rmadmin -transitionToStandby --forcemanual}} in 
> automatic-failover.enabled mode makes the ResourceManager standby while 
> keeping the state of the ActiveStandbyElector. It should make the elector 
> quit and rejoin so that other candidates can be promoted; otherwise, 
> forcemanual transition should not be allowed in automatic-failover mode, in 
> order to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583389#comment-14583389
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2172 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2172/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-12 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583404#comment-14583404
 ] 

Wei Yan commented on YARN-2194:
---

[~kasha], [~sidharta-s], thanks for the comments. Looking into it.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583413#comment-14583413
 ] 

Naganarasimha G R commented on YARN-3798:
-

Hi [~bibinchundatt] & [~varun_saxena],
I think we should retry again before making the job fail. Thoughts?


> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanag

[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583433#comment-14583433
 ] 

Varun Saxena commented on YARN-3798:


We do retry a configurable number of times.

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)

[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583438#comment-14583438
 ] 

Varun Saxena commented on YARN-3798:


Just to elaborate further, this issue comes about because ZooKeeper is in an 
inconsistent state after one of the ZooKeeper instances goes down.

The application node doesn't exist because the ZooKeeper instance hasn't yet 
synced the application node.
Probably on the first failure we can try making a call to {{sync()}} to get 
consistent data from ZooKeeper, or we can catch the exception and fail the 
job (after retries), because IMHO the RM should not go down.
Thoughts?
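
A minimal sketch of what a {{sync()}} before retrying might look like, assuming 
direct use of the ZooKeeper client's asynchronous {{sync}} API (this is not the 
{{ZKRMStateStore}} code):

{code}
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch: ask the server this client is connected to to catch up with
// the leader for the given path before retrying the failed operation.
public class ZkSyncSketch {
  public static void syncPath(ZooKeeper zk, String path) throws InterruptedException {
    final CountDownLatch done = new CountDownLatch(1);
    zk.sync(path, new AsyncCallback.VoidCallback() {
      @Override
      public void processResult(int rc, String p, Object ctx) {
        done.countDown();
      }
    }, null);
    done.await();
  }
}
{code}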

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)

[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583440#comment-14583440
 ] 

Varun Saxena commented on YARN-3798:


I meant "The application node doesnt exist because the new Zookeeper instance 
client connects to hasn't yet synced the application node."

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.

[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583442#comment-14583442
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2154 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2154/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3799) [JDK8] Fix javadoc errors caused by incorrect or illegal tags in hadoop-yarn-common

2015-06-12 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-3799:
---

 Summary: [JDK8] Fix javadoc errors caused by incorrect or illegal 
tags in hadoop-yarn-common
 Key: YARN-3799
 URL: https://issues.apache.org/jira/browse/YARN-3799
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Akira AJISAKA


{{mvn package -Pdist -DskipTests}} fails with JDK8 due to an illegal javadoc tag.
{code}
[ERROR] 
/home/centos/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java:829:
 error: @param name not found
[ERROR] * @param nodelabels
[ERROR] ^
{code}
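
For context, a minimal illustration of the kind of fix JDK8's stricter javadoc 
checking requires (the class, method and parameter names below are made up, not 
the actual CommonNodeLabelsManager code): the {{@param}} tag must name an actual 
parameter of the method.
{code}
import java.util.Set;

class NodeLabelsJavadocExample {
  /**
   * Adds labels to the cluster.
   *
   * @param nodeLabels labels to add; the tag name must match the parameter name
   *                   exactly, otherwise JDK8 javadoc fails with
   *                   "error: @param name not found"
   */
  void addLabels(Set<String> nodeLabels) {
    // illustrative no-op
  }
}
{code}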



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3799) [JDK8] Fix javadoc errors caused by incorrect or illegal tags in hadoop-yarn-common

2015-06-12 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned YARN-3799:
---

Assignee: Akira AJISAKA

> [JDK8] Fix javadoc errors caused by incorrect or illegal tags in 
> hadoop-yarn-common
> ---
>
> Key: YARN-3799
> URL: https://issues.apache.org/jira/browse/YARN-3799
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> {{mvn package -Pdist -DskipTests}} fails with JDK8 due to an illegal javadoc tag.
> {code}
> [ERROR] 
> /home/centos/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java:829:
>  error: @param name not found
> [ERROR] * @param nodelabels
> [ERROR] ^
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583509#comment-14583509
 ] 

Hudson commented on YARN-3794:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #215 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/215/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3799) [JDK8] Fix javadoc errors caused by incorrect or illegal tags

2015-06-12 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3799:

Affects Version/s: 2.8.0

> [JDK8] Fix javadoc errors caused by incorrect or illegal tags
> -
>
> Key: YARN-3799
> URL: https://issues.apache.org/jira/browse/YARN-3799
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> {{mvn package -Pdist -DskipTests}} fails with JDK8 due to an illegal javadoc tag.
> {code}
> [ERROR] 
> /home/centos/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java:829:
>  error: @param name not found
> [ERROR] * @param nodelabels
> [ERROR] ^
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3799) [JDK8] Fix javadoc errors caused by incorrect or illegal tags

2015-06-12 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3799:

Summary: [JDK8] Fix javadoc errors caused by incorrect or illegal tags  
(was: [JDK8] Fix javadoc errors caused by incorrect or illegal tags in 
hadoop-yarn-common)

> [JDK8] Fix javadoc errors caused by incorrect or illegal tags
> -
>
> Key: YARN-3799
> URL: https://issues.apache.org/jira/browse/YARN-3799
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> {{mvn package -Pdist -DskipTests}} fails with JDK8 due to an illegal javadoc tag.
> {code}
> [ERROR] 
> /home/centos/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java:829:
>  error: @param name not found
> [ERROR] * @param nodelabels
> [ERROR] ^
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-06-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583524#comment-14583524
 ] 

Varun Vasudev commented on YARN-3591:
-

Sorry for the late response. In my opinion, there's little benefit to storing 
the bad local dirs in the state store. We can just pass the 
LocalDirHandlerService to LocalResourcesTrackerImpl when it's created, and 
incoming requests can be checked against the known error dirs in the 
isResourcePresent function.

[~lavkesh], would that solve the problem?
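
For illustration, a rough sketch of that idea under assumed names (badLocalDirs 
stands in for whatever the dirs handler would expose; this is not the actual 
patch): the localized path is rejected when it lives under a dir currently known 
to be bad, instead of trusting file.exists().
{code}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class BadDirCheckSketch {
  // Returns false if the localized path lives under a dir currently flagged as bad.
  static boolean isResourceUsable(Path localizedPath, Collection<Path> badLocalDirs) {
    for (Path bad : badLocalDirs) {
      if (localizedPath.startsWith(bad)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Path> bad = Arrays.asList(Paths.get("/data/2/yarn/local"));
    System.out.println(isResourceUsable(
        Paths.get("/data/2/yarn/local/filecache/10/job.jar"), bad));  // false
  }
}
{code}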

> Resource Localisation on a bad disk causes subsequent containers failure 
> -
>
> Key: YARN-3591
> URL: https://issues.apache.org/jira/browse/YARN-3591
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
> YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch
>
>
> It happens when a resource is localised on a disk and, after localising, that 
> disk has gone bad. The NM keeps paths for localised resources in memory. At the 
> time of a resource request, isResourcePresent(rsrc) will be called, which calls 
> file.exists() on the localised path.
> In some cases when the disk has gone bad, inodes are still cached and 
> file.exists() returns true, but at the time of reading the file will not open.
> Note: file.exists() actually calls stat64 natively, which returns true because 
> it was able to find inode information from the OS.
> A proposal is to call file.list() on the parent path of the resource, which 
> will call open() natively. If the disk is good it should return an array of 
> paths with length at least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583546#comment-14583546
 ] 

Hudson commented on YARN-3794:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #224 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/224/])
YARN-3794. TestRMEmbeddedElector fails because of ambiguous LOG reference. 
(devaraj: rev d8dcfa98e3ca6a6fea414fd503589bb83b7a9c51)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt


> TestRMEmbeddedElector fails because of ambiguous LOG reference
> --
>
> Key: YARN-3794
> URL: https://issues.apache.org/jira/browse/YARN-3794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3794.01.patch
>
>
> After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
> the following code snippet is ambiguous.
> {code}
> protected AdminService createAdminService() {
>   return new AdminService(MockRMWithElector.this, getRMContext()) {
> @Override
> protected EmbeddedElectorService createEmbeddedElectorService() {
>   return new EmbeddedElectorService(getRMContext()) {
> @Override
> public void becomeActive() throws
> ServiceFailedException {
>   try {
> callbackCalled.set(true);
> LOG.info("Callback called. Sleeping now");
> Thread.sleep(delayMs);
> LOG.info("Sleep done");
>   } catch (InterruptedException e) {
> e.printStackTrace();
>   }
>   super.becomeActive();
> }
>   };
> }
>   };
> }
> {code}
> Eclipse gives the following error:
> {quote}
> The field LOG is defined in an inherited type and an enclosing scope
> {quote}
> IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583659#comment-14583659
 ] 

Xuan Gong commented on YARN-3768:
-

Thanks for working on this, [~zxu]. Could you also add a testcase which verifies 
that, if we give a bad environment variable, it will not throw the exception?

Also, why do we need to keep trailing empty strings?
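
For reference, a minimal standalone sketch of the split() behaviour in question 
(an illustration only, not the patch itself): with the default limit, Java drops 
trailing empty strings, so an entry like FOO= yields a one-element array and 
indexing the value throws; passing a limit of -1 keeps the trailing empty string.
{code}
public class SplitExample {
  public static void main(String[] args) {
    String envVar = "FOO=";                  // environment variable without a value
    String[] dropped = envVar.split("=");    // ["FOO"]     -> dropped[1] would throw
    String[] kept = envVar.split("=", -1);   // ["FOO", ""] -> kept[1] is the empty value
    System.out.println(dropped.length + " vs " + kept.length);  // prints "1 vs 2"
  }
}
{code}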

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.5.0
>Reporter: Joe Ferner
>Assignee: zhihai xu
> Attachments: YARN-3768.000.patch
>
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range 
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because java will not return empty strings from the 
> split method. Similar to this 
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3789) Refactor logs for LeafQueue#activateApplications() to remove duplicate logging

2015-06-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583742#comment-14583742
 ] 

Bibin A Chundatt commented on YARN-3789:


[~rohithsharma] and [~devaraj.k], please review the submitted patch. As mentioned 
earlier, the checkstyle issue seems unrelated, and a testcase addition is not 
required since this is just a logging update.

> Refactor logs for LeafQueue#activateApplications() to remove duplicate logging
> --
>
> Key: YARN-3789
> URL: https://issues.apache.org/jira/browse/YARN-3789
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-3789.patch, 0002-YARN-3789.patch, 
> 0003-YARN-3789.patch
>
>
> Duplicate logging from resource manager
> during am limit check for each application
> {code}
> 015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> 2015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> 2015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> 2015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> 2015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> 2015-06-09 17:32:40,019 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> not starting application as amIfStarted exceeds amLimit
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583839#comment-14583839
 ] 

Li Lu commented on YARN-3051:
-

Hi [~varun_saxena], thanks for the update! Some of my quick thoughts for 
discussion...
# I just realized in this JIRA we are creating "backing storage read interface 
for ATS readers", but not the user facing ATS reader APIs. I believe these two 
topics are different: in this JIRA we're "wiring up" the storage systems, but 
in ATS reader APIs, we need to deal with user requirements. This said, I think 
the main design goal here is to provide a small set of generic interfaces so 
that we can easily connect them to our writers. We may want to have some brief 
ideas of the potential user facing features (as [~zjshen] mentioned in a 
previous comment), but I'm not sure if we need to implement them before we make 
a concrete design for the storage read interface. 
# If my understanding in point 1 is right, then perhaps we do not need to worry 
too much about the huge list of nulls. Of course, at the code level we may want 
to do some cosmetic fixes, but since those interfaces are not user facing, 
making them more general may be more important, I think.
# I still think when doing the v2 interface design it is fine, if not even 
beneficial, to start from scratch rather than thinking about the existing v1 
design. If we're not implementing some v1 features as first-class in v2 storage 
implementations, maybe we can simply leave them out of the storage-level 
interfaces? (I assume we'll have an intermediate layer to do the wire-up 
between our user-facing reader APIs and the storage interfaces.)
# bq. Now from backing storage implementation viewpoint, would it make more 
sense to let these query params be passed as strings or do datatype conversion ?
I've got no strong preference on this. Leaving them as a generic type (like 
string) gives the storage layer more freedom to interpret the data, but the 
readers need to ensure they understand the types by themselves. 

BTW, could you please briefly skim through the list of Jenkins warnings and see 
if they're critical? Thanks! 

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583854#comment-14583854
 ] 

Jian He commented on YARN-3017:
---

IMHO, although this theoretically doesn't break compatibility, it may do so in 
practice for some existing 3rd-party tools. Also, if a cluster is rolling-upgraded 
from 2.6, then we have the same containerId printed in two different formats, 
which makes the debugging process harder.

I don't know why containerId was originally written to print only 2 digits, but 
one reason I can think of is that in reality we won't see a large number of 
attempt failures (especially since max-attempts is set to 2 by default).

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Assignee: Mohammad Shahid Khan
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch, 
> YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_02
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_02".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_01,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> ...
> Curious to know if this is kept like that for a reason. If not, then while using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583878#comment-14583878
 ] 

Li Lu commented on YARN-3051:
-

I verified locally that the pre-patch findbugs warnings no longer exist. 

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3714) AM proxy filter can not get proper default proxy address if RM-HA is enabled

2015-06-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583890#comment-14583890
 ] 

Xuan Gong commented on YARN-3714:
-

[~iwasakims], did you see this issue in a real cluster environment? As far as I 
know, when we start the RM and only set yarn.resourcemanager.hostname.rm-id, we 
derive all the service addresses from it, including 
yarn.resourcemanager.webapp.address.rm-id and 
yarn.resourcemanager.webapp.https.address.rm-id.

{code}
// Set HA configuration should be done before login
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
  HAUtil.verifyAndSetConfiguration(this.conf);
}
{code}


> AM proxy filter can not get proper default proxy address if RM-HA is enabled
> 
>
> Key: YARN-3714
> URL: https://issues.apache.org/jira/browse/YARN-3714
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-3714.001.patch
>
>
> Default proxy address could not be got without setting 
> {{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
> {{yarn.resourcemanager.webapp.https.address._rm-id_}} explicitly if RM-HA is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583892#comment-14583892
 ] 

Xuan Gong commented on YARN-3017:
-

+1 for Jian's comment. We do not need to change this.

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Assignee: Mohammad Shahid Khan
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch, 
> YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_02
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_02".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_01,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> ...
> Curious to know if this is kept like that for a reason. If not, then while using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-06-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583903#comment-14583903
 ] 

Xuan Gong commented on YARN-3543:
-

[~rohithsharma], 
could we avoid directly changing ApplicationReport.newInstance()? This will 
break other applications, such as Tez.
We should make a backward-compatible newInstance change instead.
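
A minimal sketch of the kind of backward-compatible change meant here (the class 
and field are illustrative stand-ins, not the real ApplicationReport API): keep 
the existing factory signature and add an overload that carries the new field.
{code}
class ReportSketch {
  private final String appId;
  private final boolean unmanagedAM;

  private ReportSketch(String appId, boolean unmanagedAM) {
    this.appId = appId;
    this.unmanagedAM = unmanagedAM;
  }

  // existing signature kept, so current callers (e.g. Tez) still compile
  static ReportSketch newInstance(String appId) {
    return newInstance(appId, false);
  }

  // new overload carrying the extra field
  static ReportSketch newInstance(String appId, boolean unmanagedAM) {
    return new ReportSketch(appId, unmanagedAM);
  }
}
{code}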

> ApplicationReport should be able to tell whether the Application is AM 
> managed or not. 
> ---
>
> Key: YARN-3543
> URL: https://issues.apache.org/jira/browse/YARN-3543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: Spandan Dutta
>Assignee: Rohith
> Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
> 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
> 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 
> YARN-3543-AH.PNG, YARN-3543-RM.PNG
>
>
> Currently we can know whether the application submitted by the user is AM 
> managed from the applicationSubmissionContext. This can only be done at the 
> time when the user submits the job. We should have access to this info from 
> the ApplicationReport as well so that we can check whether an app is AM 
> managed or not at any time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2497) Changes for fair scheduler to support allocate resource respect labels

2015-06-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583918#comment-14583918
 ] 

Naganarasimha G R commented on YARN-2497:
-

Hi [~yufeldman],
 I would like to work on this jira if you have not yet started on it. Please 
let me know whether I can take it over.

> Changes for fair scheduler to support allocate resource respect labels
> --
>
> Key: YARN-2497
> URL: https://issues.apache.org/jira/browse/YARN-2497
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Yuliya Feldman
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-06-12 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583925#comment-14583925
 ] 

zhihai xu commented on YARN-3591:
-

Hi [~vvasudev], thanks for the suggestion.
It looks like your suggestion is similar to [~lavkesh]'s original patch 
0001-YARN-3591.patch. Compared to that patch, your suggestion may sometimes miss 
the disk failure: LocalDirHandlerService only calls {{checkDirs}} every 2 minutes 
by default, so if the disk failure happens right after {{checkDirs}} is called 
and before {{isResourcePresent}} is called, your suggestion won't detect it, 
whereas [~lavkesh]'s original patch can. So it looks like the original patch is 
better than your suggestion. That is my understanding; please correct me if I am 
wrong.

> Resource Localisation on a bad disk causes subsequent containers failure 
> -
>
> Key: YARN-3591
> URL: https://issues.apache.org/jira/browse/YARN-3591
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
> YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch
>
>
> It happens when a resource is localised on a disk and, after localising, that 
> disk has gone bad. The NM keeps paths for localised resources in memory. At the 
> time of a resource request, isResourcePresent(rsrc) will be called, which calls 
> file.exists() on the localised path.
> In some cases when the disk has gone bad, inodes are still cached and 
> file.exists() returns true, but at the time of reading the file will not open.
> Note: file.exists() actually calls stat64 natively, which returns true because 
> it was able to find inode information from the OS.
> A proposal is to call file.list() on the parent path of the resource, which 
> will call open() natively. If the disk is good it should return an array of 
> paths with length at least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584034#comment-14584034
 ] 

Zhijie Shen commented on YARN-3044:
---

[~Naganarasimha], thanks for fixing the race, but the 
TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow failure seems to be 
caused by your new patch. Can you double-check?

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
> YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
> YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584073#comment-14584073
 ] 

Naganarasimha G R commented on YARN-3044:
-

Hi [~zjshen], I have already checked it. I am able to reproduce the failure 
even without my patch applied, and I had run this test case individually with my 
changes and it was fine!  Based on my analysis it is caused by multiple issues: 
# The timeline auxiliary service is set up only for 
{{testDSShellWithoutDomainV2CustomizedFlow}} & 
{{testDSShellWithoutDomainV2CustomizedFlow}} in 
{{TestDistributedShell.setupInternal}}, but 
yarn.nodemanager.container-metrics.enable is enabled by default, hence metrics 
are always trying to be published.
# Once {{TimelineClientImpl.putObjects}} finishes all the retry attempts in 
{{pollTimelineServiceAddress}} and the {{timelineServiceAddress}} is still null 
(not updated), the next while loop should run only if the 
{{timelineServiceAddress}} is not null; otherwise a NullPointerException is 
thrown in {{constructResURI}} (see the sketch after this comment).

On the NullPointerException all the threads get expired, and the thread pool 
executor in ContainersMonitorImpl starts rejecting new threads, hence the 
container metrics are not getting launched. Will try to provide the patch at the 
earliest. 
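
A minimal sketch of the guard described in point 2 (the field and method names 
follow the comment above; this is not the actual patch):
{code}
import java.io.IOException;

class TimelineAddressGuardSketch {
  private String timelineServiceAddress;  // would be refreshed by the polling logic

  void putObjectsSketch() throws IOException {
    if (timelineServiceAddress == null) {
      // all retries exhausted without an address: fail fast instead of letting
      // the URI construction dereference null
      throw new IOException("Timeline service address still unavailable, giving up");
    }
    // ... proceed to construct the resource URI and issue the request ...
  }
}
{code}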

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
> YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
> YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3714) AM proxy filter can not get proper default proxy address if RM-HA is enabled

2015-06-12 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584099#comment-14584099
 ] 

Masatake Iwasaki commented on YARN-3714:


Thanks for the comment, [~xgong]. {{HAUtil.verifyAndSetConfiguration}} works 
only on the RM node. AMs running on slave nodes also need to know the RM webapp 
addresses.
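
For reference, a minimal illustration of the explicit per-rm-id settings the 
description refers to ("rm1"/"rm2" and the host names are made-up values, not 
from this issue):
{code}
import org.apache.hadoop.conf.Configuration;

class RmHaWebappAddressSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // explicitly set the webapp address for each RM id so that AMs on slave
    // nodes can resolve the proxy addresses
    conf.set("yarn.resourcemanager.webapp.address.rm1", "rm1.example.com:8088");
    conf.set("yarn.resourcemanager.webapp.address.rm2", "rm2.example.com:8088");
    System.out.println(conf.get("yarn.resourcemanager.webapp.address.rm1"));
  }
}
{code}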

> AM proxy filter can not get proper default proxy address if RM-HA is enabled
> 
>
> Key: YARN-3714
> URL: https://issues.apache.org/jira/browse/YARN-3714
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-3714.001.patch
>
>
> Default proxy address could not be got without setting 
> {{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
> {{yarn.resourcemanager.webapp.https.address._rm-id_}} explicitly if RM-HA is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584141#comment-14584141
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~varun_saxena], [~bibinchundatt], thank you for taking this. One of our users 
also faced the same issue.
sync() is effective only when ZooKeeper is accessed from multiple clients. 
ZKRMStateStore has only one client, so I think it's not effective in this case.

BTW, the expected behaviour could be achieved by catching NoNodeException, but we 
should first check why and when it happens.
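
A minimal sketch of what catching NoNodeException could look like around a 
ZooKeeper multi call (the client and op list are placeholders; whether tolerating 
the error is actually safe is exactly the open question above):
{code}
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

class NoNodeToleranceSketch {
  static void storeSketch(ZooKeeper zkClient, List<Op> opList) throws Exception {
    try {
      zkClient.multi(opList);
    } catch (KeeperException.NoNodeException e) {
      // Tolerate the missing znode instead of retrying until the RM gives up;
      // why the node is missing still needs to be understood first.
      System.err.println("znode missing, skipping update: " + e.getPath());
    }
  }
}
{code}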

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584148#comment-14584148
 ] 

Zhijie Shen commented on YARN-3051:
---

bq. APIs' for querying individual entity/flow/flow run/user and APIs' for 
querying a set of entities/flow runs/flows/users. APIs' such a set of 
flows/users will contain aggregated data. The reason for separate endpoints for 
entities, flows, users,etc. is because of the different tables in HBase/Phoenix 
schema.

I don't think we store the first-class citizen entities in a different way or in 
different tables (Li/Vrushali, correct me if I'm wrong). When fetching an entity, 
it doesn't matter whether it is a customized entity or a predefined entity such 
as ApplicationEntity.

In fact, we have two levels of interfaces: one is the storage interface and the 
other is the user-oriented interface. I think it's a good idea to let the 
user-oriented interface have more specific/advanced APIs to handle the special 
entity objects, while the storage interface could have fewer, more uniform APIs 
to reuse the common logic as much as possible. Thoughts?

bq. Every query param will be received as a String, even timestamp. Now from 
backing storage implementation viewpoint, would it make more sense to let these 
query params be passed as strings or do datatype conversion ?

I think we need to take the generic type as the param. If it's transformed to a 
string, it is likely to be difficult to recover the original type information. 
For example, when we see the string "true", how do we know whether it used to be 
the string "true" or the boolean true? Likewise, is "1234567" a number, or a 
string that represents a vehicle license plate?
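
A tiny standalone illustration of the type-information concern (not a proposed 
API): once every query param is flattened to a String, the reader can no longer 
tell the original types apart.
{code}
import java.util.HashMap;
import java.util.Map;

class FilterTypeSketch {
  public static void main(String[] args) {
    Map<String, Object> typed = new HashMap<String, Object>();
    typed.put("finished", Boolean.TRUE);   // clearly a boolean
    typed.put("plate", "1234567");         // clearly a string

    Map<String, String> erased = new HashMap<String, String>();
    erased.put("finished", "true");        // boolean true, or the literal string "true"?
    erased.put("plate", "1234567");        // a number, or a license-plate string?
  }
}
{code}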


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584154#comment-14584154
 ] 

Tsuyoshi Ozawa commented on YARN-3017:
--

Thanks, committers, for your comments. This change could affect large parts of 
the system unexpectedly. The inconsistency is dirty, but it actually works. We 
can keep this format to preserve compatibility.

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Assignee: Mohammad Shahid Khan
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch, 
> YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_02
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_02".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_01,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> ...
> Curious to know if this is kept like that for a reason. If not, then while using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-06-12 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-3656:

Attachment: YARN-3656-v1.patch

> LowCost: A Cost-Based Placement Agent for YARN Reservations
> ---
>
> Key: YARN-3656
> URL: https://issues.apache.org/jira/browse/YARN-3656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Ishai Menache
> Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.patch
>
>
> YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
> ahead of time. YARN-1710 introduced a greedy agent for placing user 
> reservations. The greedy agent makes fast placement decisions but at the cost 
> of ignoring the cluster committed resources, which might result in blocking 
> the cluster resources for certain periods of time, and in turn rejecting some 
> arriving jobs.
> We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
> the demand of the job throughout the allowed time-window according to a 
> global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-12 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3800:
---

 Summary: Simplify inmemory state for ReservationAllocation
 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Instead of storing the ReservationRequest, we store the Resource for 
allocations, as that's the only thing we need. Ultimately we convert everything 
to resources anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-12 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3800:

Attachment: YARN-3800.001.patch

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch
>
>
> Instead of storing the ReservationRequest, we store the Resource for 
> allocations, as that's the only thing we need. Ultimately we convert 
> everything to resources anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-06-12 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3656:
-
Assignee: Jonathan Yaniv

> LowCost: A Cost-Based Placement Agent for YARN Reservations
> ---
>
> Key: YARN-3656
> URL: https://issues.apache.org/jira/browse/YARN-3656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Ishai Menache
>Assignee: Jonathan Yaniv
>  Labels: capacity-scheduler, resourcemanager
> Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.patch
>
>
> YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
> ahead of time. YARN-1710 introduced a greedy agent for placing user 
> reservations. The greedy agent makes fast placement decisions but at the cost 
> of ignoring the cluster committed resources, which might result in blocking 
> the cluster resources for certain periods of time, and in turn rejecting some 
> arriving jobs.
> We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
> the demand of the job throughout the allowed time-window according to a 
> global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-06-12 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-3656:

Attachment: lowcostrayonexternal_v2.pdf

New version of the design doc, including a class diagram.

> LowCost: A Cost-Based Placement Agent for YARN Reservations
> ---
>
> Key: YARN-3656
> URL: https://issues.apache.org/jira/browse/YARN-3656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Ishai Menache
>Assignee: Jonathan Yaniv
>  Labels: capacity-scheduler, resourcemanager
> Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.patch, 
> lowcostrayonexternal_v2.pdf
>
>
> YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
> ahead of time. YARN-1710 introduced a greedy agent for placing user 
> reservations. The greedy agent makes fast placement decisions but at the cost 
> of ignoring the cluster committed resources, which might result in blocking 
> the cluster resources for certain periods of time, and in turn rejecting some 
> arriving jobs.
> We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
> the demand of the job throughout the allowed time-window according to a 
> global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584189#comment-14584189
 ] 

Zhijie Shen commented on YARN-3044:
---

Interesting. Is it an intermittent test failure? I cannot reproduce it on my 
machine with a clean YARN-2928 branch.

BTW, I tried the new patch on a single node cluster. The race condition is 
fixed.

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
> YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
> YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584197#comment-14584197
 ] 

Li Lu commented on YARN-3051:
-

bq. APIs' for querying individual entity/flow/flow run/user and APIs' for 
querying a set of entities/flow runs/flows/users. APIs' such a set of 
flows/users will contain aggregated data.
bq. I think we don't store the first class citizen entity in a different way 
and in different tables (Li/Vrushali, correct me If I'm wrong). When fetching 
an entity, it doesn't matter it is a customized entity or a predefined entity 
such as ApplicationEntity.

If we're discussing about storage read interface, why is it harmful to 
explicitly separate interfaces for raw data and aggregated data, as [~zjshen] 
proposed before? We can work on the raw data interface first, when designing 
aggregations. 

bq. If it's transformed to a string, it is likely to be difficult to recover 
the original type information. 

I agree. A follow up concern is, who to maintain, or explain, the type 
information? I assume we need the readers themselves to keep track of this? 

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2015-06-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Assignee: Wangda Tan  (was: Craig Welch)

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Wangda Tan
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11-with-1857.patch, YARN-1198.11.patch, 
> YARN-1198.12-with-1857.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
> YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
> YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However there are potentially lot of situations which are not considered for 
> this calculation
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly if a container is assigned to any applications app1/app2 then 
> both AM should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also today headroom is an absolute number ( I think it should be normalized 
> but then this is going to be not backward compatible..)
> * Also  when admin user refreshes queue headroom has to be updated.
> These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-06-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1680:
--
Assignee: Chen He  (was: Craig Welch)

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> The job's running reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted 
> the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still includes the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still counts that node's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3320) Support a Priority OrderingPolicy

2015-06-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3320:
--
Assignee: Wangda Tan  (was: Craig Welch)

> Support a Priority OrderingPolicy
> -
>
> Key: YARN-3320
> URL: https://issues.apache.org/jira/browse/YARN-3320
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Wangda Tan
>
> When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy 
> framework



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-06-12 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584266#comment-14584266
 ] 

Craig Welch commented on YARN-1680:
---

[~airbots], unfortunately, I'm having no more luck seeing this through than you 
have had!  I have gone ahead and handed this back to you; if you don't believe 
you'll have time to work on it, you might want to see if [~leftnoteasy] is 
interested in picking it up.  Thanks.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB, and 
> cluster slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still includes that node's free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1039) Add parameter for YARN resource requests to indicate "long lived"

2015-06-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1039:
--
Assignee: Vinod Kumar Vavilapalli  (was: Craig Welch)

> Add parameter for YARN resource requests to indicate "long lived"
> -
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Steve Loughran
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be 
> used by a scheduler that would know not to host the service on a transient 
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived 
> containers on the same node
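A hedged illustration of the placement rule described above. Neither a 
"long-lived" flag on the request nor a "transient" marker on the node exists in 
YARN as written here; the sketch only shows the kind of decision a scheduler 
could make with such a parameter.

{code}
// Hedged sketch with made-up types: keep long-lived containers off transient
// (e.g. spot-priced) nodes, and optionally limit how many share one node.
class LongLivedPlacementSketch {
  record Request(boolean longLived) {}
  record Node(boolean transientNode, int longLivedContainers) {}

  static boolean canPlace(Request req, Node node, int maxLongLivedPerNode) {
    if (!req.longLived()) {
      return true;                        // short-lived work can go anywhere
    }
    if (node.transientNode()) {
      return false;                       // avoid transient / spot-priced hosts
    }
    return node.longLivedContainers() < maxLongLivedPerNode;
  }
}
{code}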



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584269#comment-14584269
 ] 

Hadoop QA commented on YARN-3800:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 47s | The applied patch generated  7 
new checkstyle issues (total was 54, now 55). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  46m 30s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  85m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12739350/YARN-3800.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / eef7b50 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8242/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8242/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8242/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8242/console |


This message was automatically generated.

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch
>
>
> Instead of storing the ReservationRequest we store the Resource for 
> allocations, as that's the only thing we need; ultimately we convert 
> everything to resources anyway (see the sketch below).
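A minimal, hedged sketch of the simplification the summary describes: keep only 
a Resource per time interval rather than the full ReservationRequest. The types 
here are invented for illustration and are not the classes touched by the 
attached patch.

{code}
import java.util.NavigableMap;
import java.util.TreeMap;

// Hedged sketch: store just the Resource reserved from each start time on,
// since that is all later computations need.
class ReservationStateSketch {
  record Resource(long memoryMB, int vcores) {}

  private final NavigableMap<Long, Resource> allocations = new TreeMap<>();

  void reserve(long startTime, Resource capacity) {
    allocations.put(startTime, capacity);
  }

  Resource capacityAt(long time) {
    var entry = allocations.floorEntry(time);
    return entry == null ? new Resource(0, 0) : entry.getValue();
  }
}
{code}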



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate "long lived"

2015-06-12 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584271#comment-14584271
 ] 

Craig Welch commented on YARN-1039:
---

I'll go back to my earlier assertion: I think it's not "duration" we are really 
concerned with here (that is covered in various ways in other places), but 
rather the notion of an application type, a "batch" or a "service", with the 
defining characteristic being the potential for "continuous operation" 
(service) versus a "unit of work which will run to completion" (batch); an 
enumeration of "service" and "batch" makes sense to me.  In any case, 
[~vinodkv], there still seems to be enough diversity of opinion here to require 
some ongoing discussion/reconciliation, so I will leave this in your capable 
hands.
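To make the distinction the comment draws concrete, a trivial, hedged sketch of 
such an enumeration; this enum does not exist in YARN and is purely 
illustrative.

{code}
// Hedged sketch: purely illustrative, not a YARN API.
enum ApplicationKind {
  BATCH,    // a unit of work which will run to completion
  SERVICE   // the potential for continuous operation
}
{code}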

> Add parameter for YARN resource requests to indicate "long lived"
> -
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Steve Loughran
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be 
> used by a scheduler that would know not to host the service on a transient 
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived 
> containers on the same node



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness

2015-06-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3510:
--
Assignee: Wangda Tan  (was: Craig Welch)

> Create an extension of ProportionalCapacityPreemptionPolicy which preempts a 
> number of containers from each application in a way which respects fairness
> 
>
> Key: YARN-3510
> URL: https://issues.apache.org/jira/browse/YARN-3510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Craig Welch
>Assignee: Wangda Tan
> Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, 
> YARN-3510.6.patch
>
>
> The ProportionalCapacityPreemptionPolicy preempts as many containers from 
> applications as it can during its preemption run.  For FIFO this makes 
> sense, as it is preempting in reverse order and therefore maintaining the 
> primacy of the "oldest".  For fair ordering this does not have the desired 
> effect - instead, it should preempt a number of containers from each 
> application so as to maintain a fair balance (or close to a fair balance) 
> between them (see the sketch below).
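A hedged sketch of the behaviour the summary asks for, not the attached patch: 
take containers one at a time from each application in turn (round-robin) until 
the preemption target is met, instead of draining one application completely 
before moving to the next.

{code}
import java.util.ArrayDeque;
import java.util.List;

// Hedged sketch with an invented App interface: round-robin preemption keeps
// the applications roughly balanced while still meeting the target.
class FairPreemptionSketch {
  interface App { Long pollNewestContainerMB(); }  // null when nothing is left

  static long preempt(List<App> apps, long targetMB) {
    long reclaimed = 0;
    ArrayDeque<App> ring = new ArrayDeque<>(apps);
    while (reclaimed < targetMB && !ring.isEmpty()) {
      App app = ring.poll();
      Long containerMB = app.pollNewestContainerMB();
      if (containerMB != null) {
        reclaimed += containerMB;
        ring.add(app);                   // come back to this app later
      }
    }
    return reclaimed;
  }
}
{code}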



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN

2015-06-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584272#comment-14584272
 ] 

Chris Douglas commented on YARN-1983:
-

(sorry for the delayed reply; missed this)

bq. I was proposing we continue the same without adding a new CLC field. Are we 
both saying the same thing then?

Yeah, I think we agree. We don't need to extend the CLC definition for this use 
case, because it's less invasive to add a composite CE that can inspect the CLC 
and demux on a set of rules.

I scanned the patch on YARN-1964, and maybe I'm being dense but I couldn't find 
the demux. It does some validation using patterns...
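To make the "composite CE that can inspect the CLC and demux on a set of rules" 
idea concrete, a hedged sketch with invented interfaces; this is not YARN's 
ContainerExecutor API and not the YARN-1964 patch, and the "CONTAINER_RUNTIME" 
key is only a hypothetical example of a demux rule.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch: route a launch request to one of several delegates based on
// something found in the container launch context (here, an env variable).
class CompositeExecutorSketch {
  interface Launcher { void launch(Map<String, String> env); }

  private final Map<String, Launcher> delegates = new LinkedHashMap<>();
  private final Launcher defaultLauncher;

  CompositeExecutorSketch(Launcher defaultLauncher) {
    this.defaultLauncher = defaultLauncher;
  }

  void register(String runtimeName, Launcher launcher) {
    delegates.put(runtimeName, launcher);
  }

  void launch(Map<String, String> clcEnv) {
    String requested = clcEnv.getOrDefault("CONTAINER_RUNTIME", "default");
    delegates.getOrDefault(requested, defaultLauncher).launch(clcEnv);
  }
}
{code}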

> Support heterogeneous container types at runtime on YARN
> 
>
> Key: YARN-1983
> URL: https://issues.apache.org/jira/browse/YARN-1983
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Junping Du
> Attachments: YARN-1983.2.patch, YARN-1983.patch
>
>
> Different container types (default, LXC, docker, VM box, etc.) have different 
> semantics on isolation of security, namespace/env, performance, etc.
> Per discussions in YARN-1964, we have some good thoughts on supporting 
> different types of containers running on YARN and specified by application at 
> runtime which largely enhance YARN's flexibility to meet heterogenous app's 
> requirement on isolation at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3711) Documentation of ResourceManager HA should explain about webapp address configuration

2015-06-12 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-3711:
---
Description: 
There should be explanation about webapp address in addition to RPC address.

AM proxy filter needs explicit definition of 
{{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
{{yarn.resourcemanager.webapp.https.address._rm-id_}} to get proper addresses 
in RM-HA mode.


  was:
There should be explanation about webapp address in addition to RPC address.

AM proxy filter needs explicit definition of 
{{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
{{yarn.resourcemanager.webapp.https.address._rm-id_}} to get proper default 
addresses in RM-HA mode now.



> Documentation of ResourceManager HA should explain about webapp address 
> configuration
> -
>
> Key: YARN-3711
> URL: https://issues.apache.org/jira/browse/YARN-3711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-3711.002.patch
>
>
> There should be explanation about webapp address in addition to RPC address.
> AM proxy filter needs explicit definition of 
> {{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
> {{yarn.resourcemanager.webapp.https.address._rm-id_}} to get proper addresses 
> in RM-HA mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3711) Documentation of ResourceManager HA should explain about webapp address configuration

2015-06-12 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584327#comment-14584327
 ] 

Masatake Iwasaki commented on YARN-3711:


Though {{HAUtil.verifyAndSetConfiguration}} updates the configuration for HA on 
the RM node, it only cares about the rm-id of that node. AMs running on slave 
nodes need to know the webapp addresses of all RMs. Without an explicit 
definition of the webapp addresses, the URL of the application shown in the RM 
UI refers to the wrong host name.
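For illustration, a minimal yarn-site.xml fragment of the kind the 
documentation could show; the rm-ids, host names and ports are placeholders, 
while the property names follow the 
{{yarn.resourcemanager.webapp.address._rm-id_}} / 
{{yarn.resourcemanager.webapp.https.address._rm-id_}} pattern mentioned in the 
description.

{code}
<!-- Example only: rm1/rm2 and the host names are placeholders. -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>rm1.example.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>rm2.example.com:8088</value>
</property>
<!-- And/or the HTTPS variants when HTTPS is used for the web UIs. -->
<property>
  <name>yarn.resourcemanager.webapp.https.address.rm1</name>
  <value>rm1.example.com:8090</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.https.address.rm2</name>
  <value>rm2.example.com:8090</value>
</property>
{code}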

> Documentation of ResourceManager HA should explain about webapp address 
> configuration
> -
>
> Key: YARN-3711
> URL: https://issues.apache.org/jira/browse/YARN-3711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-3711.002.patch
>
>
> There should be explanation about webapp address in addition to RPC address.
> AM proxy filter needs explicit definition of 
> {{yarn.resourcemanager.webapp.address._rm-id_}} and/or 
> {{yarn.resourcemanager.webapp.https.address._rm-id_}} to get proper addresses 
> in RM-HA mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created YARN-3801:


 Summary: [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client 
and hbase-testing-util
 Key: YARN-3801
 URL: https://issues.apache.org/jira/browse/YARN-3801
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa


timelineservice depends on hbase-client and hbase-testing-util, and they depend 
on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
{quote}
[WARNING] 
Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
are:
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
+-jdk.tools:jdk.tools:1.8
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-client:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-testing-util:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
failed with message:
Failed while enforcing releasability the error(s) are [
Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
are:
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
+-jdk.tools:jdk.tools:1.8
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-client:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-testing-util:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7
{quote}
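For illustration, a hedged sketch of the kind of exclusion the attached patch 
presumably adds to the timelineservice pom; the exact dependency declarations 
and their placement may differ from the real patch.

{code}
<!-- Sketch only: exclude jdk.tools from the HBase artifacts named above. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <exclusions>
    <exclusion>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-testing-util</artifactId>
  <exclusions>
    <exclusion>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}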



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3801:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2928

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3801:
-
Attachment: YARN-3801.001.patch

Attaching a first patch.

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-3801.001.patch
>
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584334#comment-14584334
 ] 

Hadoop QA commented on YARN-3801:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12739379/YARN-3801.001.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / eef7b50 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8244/console |


This message was automatically generated.

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-3801.001.patch
>
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-06-12 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584411#comment-14584411
 ] 

Chen He commented on YARN-1680:
---

Thank you, [~cwelch]. Appreciate you assign it back. :)

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB, and 
> cluster slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still includes that node's free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584420#comment-14584420
 ] 

Tsuyoshi Ozawa commented on YARN-3801:
--

[~sjlee0] could you take a look?

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-3801.001.patch
>
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584426#comment-14584426
 ] 

Sean Busbey commented on YARN-3801:
---

What's your timeline? HBase is adding support for JDK8 in the upcoming 1.2 
release line, so we'll get this cleaned up in our code base.

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-3801.001.patch
>
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3801) [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util

2015-06-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584433#comment-14584433
 ] 

Tsuyoshi Ozawa commented on YARN-3801:
--

Currently, the timeline service depends on hbase-client and hbase-testing-util 
1.0.1. We can upgrade them after the 1.2.0 release.

> [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util
> -
>
> Key: YARN-3801
> URL: https://issues.apache.org/jira/browse/YARN-3801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-3801.001.patch
>
>
> timelineservice depends on hbase-client and hbase-testing-util, and they 
> depend on jdk.tools:1.7. This causes the Hadoop build to fail with JDK8.
> {quote}
> [WARNING] 
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
> are:
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> +-jdk.tools:jdk.tools:1.8
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-client:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> and
> +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
>   +-org.apache.hbase:hbase-testing-util:1.0.1
> +-org.apache.hbase:hbase-annotations:1.0.1
>   +-jdk.tools:jdk.tools:1.7
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) RM shutdown with NoNode exception while updating appAttempt on zk

2015-06-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584472#comment-14584472
 ] 

Bibin A Chundatt commented on YARN-3798:


[~ozawa] thank you for looking into this issue. The ZK service went down 
multiple times during the crash and during the RM's transitions between the 
standby and active states.
{quote}
 expected behaviour can be done by catching NoNodeException
{quote}
Yes, we should try to find the root cause of this. I will soon upload the 
relevant RM and ZK logs from around the time of this exception. 
[~Naganarasimha] 
{quote}
i think we should retry again before making the job fail?
{quote}
We already have retry and timeout handling for ZK in 
*recovery.ZKRMStateStore$ZKAction.runWithRetries* (see the sketch below).
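As a purely illustrative aside on the "catching NoNodeException" suggestion 
quoted above, a hedged sketch using the plain ZooKeeper client API (not 
ZKRMStateStore): when a create fails because the parent znode is missing, 
recreate the parent and retry once instead of exhausting the retry budget and 
shutting the RM down.

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hedged sketch: generic ZooKeeper usage, not the RM state-store code.
class ZkCreateSketch {
  static void createWithParent(ZooKeeper zk, String parent, String path,
                               byte[] data) throws Exception {
    try {
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NoNodeException e) {
      // The parent znode is missing; recreate it, then retry the child once.
      if (zk.exists(parent, false) == null) {
        zk.create(parent, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
      }
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }
}
{code}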

> RM shutdown with NoNode exception while updating appAttempt on zk
> -
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException