[jira] [Commented] (YARN-1458) hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836372#comment-13836372 ]

qingwu.fu commented on YARN-1458:
---------------------------------

We have tested our suspected points, and it doesn't work. We will now focus on handling the case where ComputeFairShares#computeShares returns 0. How about: if it returns 0, we just count its weight as in the situation where sizebasedweight is true; that is, if it returns 0, we can set its weight to 1.

hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
-------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:

"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rekha Joshi updated YARN-1019:
------------------------------
Attachment: YARN-1019.0.patch

YarnConfiguration validation for local disk path and http addresses.
---------------------------------------------------------------------

Key: YARN-1019
URL: https://issues.apache.org/jira/browse/YARN-1019
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Omkar Vinit Joshi
Priority: Minor
Labels: newbie
Attachments: YARN-1019.0.patch

Today we are not validating certain configuration parameters set in yarn-site.xml.
1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories.
2) The same goes for all the parameters using hostname:port, unless we are OK with the default port.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836397#comment-13836397 ]

Hadoop QA commented on YARN-1019:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616524/YARN-1019.0.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2567//console

This message is automatically generated.

YarnConfiguration validation for local disk path and http addresses.
---------------------------------------------------------------------

Key: YARN-1019
URL: https://issues.apache.org/jira/browse/YARN-1019
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Omkar Vinit Joshi
Priority: Minor
Labels: newbie
Attachments: YARN-1019.0.patch

Today we are not validating certain configuration parameters set in yarn-site.xml.
1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories.
2) The same goes for all the parameters using hostname:port, unless we are OK with the default port.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
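To make the proposed check concrete: a minimal sketch of the kind of startup-time validation the report asks for, with a hypothetical helper class. This is not the attached YARN-1019.0.patch, just an illustration using plain JDK calls.

{code}
import java.nio.file.Paths;

/**
 * Hypothetical startup check: reject relative paths in comma-separated
 * directory settings (e.g. local-dirs, log-dirs) before the directory
 * handler tries to create them, so the NM fails fast with a clear message.
 */
public final class PathConfigValidator {
  public static void validateAbsolute(String key, String commaSeparatedDirs) {
    for (String dir : commaSeparatedDirs.split(",")) {
      String trimmed = dir.trim();
      if (!trimmed.isEmpty() && !Paths.get(trimmed).isAbsolute()) {
        throw new IllegalArgumentException(
            key + " must contain absolute paths, but got: " + trimmed);
      }
    }
  }
}
{code}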
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836410#comment-13836410 ]

Steve Loughran commented on YARN-1390:
--------------------------------------

# Some limits on tag size are going to be needed, obviously. If AMs can update tag data, they can use it as a store of information, which would be convenient and dangerous.
# App metadata is visible to all, so users need to be reminded to limit what they say.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836701#comment-13836701 ]

Zhijie Shen commented on YARN-1445:
-----------------------------------

1. I agree on most of the plans for dealing with FINISHING.

bq. 3. WebAppProxyServlet::doGet(). I think we might need to handle FINISHING as well as FINISHED.

After the RMApp enters FINISHING, the AM has already been unregistered, and the tracking url has already been updated. Therefore, we're able to redirect the request in this state.

bq. 5. ClientServiceDelegate::getProxy(), we might need to handle YarnApplicationState.FINISHING, too.

When the application is in FINISHING, the AM has already been unregistered, and the MR job needs to be redirected to the JHS for the details.

bq. 6. ApplicationCLI::killApplication(). Here is a question mark: can we kill the AM when the RMApp is in the FINISHING state, since the AM does not really exist anymore? At least for DS and MR, once the AM has called unregisterApplicationMaster, the application is finished, so sending a kill event at that point is meaningless. Here, I just handle YarnApplicationState.FINISHING and FINISHED in the same way.

I have a different opinion here. RMAppImpl actually allows killing the app when it is in FINISHING. Moreover, if we handle FINISHING in the same way as FINISHED, we will see that printApplicationReport says the app is still in FINISHING, while killApplication says the app is finished. Thoughts?

2. TestYarnClient#testSubmitApplication needs to be changed accordingly as well. Would you please double check the other test classes where FINISHED is referred to, and check whether the test cases will be broken or not?

3. Is it good to document the difference in detail between FINISHING and FINISHED?

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
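For reference, the shape of the check being debated: treat FINISHING like FINISHED wherever only AM unregistration matters, but keep them distinct for kill semantics. A sketch that assumes the FINISHING value this JIRA proposes to add to YarnApplicationState; the helper name is hypothetical.

{code}
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

/**
 * Sketch: once the AM has unregistered, the tracking URL already points at
 * the history service, so redirects can treat FINISHING like FINISHED.
 * Kill handling should NOT use this helper, since FINISHING apps are still
 * killable per RMAppImpl.
 */
static boolean amUnregistered(YarnApplicationState state) {
  switch (state) {
    case FINISHING:  // proposed by this JIRA
    case FINISHED:
    case FAILED:
    case KILLED:
      return true;
    default:
      return false;
  }
}
{code}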
[jira] [Assigned] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang reassigned YARN-1463:
-----------------------------------

Assignee: Binglin Chang

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836713#comment-13836713 ]

Binglin Chang commented on YARN-1463:
-------------------------------------

HDFS-5545 introduced this bug: when deciding whether to init SPNEGO, the original code logic is broken.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated YARN-1463:
--------------------------------
Attachment: YARN-1463.v1.patch

Attaching a patch with a simple fix; the test succeeds now.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: YARN-1463.v1.patch

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836723#comment-13836723 ]

Alejandro Abdelnur commented on YARN-1390:
------------------------------------------

Agree with Steve, we should limit the length of a tag and the number of tags. I'd suggest hardcoding for now, i.e. 50 chars / 10 tags, and going configurable later if the need arises.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836734#comment-13836734 ]

Hadoop QA commented on YARN-1463:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616579/YARN-1463.v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2568//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2568//console

This message is automatically generated.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: YARN-1463.v1.patch

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836740#comment-13836740 ]

Zhijie Shen commented on YARN-1446:
-----------------------------------

Hm... I see. The patch is generally good. I have the following comments.

1. It's better to use KillApplicationRequest.newInstance:
{code}
+    KillApplicationRequest req =
+        Records.newRecord(KillApplicationRequest.class);
{code}

2. Fix the grammar below:
{code}
+    // reaches killed state.and also check that attempt state is saved before app
{code}

3. Previously, an app could be killed at FINISHING. With the following change in ClientRMService, that seems to be no longer applicable:
{code}
+    if (application.isAppSafeToTerminate()) {
+      return KillApplicationResponse.newInstance(true);
+    }
{code}

4. Instead of logging the killing info every 100ms, how about doing something similar to YarnClientImpl#submitApplication?

5. Do you have an estimation of the number of KILLING requests that are sent before KILLING succeeds?

6. Does this ticket overlap a bit with YARN-261? After the change, it is actually killing the attempt instead of the app, but we don't allow retry here.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
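Review point 1 above, sketched out: the typed factory keeps the call site to one line. The wrapper method here is only for illustration.

{code}
import org.apache.hadoop.yarn.api.protocolrecords.KillApplicationRequest;
import org.apache.hadoop.yarn.api.records.ApplicationId;

/** Reviewer's suggestion: prefer the typed factory over Records.newRecord. */
static KillApplicationRequest buildKillRequest(ApplicationId appId) {
  return KillApplicationRequest.newInstance(appId);
}
{code}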
[jira] [Commented] (YARN-1458) hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836747#comment-13836747 ]

Sandy Ryza commented on YARN-1458:
----------------------------------

If size-based weight is turned on and an app has 0 demand, I think giving it 0 fair share is the correct thing to do. I.e., if there are two apps and one has 0 demand, the other app should get the entire share. We just need to handle the special case where all apps in a queue have 0 weight and make it so that this does not result in an infinite loop in the computeShares method.

hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
-------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

--
This message was sent by Atlassian JIRA
(v6.1#6144)
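A self-contained sketch of the guard Sandy describes, not the committed patch: ComputeFairShares binary-searches a weight-to-resource ratio, and an all-zero weight sum must be handled before the search, or the loop can never converge while the scheduler lock is held. The class name, bounds, and iteration limit here are illustrative.

{code}
import java.util.Arrays;

public final class FairShareSketch {
  /** Compute integer shares of totalResource proportional to weights. */
  static int[] computeShares(double[] weights, int totalResource) {
    int[] shares = new int[weights.length];
    double weightSum = Arrays.stream(weights).sum();
    if (weightSum == 0.0) {
      return shares;  // all weights zero: give zero shares, skip the search
    }
    // Binary-search the weight-to-resource ratio, with a bounded iteration
    // count so the update thread can never spin indefinitely under the lock.
    double lo = 0.0;
    double hi = totalResource / weightSum * 2.0 + 1.0;
    for (int i = 0; i < 50; i++) {
      double mid = (lo + hi) / 2.0;
      long used = 0;
      for (double w : weights) {
        used += (long) (w * mid);
      }
      if (used < totalResource) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    for (int j = 0; j < weights.length; j++) {
      shares[j] = (int) (weights[j] * lo);
    }
    return shares;
  }
}
{code}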
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1458:
-----------------------------
Summary: In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely  (was: hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked)

In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
---------------------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1458:
-----------------------------
Description:
The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)
{code}

was:
The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836804#comment-13836804 ]

Zhijie Shen commented on YARN-967:
----------------------------------

The aforementioned issues are fixed in the last patch.

[YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
------------------------------------------------------------------------------------

Key: YARN-967
URL: https://issues.apache.org/jira/browse/YARN-967
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1462:
-----------------------------------
Summary: AHS API and other AHS changes to handle tags for completed MR jobs  (was: AHS API and JHS changes to handle tags for completed MR jobs)

AHS API and other AHS changes to handle tags for completed MR jobs
-------------------------------------------------------------------

Key: YARN-1462
URL: https://issues.apache.org/jira/browse/YARN-1462
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla

AHS related work for tags.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1465) define and add shared constants and utilities for the shared cache
Sangjin Lee created YARN-1465:
---------------------------------

Summary: define and add shared constants and utilities for the shared cache
Key: YARN-1465
URL: https://issues.apache.org/jira/browse/YARN-1465
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836823#comment-13836823 ]

Steve Loughran commented on YARN-1390:
--------------------------------------

Oh, and restrict the tag names to stuff that works well in URLs.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1466) implement the cleaner service for the shared cache
Sangjin Lee created YARN-1466:
---------------------------------

Summary: implement the cleaner service for the shared cache
Key: YARN-1466
URL: https://issues.apache.org/jira/browse/YARN-1466
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1467) implement checksum verification for resource localization service for the shared cache
Sangjin Lee created YARN-1467:
---------------------------------

Summary: implement checksum verification for resource localization service for the shared cache
Key: YARN-1467
URL: https://issues.apache.org/jira/browse/YARN-1467
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836829#comment-13836829 ]

Karthik Kambatla commented on YARN-1399:
----------------------------------------

From the discussion on YARN-1390: the tags can be a list of Strings, with limits on the number of tags (hardcoded to 10 for now) and on what goes in a tag (50 characters that behave well in URLs).

Allow users to annotate an application with multiple tags
----------------------------------------------------------

Key: YARN-1399
URL: https://issues.apache.org/jira/browse/YARN-1399
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

Nowadays, when submitting an application, users can fill in the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
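The limits copied over from YARN-1390, sketched as a validator. The 10/50 bounds and the URL-safe character class are the suggestion under discussion, not a committed API; the class is hypothetical.

{code}
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical tag validator: at most 10 tags, 50 chars each, URL-friendly. */
public final class TagValidator {
  private static final int MAX_TAGS = 10;
  private static final int MAX_TAG_LENGTH = 50;
  private static final Pattern URL_SAFE = Pattern.compile("[A-Za-z0-9._-]+");

  public static void validate(Set<String> tags) {
    if (tags.size() > MAX_TAGS) {
      throw new IllegalArgumentException("Too many tags: " + tags.size());
    }
    for (String tag : tags) {
      if (tag.length() > MAX_TAG_LENGTH || !URL_SAFE.matcher(tag).matches()) {
        throw new IllegalArgumentException("Invalid tag: " + tag);
      }
    }
  }
}
{code}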
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836831#comment-13836831 ]

Karthik Kambatla commented on YARN-1390:
----------------------------------------

Agree with Steve and Alejandro. Copied the gist to YARN-1399.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1445:
----------------------------
Attachment: YARN-1445.3.patch

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836841#comment-13836841 ]

Xuan Gong commented on YARN-1445:
---------------------------------

bq. I have a different opinion here. RMAppImpl actually allows killing the app when it is in FINISHING. Moreover, if we handle FINISHING in the same way as FINISHED, we will see that printApplicationReport says the app is still in FINISHING, while killApplication says the app is finished. Thoughts?

Makes sense.

bq. 2. TestYarnClient#testSubmitApplication needs to be changed accordingly as well. Would you please double check the other test classes where FINISHED is referred to, and check whether the test cases will be broken or not?

Good catch. Fixed. The other places, I think, are fine. I did a full run of all the tests for the hadoop-yarn project.

bq. 3. Is it good to document the difference in detail between FINISHING and FINISHED?

Added.

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836847#comment-13836847 ]

Zhijie Shen commented on YARN-1399:
-----------------------------------

I agree we should have limits on the number of tags and their length. Whether they are configurable or hardcoded, IMHO, we should expose the information to users. For example, if a user inputs a tag which is too long to be accepted, the RM should return a suitable exception. In addition, I think it's also good to regulate the charset that a tag can use, to keep users from entering strange characters. Moreover, in general, IMHO, user name, queue name, application name and application type should be regulated as well. Thoughts?

Allow users to annotate an application with multiple tags
----------------------------------------------------------

Key: YARN-1399
URL: https://issues.apache.org/jira/browse/YARN-1399
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

Nowadays, when submitting an application, users can fill in the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836857#comment-13836857 ]

Jian He commented on YARN-1446:
-------------------------------

Thanks Zhijie for the review.

bq. Previously, an app could be killed at FINISHING. With the following change in ClientRMService, that seems to be no longer applicable.

Investigating more, I think the kill event can be ignored when the app is in the FINISHING state, because the attempt is anyway ignoring the kill event at FINISHING. Here, we made the decision that an application that has called unregister, even if still at the FINAL_SAVING state, is not killable; sounds reasonable? Updated the patch accordingly.

bq. Do you have an estimation of the number of KILLING requests that are sent before KILLING succeeds?

Experimenting on a single-node cluster, on average it sends 2 requests. Added one more check in the isAppSafeToTerminate() method: if recovery is not enabled, just return true.

bq. Does this ticket overlap a bit with YARN-261?

Just took a quick look at the patch on that jira; it may still be needed, as that jira is actually adding functionality to manually fail the attempt, not kill the attempt.

Fixed the other comments also.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
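The client-side behavior being settled on, sketched with the public YarnClient API: poll the application report until the RM confirms the terminal state, rather than returning on the first ack. Simplified on purpose; a real loop would also stop on FINISHED/FAILED and enforce a timeout.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

/** Sketch: kill and wait until the KILLED state is actually reported. */
static void killAndWait(YarnClient client, ApplicationId appId)
    throws Exception {
  client.killApplication(appId);
  while (client.getApplicationReport(appId).getYarnApplicationState()
      != YarnApplicationState.KILLED) {
    Thread.sleep(100);  // the 100 ms polling interval discussed above
  }
}
{code}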
[jira] [Updated] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1446:
--------------------------
Attachment: YARN-1446.2.patch

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.2.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-291) [Umbrella] Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836883#comment-13836883 ]

Cindy Li commented on YARN-291:
-------------------------------

Junping, I just saw your comments on YARN-999. I can help with it. Can you help me understand the use cases/scope of YARN-999 besides graceful decommission? In the code below:

// TODO process resource over-commitment case (allocated containers
// > total capacity) in different option by getting value of
// overCommitTimeoutMillis.

By "different options" above, do you mean overCommitTimeoutMillis < 0, = 0, > 0? I want to find out more use cases associated with this setting besides graceful decommission. For example, you mentioned preemption for long-running tasks in YARN-999; is that part of, or a different use case from, graceful decommission? Also, about the August patch CoreAndAdmin.patch (in YARN-291), can you let us know your plan for it? It seems useful for graceful decommission from outside of the YARN code. Thanks,

[Umbrella] Dynamic resource configuration
------------------------------------------

Key: YARN-291
URL: https://issues.apache.org/jira/browse/YARN-291
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Labels: features
Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch

The current Hadoop YARN resource management logic assumes per-node resource is static during the lifetime of the NM process. Allowing run-time configuration of per-node resource will give us finer granularity of resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. More background and design details can be found in the attached proposal.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1287) Consolidate MockClocks
[ https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836892#comment-13836892 ]

Sebastian Wong commented on YARN-1287:
--------------------------------------

Where should the new MockClock class be placed, directory-wise?

Consolidate MockClocks
----------------------

Key: YARN-1287
URL: https://issues.apache.org/jira/browse/YARN-1287
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Sandy Ryza
Labels: newbie

A bunch of different tests have near-identical implementations of MockClock, for example TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler. They should be consolidated into a single MockClock.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
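For illustration, the consolidated class could be as small as this. It assumes YARN's Clock interface (org.apache.hadoop.yarn.util.Clock) and is a sketch, not the committed class; where it lives (e.g. a shared test-utility tree) is exactly the open question above.

{code}
import org.apache.hadoop.yarn.util.Clock;

/** A single shared mock clock: time advances only when a test says so. */
public class MockClock implements Clock {
  private long time = 0;

  @Override
  public long getTime() {
    return time;
  }

  /** Advance the clock by the given number of milliseconds. */
  public void tick(long ms) {
    time += ms;
  }
}
{code}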
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836894#comment-13836894 ]

Xuan Gong commented on YARN-1028:
---------------------------------

bq. It might appear so, but the actual wait time is controlled by ipc.client.connect.max.retries, which is 10 seconds by default. Verified it on a cluster.

Yes, you are right. It is controlled by ipc.client.connect.max.retries in this case.

Add FailoverProxyProvider like capability to RMProxy
----------------------------------------------------

Key: YARN-1028
URL: https://issues.apache.org/jira/browse/YARN-1028
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch

The RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836918#comment-13836918 ]

Hadoop QA commented on YARN-1446:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616610/YARN-1446.2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2570//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2570//console

This message is automatically generated.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.2.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property
[ https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836941#comment-13836941 ]

Mark Miller commented on YARN-1442:
-----------------------------------

+1. The Apache Solr project runs YARN in its tests and currently has to duplicate a bunch of YARN minicluster code to work around this issue.

change yarn minicluster base directory via system property
-----------------------------------------------------------

Key: YARN-1442
URL: https://issues.apache.org/jira/browse/YARN-1442
Project: Hadoop YARN
Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: André Kelpe
Priority: Minor
Attachments: HADOOP-10122.patch

The yarn minicluster used for testing uses the target directory by default. We use gradle for building our projects, and we would like to see it using a different directory. This patch makes it possible to use a different directory by setting the yarn.minicluster.directory system property.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
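The mechanism the patch describes, sketched: the property name comes from the JIRA, while the "target" fallback and the helper name are assumptions for illustration.

{code}
import java.io.File;

/** Sketch: resolve the minicluster base directory from a system property,
 *  falling back to Maven's default "target" directory. */
static File miniClusterBaseDir(String clusterName) {
  String base = System.getProperty("yarn.minicluster.directory", "target");
  return new File(base, clusterName).getAbsoluteFile();
}
{code}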
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-895:
-----------------------------------------
Summary: RM crashes if it restarts while the state-store is down  (was: RM crashes if it restarts while NameNode is in safe mode)

RM crashes if it restarts while the state-store is down
-------------------------------------------------------

Key: YARN-895
URL: https://issues.apache.org/jira/browse/YARN-895
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836960#comment-13836960 ]

Hadoop QA commented on YARN-1445:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616603/YARN-1445.3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:
org.apache.hadoop.mapreduce.security.TestJHSSecurity

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2569//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2569//console

This message is automatically generated.

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1459) Handle supergroups, usergroups and ACLs across RMs during failover
[ https://issues.apache.org/jira/browse/YARN-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836966#comment-13836966 ]

Vinod Kumar Vavilapalli commented on YARN-1459:
-----------------------------------------------

As I was trying to indicate [here|https://issues.apache.org/jira/browse/YARN-1318?focusedCommentId=13834101&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13834101], we may have to think about completely moving them off the local disk, but it radically changes the operator workflow. Today admins edit those files separately; we'll have to move completely towards CLI tools for this to happen.

Handle supergroups, usergroups and ACLs across RMs during failover
-------------------------------------------------------------------

Key: YARN-1459
URL: https://issues.apache.org/jira/browse/YARN-1459
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla

The supergroups, usergroups and ACL configurations are per RM and might have been changed while the RM is running. After failing over, the new Active RM should have the latest configuration from the previously Active RM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836975#comment-13836975 ] Karthik Kambatla commented on YARN-1399: bq. Moreover, in general, IMHO, user name, queue name, application name and application type should be regulated as well. This would be an incompatible change, and we should probably avoid it if possible. This brings up another interesting issue of handling applicationTypes as a special kind of tag when we get to that. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836980#comment-13836980 ] Alejandro Abdelnur commented on YARN-1399: -- What is the concern with a tag being any valid Unicode string? If queried via the REST API, the values would be URL-encoded, so no harm. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836987#comment-13836987 ] Haohui Mai commented on YARN-1463: -- Can you please explain why it is broken? -- Jenkins does not complain at HDFS-5545. I don't quite get what this patch changes -- it seems to me that the same case is covered by HttpServer#initSpnego(). TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836999#comment-13836999 ] Hudson commented on YARN-1318: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4817 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4817/]) YARN-1318. Promoted AdminService to an Always-On service and merged it into RMHAProtocolService. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1547212) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Fix For: 2.4.0 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch, yarn-1318-3.patch, yarn-1318-4.patch, 
yarn-1318-4.patch, yarn-1318-5.patch, yarn-1318-6.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837002#comment-13837002 ] Mayank Bansal commented on YARN-967: [~vinodkv] sorry missed your comments Attaching latest patch, Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-12.patch [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837011#comment-13837011 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616622/YARN-967-12.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2571//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837020#comment-13837020 ] Zhijie Shen commented on YARN-1399: --- bq. What is the concern with a tag being any valid Unicode string? If queried via the REST API, the values would be URL-encoded, so no harm. For example, do we want to support multiple words in a tag, such as "distributed systems"? That isn't a problem when we do exact matching when searching via tags. However, if we want a somewhat fuzzy match, we may need to take care of word splitting. For user/queue/applicationType, we may want them to be lowercase/uppercase (or be converted to lowercase/uppercase), so matching is case-insensitive. Also, it's good to ignore some characters, such as ?!/={}. Thoughts? Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837027#comment-13837027 ] Jian He commented on YARN-895: -- Fixed the comments. bq. Test: In the HDFS test, you don't wait for any time at all for the client to get exceptions? clientThread.join() is called to wait for the client to get exceptions; the test fails if retry is disabled and passes if retry is enabled. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.4.patch RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1147: --- Assignee: Xuan Gong Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.4.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1459) Handle supergroups, usergroups and ACLs across RMs during failover
[ https://issues.apache.org/jira/browse/YARN-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1459: --- Assignee: Xuan Gong Handle supergroups, usergroups and ACLs across RMs during failover -- Key: YARN-1459 URL: https://issues.apache.org/jira/browse/YARN-1459 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong The supergroups, usergroups and ACL configurations are per RM and might have been changed while the RM is running. After failing over, the new Active RM should have the latest configuration from the previously Active RM. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1325) RMHAProtocolService#serviceInit should check configuration contains multiple RM
[ https://issues.apache.org/jira/browse/YARN-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1325: --- Assignee: Xuan Gong RMHAProtocolService#serviceInit should check configuration contains multiple RM --- Key: YARN-1325 URL: https://issues.apache.org/jira/browse/YARN-1325 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Xuan Gong Labels: ha Currently, we can enable RM HA configuration without multiple RM ids(YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. ResourceManager should verify that more than 1 RM id must be specified in RM-HA-IDs. One idea is to support strict mode to enforce this check as configuration(e.g. yarn.resourcemanager.ha.strict-mode.enabled). -- This message was sent by Atlassian JIRA (v6.1#6144)
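For illustration, a minimal sketch of the verification proposed above, assuming it runs during service init when HA is enabled; the class name, exception choice, and message wording are illustrative, not the eventual patch.
{code}
import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HAIdsCheck {
  // Hypothetical check: with RM HA enabled, yarn.resourcemanager.ha.rm-ids
  // (YarnConfiguration.RM_HA_IDS) must name at least two RMs.
  static void verifyRMIds(Configuration conf) {
    Collection<String> rmIds = conf.getStringCollection(YarnConfiguration.RM_HA_IDS);
    if (rmIds.size() < 2) {
      throw new IllegalArgumentException(YarnConfiguration.RM_HA_IDS
          + " must contain at least two RM ids when HA is enabled, but was: " + rmIds);
    }
  }
}
{code}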
[jira] [Assigned] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1410: --- Assignee: Xuan Gong Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1125: --- Assignee: Xuan Gong Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837031#comment-13837031 ] Alejandro Abdelnur commented on YARN-1399: -- I would stick to exact tag matching. Case-insensitive seems reasonable, though I would implement it by lowercasing or uppercasing tags on arrival and when querying. Then the matching is cheapest. Regarding symbols, what is the harm in supporting them? One thing we didn't mention before: on querying I would support only OR; the client must then do any further filtering if it wants AND. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
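To make the normalize-on-arrival idea concrete, here is a hedged sketch of lowercasing tags at submission and query time and matching with OR semantics; the class and method names are illustrative, not YARN API.
{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagMatcher {
  // Lowercase tags once on arrival and once per query; exact matching then
  // becomes a cheap, case-insensitive set lookup.
  static Set<String> normalize(Iterable<String> tags) {
    Set<String> out = new HashSet<String>();
    for (String t : tags) {
      out.add(t.toLowerCase(Locale.ENGLISH));
    }
    return out;
  }

  // OR semantics: an app matches if it carries at least one queried tag.
  // AND filtering, if wanted, is left to the client.
  static boolean matchesAny(Set<String> appTags, Set<String> queryTags) {
    for (String q : queryTags) {
      if (appTags.contains(q)) {
        return true;
      }
    }
    return false;
  }
}
{code}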
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837040#comment-13837040 ] Vinod Kumar Vavilapalli commented on YARN-1463: --- After YARN-1318, the exception message reported is {code} 2013-12-02 22:49:34,492 INFO [Thread-322] service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException java.lang.NullPointerException at java.util.Hashtable.get(Hashtable.java:334) at java.util.Properties.getProperty(Properties.java:932) at org.apache.hadoop.conf.Configuration.get(Configuration.java:874) at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892) at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101) at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:826) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:477) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:850) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:205) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:118) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:880) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) {code} Haohui/Binglin, can you see if this can be fixed in common itself? If that is the case, we can avoid these YARN specific changes. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837050#comment-13837050 ] qingwu.fu commented on YARN-1458: - Thanks Sandy. We were confused by your point that "If it returns 0 we should just set the fair shares of all the considered schedulables to 0." In our understanding, you suggested setting every app's weight to 0 when one app's weight is 0, so we proposed the idea above. But now we agree with the point that "If size based weight is turned on and an app has 0 demand, I think giving it 0 fair share is the correct thing to do." It's more consistent with the principle of fair share. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
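As an illustration of the zero-demand bail-out being agreed on above, here is a hedged, self-contained sketch; the real ComputeFairShares#computeShares performs a binary search over a weight-to-resource ratio, and the App class and method below are stand-ins, not the actual patch.
{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Schedulable: just a demand and a fair share.
class App {
  int demand;
  int fairShare;
  App(int demand) { this.demand = demand; }
}

public class ZeroDemandGuard {
  // With size-based weight, an app with zero demand gets zero weight; if every
  // app in the queue has zero demand, the ratio search cannot make progress.
  // Guard: on zero aggregate demand, set all fair shares to 0 and skip the loop.
  static boolean assignZeroSharesIfNoDemand(List<App> apps) {
    int totalDemand = 0;
    for (App a : apps) {
      totalDemand += a.demand;
    }
    if (totalDemand != 0) {
      return false; // normal path: run the weight-to-resource ratio search
    }
    for (App a : apps) {
      a.fairShare = 0; // zero demand => zero fair share, per the discussion
    }
    return true;
  }

  public static void main(String[] args) {
    List<App> apps = Arrays.asList(new App(0), new App(0));
    System.out.println(assignZeroSharesIfNoDemand(apps)); // true: loop skipped
  }
}
{code}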
[jira] [Commented] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837051#comment-13837051 ] Karthik Kambatla commented on YARN-1181: The failing tests are unrelated - YARN-1463 and YARN-1464 respectively. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.4.patch Missed the change in yarn-default.xml RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837054#comment-13837054 ] Mayank Bansal commented on YARN-967: Thanks [~vinodkv] for review. Updated the java docs. Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-13.patch [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-13.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837059#comment-13837059 ] Hadoop QA commented on YARN-895: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616625/YARN-895.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2572//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2572//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837057#comment-13837057 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616629/YARN-967-13.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2574//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-13.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837064#comment-13837064 ] Haohui Mai commented on YARN-1463: -- After the discussion with [~vinodkv], this is because the code calls conf.get() on the SPNEGO principal / keytab keys twice. The following patch should fix the problem: {code}
 if (hasSpnegoConf) {
-  builder.setUsernameConfKey(conf.get(spnegoPrincipalKey))
-      .setKeytabConfKey(conf.get(spnegoKeytabKey))
+  builder.setUsernameConfKey(spnegoPrincipalKey)
+      .setKeytabConfKey(spnegoKeytabKey)
       .setSecurityEnabled(UserGroupInformation.isSecurityEnabled());
 }
{code} [~decster], I believe that the null pointer checks are redundant as HttpServer has already covered them. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
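For context on why the double lookup blows up, a minimal standalone reproduction of the failure mode (an assumed reconstruction, not YARN code): the builder expects the configuration key, but the buggy call passes the looked-up value; once that value is null, the second lookup becomes Properties.getProperty(null), and Hashtable.get(null) throws the NullPointerException seen in the stack trace above. The property key below is a made-up stand-in.
{code}
import java.util.Properties;

public class DoubleLookupNpe {
  public static void main(String[] args) {
    // Properties stands in for the Hadoop Configuration here; Properties
    // extends Hashtable, and Hashtable.get(null) throws NullPointerException.
    Properties conf = new Properties();
    String value = conf.getProperty("spnego.principal.key"); // null: key unset
    conf.getProperty(value); // NPE, mirroring Configuration.get -> Hashtable.get
  }
}
{code}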
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837082#comment-13837082 ] Hadoop QA commented on YARN-895: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616628/YARN-895.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2573//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2573//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1463: - Attachment: YARN-1463.000.patch TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.5.patch Misunderstood the comment. Updated the patch to sleep for some time for the client to get exceptions. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1181: --- Attachment: yarn-1181-4.patch Rebased on trunk post YARN-1318. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch, yarn-1181-4.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837144#comment-13837144 ] Karthik Kambatla commented on YARN-1028: In RMProxy, we build an exceptionToPolicyMap and handle a couple of Exceptions. Is there a particular reason for this? In other words, are there any Exceptions we don't want the default retryPolicy to handle? {code}
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
    new HashMap<Class<? extends Exception>, RetryPolicy>();
exceptionToPolicyMap.put(ConnectException.class, retryPolicy);
// TODO: after HADOOP-9576, IOException can be changed to EOFException
exceptionToPolicyMap.put(IOException.class, retryPolicy);
return RetryPolicies.retryByException(RetryPolicies.TRY_ONCE_THEN_FAIL,
    exceptionToPolicyMap);
{code} In the context of this JIRA, we have one RetryPolicy for the HA case and another for the non-HA case. We'll probably have to add different exceptions based on whether HA is enabled or not. Wondering if it is really required. [~xgong], [~jianhe] - thoughts? Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837179#comment-13837179 ] Tsuyoshi OZAWA commented on YARN-1125: -- [~xgong], could you hold off on taking this? Before doing this JIRA, we need to deal with YARN-1139, YARN-1172, and HADOOP-10043. I'm waiting for the reviews. [~kkambatl], could you help me move HADOOP-10043 forward? Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837183#comment-13837183 ] Hadoop QA commented on YARN-1181: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616645/yarn-1181-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2575//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2575//console This message is automatically generated. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch, yarn-1181-4.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837185#comment-13837185 ] Hadoop QA commented on YARN-1463: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616643/YARN-1463.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2576//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2576//console This message is automatically generated. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1301: - Attachment: YARN-1301.5.patch Sorry for the delay; updated the patch to check whether the blacklist additions/removals are null. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837200#comment-13837200 ] Tsuyoshi OZAWA commented on YARN-1307: -- *ping* any comments are welcome. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch, YARN-1307.4-2.patch, YARN-1307.4-3.patch, YARN-1307.4.patch, YARN-1307.5.patch, YARN-1307.6.patch, YARN-1307.7.patch, YARN-1307.8.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837201#comment-13837201 ] Hadoop QA commented on YARN-895: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616644/YARN-895.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2577//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2577//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837211#comment-13837211 ] Jian He commented on YARN-1028: --- The assumption was to retry certain connection-related exceptions, and maybe later some other types of exceptions. One exception I can find is ApplicationNotFoundException, which should not be retried by the client. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837212#comment-13837212 ] Jian He commented on YARN-895: -- test failure not related RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837218#comment-13837218 ] Xuan Gong commented on YARN-1028: - Yes, I agree with [~jianhe]. We did make that assumption. Basically, whether and which retry policy is chosen is based on the exceptions. In the HA case, I think we do not need to wrap with RetryPolicies.retryByException; directly returning RetryPolicies.failoverOnNetworkException should be enough. Otherwise, we will only retry on ConnectException and IOException. But if we directly use RetryPolicies.failoverOnNetworkException, it will cover many more exceptions. Take a look at FailoverOnNetworkExceptionRetry#shouldRetry(); we can find more exceptions that this retry policy can handle. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
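A hedged sketch of the split being discussed, assuming HA detection via HAUtil and treating the concrete policies, time bounds, and failover count as placeholders rather than the eventual patch:
{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.conf.HAUtil;

public class RetryPolicySketch {
  // In the HA case, return a failover policy directly: it already retries a
  // broader set of network exceptions (cf. FailoverOnNetworkExceptionRetry
  // #shouldRetry) and triggers proxy failover. Otherwise keep a plain policy.
  static RetryPolicy createRetryPolicy(Configuration conf) {
    RetryPolicy basePolicy = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        15 * 60 * 1000, 30 * 1000, TimeUnit.MILLISECONDS); // illustrative bounds
    if (HAUtil.isHAEnabled(conf)) {
      return RetryPolicies.failoverOnNetworkException(basePolicy, 30); // illustrative cap
    }
    return basePolicy;
  }
}
{code}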
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837222#comment-13837222 ] Hadoop QA commented on YARN-1301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616668/YARN-1301.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2578//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2578//console This message is automatically generated. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-291) [Umbrella] Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837238#comment-13837238 ] Junping Du commented on YARN-291: - Junping, just saw your comments on YARN-999. I can help on it. Thanks! I plan to finish the no-timeout option in Dec, so it would be great for you to help on the timeout part. By the different options above, do you mean overCommitTimeoutMills < 0, = 0, > 0? I want to find more use cases associated with this setting besides graceful decommission. For example, you mentioned preemption for long running tasks in YARN-999; is that part of graceful decommission or a different use case? Yes. The overCommitTimeoutMills value selects among those options: < 0 (or just -1) means we tolerate tasks running to the end even when the node's resources are over-consumed; >= 0 means we only tolerate over-consumption for the time specified in overCommitTimeoutMills. Once the timeout expires, we take aggressive measures (i.e. preempting assigned containers by freezing or killing tasks) to reclaim resources so that the NM's resources are balanced again. Graceful decommission is just a special case of this, where we always set the NM's totalResource to 0 first, so all assigned containers get released after the timeout (unless the timeout is -1). If we set a proper timeout value, the NM gets a chance to finish its running tasks, and their intermediate map output can still be retrieved before the node is decommissioned; that is why we call it graceful. Also, about the August patch CoreAndAdmin.patch (in YARN-291): can you let us know your plan for it? It seems useful for driving graceful decommission from outside the YARN code. Most of the patches are on track: YARN-311 (core changes) is checked in, and YARN-312 (RPC) has been reviewed with a +1. They will land soon. Cheers, [Umbrella] Dynamic resource configuration - Key: YARN-291 URL: https://issues.apache.org/jira/browse/YARN-291 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch The current Hadoop YARN resource management logic assumes per node resource is static during the lifetime of the NM process. Allowing run-time configuration on per node resource will give us finer granularity of resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. More background and design details can be found in attached proposal. -- This message was sent by Atlassian JIRA (v6.1#6144)
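The timeout semantics described above could be sketched as follows; this is purely illustrative — none of this code exists in the attached patches, and every name in it is an assumption:
{code}
public class OverCommitTimeoutSketch {
  // Hypothetical illustration of the overCommitTimeoutMills options
  // discussed above: < 0 tolerates over-commitment forever, >= 0 triggers
  // reclamation (preempt/kill assigned containers) once the grace period
  // elapses, with 0 meaning act immediately.
  static boolean shouldReclaimContainers(long overCommitTimeoutMills,
      long millisSinceOverCommit) {
    if (overCommitTimeoutMills < 0) {
      return false; // e.g. -1: let running tasks finish, never reclaim
    }
    return millisSinceOverCommit >= overCommitTimeoutMills;
  }
}
{code}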
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qingwu.fu updated YARN-1458: Attachment: YARN-1458.patch In the Fair Scheduler, if size based weight is turned on, an endless loop can occur in ComputeFairShares.computeShares (ComputeFairShares.java:102) when the resource demand of every app in a queue is 0. This patch deals with that situation: we let the program break out of the loop when every app's resource demand in the queue is 0, which effectively sets that queue's resource demand to 0. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocks when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
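The guard described in the update above amounts to something like the following sketch against the fair scheduler types of that era; this illustrates the idea, it is not the attached YARN-1458.patch:
{code}
import java.util.Collection;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.Schedulable;

public class ZeroDemandGuardSketch {
  // Illustrative only: short-circuit computeShares-style logic when every
  // app in the queue demands zero resources, since the weight-to-share
  // search can otherwise spin forever under size-based weight.
  static boolean allDemandsZero(Collection<? extends Schedulable> schedulables) {
    for (Schedulable sched : schedulables) {
      if (sched.getDemand().getMemory() > 0) {
        return false;
      }
    }
    return true; // caller should skip share computation for this queue
  }
}
{code}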
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837251#comment-13837251 ] Binglin Chang commented on YARN-1463: - Hi Haohui, I originally did the same as your patch did, but it still failed with other errors on my MacBook Pro. So I added more checks, just as the original code did, and now it passes. {code} Running org.apache.hadoop.yarn.server.TestContainerManagerSecurity Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.663 sec FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.735 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837257#comment-13837257 ] Binglin Chang commented on YARN-1463: - Detail log: 2013-12-03 10:30:44,577 WARN [Thread-321] mortbay.log (Slf4jLog.java:warn(89)) - Failed startup of context org.mortbay.jetty.webapp.WebAppContext@9ba0281{/,file:/Users/decster/projects/hadoop-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/classes/webapps/cluster} javax.servlet.ServletException: javax.servlet.ServletException: Principal not defined in configuration at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:203) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:146) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:224) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:245) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:820) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:471) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:844) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.transitionToActive(RMHAProtocolService.java:187) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceStart(RMHAProtocolService.java:101) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:871) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper$3.run(MiniYARNCluster.java:242) TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837255#comment-13837255 ] qingwu.fu commented on YARN-1458: - Hi Sandy, We have submitted the patch YARN-1458.patch. Please help us review it. It is nice to work with you. Thank you so much! In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocks when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837258#comment-13837258 ] Binglin Chang commented on YARN-1463: - As we can see from the code, HttpServer does not null-check the keys:
{code}
private void initSpnego(Configuration conf, String hostName,
    String usernameConfKey, String keytabConfKey) throws IOException {
  Map<String, String> params = new HashMap<String, String>();
  String principalInConf = conf.get(usernameConfKey);
  if (principalInConf != null && !principalInConf.isEmpty()) {
    params.put("kerberos.principal", SecurityUtil.getServerPrincipal(
        principalInConf, hostName));
  }
  String httpKeytab = conf.get(keytabConfKey);
  if (httpKeytab != null && !httpKeytab.isEmpty()) {
    params.put("kerberos.keytab", httpKeytab);
  }
  params.put(AuthenticationFilter.AUTH_TYPE, "kerberos");
  defineFilter(webAppContext, SPNEGO_FILTER,
      AuthenticationFilter.class.getName(), params, null);
}
{code}
TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837264#comment-13837264 ] Haohui Mai commented on YARN-1463: -- Based on the stack traces, it seems to me that there are two issues here. First, HDFS-5545 introduced a bug that passes null as the configuration keys for the principal / keytab into the HttpServer.Builder. The attached patch fixes the problem. Second, Webapps enables spnego authentication when security is enabled but no principals / keytabs are passed in. This configuration is wrong and it should fail. Therefore, in my opinion it is problematic to mask the failures in WebApps.java. Maybe we should fix the unit test instead. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837269#comment-13837269 ] Tsuyoshi OZAWA commented on YARN-1301: -- [~zjshen], a patch is ready now. Could you review it? Thanks. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837276#comment-13837276 ] Jeff Zhang commented on YARN-321: - Will this jira be included in the next release? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837278#comment-13837278 ] Binglin Chang commented on YARN-1463: - bq. Webapps enables spnego authentication when security is enabled but no principals / keytabs are passed in. This configuration is wrong and it should fail. I thought the same, but then I looked at the original code:
{code}
if (spnegoPrincipalKey == null
    || conf.get(spnegoPrincipalKey, "").isEmpty()) {
  LOG.warn("Principal for spnego filter is not set");
  initSpnego = false;
}
if (spnegoKeytabKey == null
    || conf.get(spnegoKeytabKey, "").isEmpty()) {
  LOG.warn("Keytab for spnego filter is not set");
  initSpnego = false;
}
{code}
The code logs a WARN instead of an ERROR, which looks like intentional behavior, so I kept the original behavior just to be safe. Thoughts? TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
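For contrast, the stricter fail-fast behavior argued for above would look roughly like the fragment below; it is a sketch of the alternative under discussion, not committed code:
{code}
// Reject the misconfiguration outright instead of logging a WARN and
// silently disabling SPNEGO.
if (spnegoPrincipalKey == null
    || conf.get(spnegoPrincipalKey, "").isEmpty()) {
  throw new IOException("Principal for spnego filter is not set");
}
if (spnegoKeytabKey == null
    || conf.get(spnegoKeytabKey, "").isEmpty()) {
  throw new IOException("Keytab for spnego filter is not set");
}
{code}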
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837281#comment-13837281 ] Jeff Zhang commented on YARN-321: - Another question about this jira: I found that the container logURL is hard-coded there, so users still cannot see the logs of each container (stdout, stderr). Is allowing users to see the logs on the roadmap? And which jira is tracking this? Thanks. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
Junping Du created YARN-1468: Summary: TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837303#comment-13837303 ] Haohui Mai commented on YARN-1463: -- This is fine with me, but the test is broken then. Maybe we can leave a comment there and fix it later on. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837310#comment-13837310 ] Binglin Chang commented on YARN-1463: - bq. but the test is broken then I am sorry, what do you mean? Which test? With my original patch, I didn't see any test fail. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837312#comment-13837312 ] Tsuyoshi OZAWA commented on YARN-1468: -- Maybe this is a timing bug: I cannot reproduce the problem in my local environment. TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1468: - Description: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code}] was: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! 
junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at
[jira] [Updated] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1468: - Description: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code} was: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code}] TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. 
Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837320#comment-13837320 ] Haohui Mai commented on YARN-1463: -- I looked into TestContainerManagerSecurity. I'm not familiar with the code, but it seems to me that it is testing the secure setup. The unit test does not pass any principals / keytabs in the configuration, so spnego will always be disabled. I'm not an expert on the YARN code, but it seems to me that you won't be able to get the right token when spnego is disabled. Maybe someone more familiar with the code can comment on this. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837333#comment-13837333 ] Hadoop QA commented on YARN-126: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580129/YARN-126.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2580//console This message is automatically generated. yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.patch has option to specify a job tracker and the last line for general command line syntax had bin/hadoop command [genericOptions] [commandOptions] ran yarn rmadmin to get usage: RMAdmin Usage: java RMAdmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-help [cmd]] Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|jobtracker:port> specify a job tracker -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-862) ResourceManager and NodeManager versions should match on node registration or error out
[ https://issues.apache.org/jira/browse/YARN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837330#comment-13837330 ] Hadoop QA commented on YARN-862: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589256/YARN-862-b0.23-v2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2579//console This message is automatically generated. ResourceManager and NodeManager versions should match on node registration or error out --- Key: YARN-862 URL: https://issues.apache.org/jira/browse/YARN-862 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 0.23.8 Reporter: Robert Parker Assignee: Robert Parker Attachments: YARN-862-b0.23-v1.patch, YARN-862-b0.23-v2.patch For branch-0.23 the versions of the node manager and the resource manager should match to complete a successful registration. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837337#comment-13837337 ] Karthik Kambatla commented on YARN-1028: Verified on a cluster, writing a unit test for the same. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1028: --- Attachment: yarn-1028-2.patch Here is a new patch that introduces a pluggable failover model and fixes the retry mechanism. High level details: # YarnFailoverProxyProvider implements YARN specific failover-proxy-provider from Clients/ AMs/ NMs to connect to the RM. # ConfiguredFailoverProxyProvider extends the pluggable failover-proxy to toggle between RMs # Required changes to RMProxy, ClientRMProxy and ServerRMProxy. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
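For readers unfamiliar with the pattern borrowed from HDFS, the rough shape of such a provider is sketched below. The class is hypothetical and is not the code in yarn-1028-2.patch; the interface methods follow hadoop-common's FailoverProxyProvider of this era, but treat the details as assumptions:
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.retry.FailoverProxyProvider;

// Hypothetical illustration only, not the attached patch: cycle through a
// fixed list of pre-created RM proxies, failing over round-robin.
public class ExampleRMFailoverProxyProvider<T> implements FailoverProxyProvider<T> {
  private final Class<T> protocol;
  private final List<T> proxies; // one proxy per configured RM address
  private int currentIndex = 0;

  public ExampleRMFailoverProxyProvider(Class<T> protocol, List<T> proxies) {
    this.protocol = protocol;
    this.proxies = proxies;
  }

  @Override
  public Class<T> getInterface() {
    return protocol;
  }

  @Override
  public T getProxy() {
    return proxies.get(currentIndex);
  }

  @Override
  public void performFailover(T currentProxy) {
    // Toggle to the next configured RM; the retry machinery calls this when
    // the retry policy decides a failover is warranted.
    currentIndex = (currentIndex + 1) % proxies.size();
  }

  @Override
  public void close() throws IOException {
    // A real implementation should stop the RPC proxies here.
  }
}
{code}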
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837352#comment-13837352 ] Tsuyoshi OZAWA commented on YARN-1301: -- Do you mean we should record the number of additions/removals? Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837360#comment-13837360 ] Haohui Mai commented on YARN-1463: -- Just to clarify, I think we can fix the unit test in a separate jira. However, it might be worthwhile to add some comments to explain the situations in the unit test. +1. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837359#comment-13837359 ] Hadoop QA commented on YARN-1028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616692/yarn-1028-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1553 javac compiler warnings (more than the trunk's current 1543 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2581//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2581//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2581//console This message is automatically generated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)