[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880797#comment-13880797
 ] 

Jian He commented on YARN-1618:
---

bq. is it still the case that RPC servers are started after recovery is 
complete?
it is.
bq.  The START should come almost immediately after the RMAppImpl object is 
created in a NEW state during regular app submission. Karthik, are we sure that 
this happened?
yes, it is.
bq. There is no need for history for an app that was never submitted 
successfully to the RM.
I agree. We don't need to save the final state of the app if the app is not 
even accepted by the RM.
bq. If we don't want the store to be touched until the app is SUBMITTED/ 
ACCEPTED (X), we should probably replace the existing NEW_SAVING state with a 
corresponding X_SAVING state, and re-jig the transitions to directly go to 
KILLED/FAILED from any of the states before this X_SAVING state.
Regarding the two approaches Karthik proposed, I'm in favor of the first one.

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880813#comment-13880813
 ] 

Bikas Saha commented on YARN-1618:
--

All we need to do is go from NEW to KILLED on the KILL event and ignore the START event 
in the KILLED state.
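For illustration, a toy model of that proposal (names and structure are simplified; this is not the RMAppImpl state-machine wiring itself): on KILL the app goes straight from NEW to KILLED without ever entering FINAL_SAVING, and a late START arriving in KILLED is ignored.
{code}
// Sketch only -- not RMAppImpl code.
public class NewToKilledSketch {
  enum State { NEW, KILLED }
  enum Event { START, KILL }

  static State transition(State s, Event e) {
    if (s == State.NEW && e == Event.KILL) {
      return State.KILLED;   // no FINAL_SAVING step, so nothing is written to the store
    }
    if (s == State.KILLED && e == Event.START) {
      return State.KILLED;   // a late START in KILLED is simply ignored
    }
    return s;                // other combinations are out of scope for this sketch
  }

  public static void main(String[] args) {
    State s = transition(State.NEW, Event.KILL);
    System.out.println(transition(s, Event.START));   // KILLED
  }
}
{code}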

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-01-24 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1480:
--

Attachment: YARN-1480-3.patch

Added a test for invalid finalStatus, following the appStates test case.
Thanks for your confirmation and suggestion.

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch


 Nowadays RM web services getApps() accepts many more filters than 
 ApplicationCLI list command, which only accepts state and type. IMHO, 
 ideally, different interfaces should provide consistent functionality. Is it 
 better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1631) Container allocation issue in Leafqueue assignContainers()

2014-01-24 Thread Sunil G (JIRA)
Sunil G created YARN-1631:
-

 Summary: Container allocation issue in Leafqueue assignContainers()
 Key: YARN-1631
 URL: https://issues.apache.org/jira/browse/YARN-1631
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: SuSe 11 Linux 
Reporter: Sunil G


Application1 has a demand of 8GB [map task size is 8GB], which is more than 
Node_1 can handle.
Node_1 has a capacity of 8GB, and 2GB is used by Application1's AM.
Hence Application1 reserved the remaining 6GB on Node_1.

A new job (Application2) is submitted with a 2GB AM and 2GB tasks, with only 2 maps to 
run.
Node_2 also has 8GB capacity.

But Application2's AM cannot be launched on Node_2, and Application2 waits 
longer as only 2 nodes are available in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1631) Container allocation issue in Leafqueue assignContainers()

2014-01-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1631:
--

Attachment: Yarn-1631.1.patch

As per LeafQueue assignContainers(), applications are fetched from 
activeApplications.
In this scenario, Application1 is fetched first when the node update for Node_2 came.

So the check below in assignContainers() failed:
  // Check queue max-capacity limit
  if (!assignToQueue(clusterResource, required)) {
    return NULL_ASSIGNMENT;
  }
Here the queue was crossing its max-capacity limit, and because of the return statement 
the loop never tried the second application.
Application2 only needs 2GB to launch, and it could have been launched on 
Node_2.

So instead of returning NULL_ASSIGNMENT, it is better to break from the inner 
loop, as some of the user-limit checks already do.
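To make the difference concrete, here is a toy model of that loop (illustrative only, not the attached Yarn-1631.1.patch; in the real code the early return is NULL_ASSIGNMENT and the proposed behaviour is a break out of the inner per-application loop):
{code}
// Toy model of the scenario above: queue headroom of 6GB, Application1 wants 8GB,
// Application2 wants 2GB. With "return" semantics the whole node update is abandoned;
// with "skip" semantics Application2 still gets its container.
import java.util.Arrays;
import java.util.List;

public class AssignContainersSketch {
  static String assign(List<int[]> apps, int queueHeadroomGB, boolean skipOverLimitApp) {
    for (int[] app : apps) {                 // app[0] = id, app[1] = demand in GB
      // "Check queue max-capacity limit" from the snippet above
      if (app[1] > queueHeadroomGB) {
        if (skipOverLimitApp) {
          continue;                          // proposed: try the next application
        }
        return "NULL_ASSIGNMENT";            // current: abandon this node update entirely
      }
      return "allocated to Application" + app[0];
    }
    return "NULL_ASSIGNMENT";
  }

  public static void main(String[] args) {
    List<int[]> apps = Arrays.asList(new int[]{1, 8}, new int[]{2, 2});
    System.out.println(assign(apps, 6, false));   // NULL_ASSIGNMENT -- Application2 starves
    System.out.println(assign(apps, 6, true));    // allocated to Application2
  }
}
{code}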

Kindly check this patch and please share your thoughts.

 Container allocation issue in Leafqueue assignContainers()
 --

 Key: YARN-1631
 URL: https://issues.apache.org/jira/browse/YARN-1631
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: SuSe 11 Linux 
Reporter: Sunil G
 Attachments: Yarn-1631.1.patch


 Application1 has a demand of 8GB [map task size is 8GB], which is more than 
 Node_1 can handle.
 Node_1 has a capacity of 8GB, and 2GB is used by Application1's AM.
 Hence Application1 reserved the remaining 6GB on Node_1.
 A new job (Application2) is submitted with a 2GB AM and 2GB tasks, with only 2 maps to 
 run.
 Node_2 also has 8GB capacity.
 But Application2's AM cannot be launched on Node_2, and Application2 waits 
 longer as only 2 nodes are available in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource

2014-01-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881134#comment-13881134
 ] 

Kihwal Lee commented on YARN-1575:
--

+1 The patch looks good. The use of synchronizedMap and double locking is well 
justified.
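For readers following along, a minimal illustration of that pattern (hypothetical code, not the attached patch): a Collections.synchronizedMap makes individual calls thread-safe, but a check-then-act sequence still needs an explicit lock on the map, which is the kind of race the "Localized unkown resource" crash points at.
{code}
// Illustrative only -- not ResourceLocalizationService code.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class DoubleLockingSketch {
  private final Map<String, String> pending =
      Collections.synchronizedMap(new HashMap<String, String>());

  String take(String key) {
    // get()/remove() are each thread-safe on their own, but without the outer lock
    // another thread could remove the entry between the two calls, leaving a
    // completed future the service no longer recognizes.
    synchronized (pending) {
      return pending.containsKey(key) ? pending.remove(key) : null;
    }
  }

  void add(String key, String value) {
    pending.put(key, value);   // a single call: the synchronizedMap wrapper suffices here
  }
}
{code}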

 Public localizer crashes with Localized unkown resource
 -

 Key: YARN-1575
 URL: https://issues.apache.org/jira/browse/YARN-1575
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch


 The public localizer can crash with the error:
 {noformat}
 2014-01-08 14:11:43,212 [Thread-467] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
 2014-01-08 14:11:43,212 [Thread-467] INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Public cache exiting
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response

2014-01-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881180#comment-13881180
 ] 

Jonathan Eagles commented on YARN-1479:
---

Thanks, Chen. A couple of minor things and a question for you.
* There are a couple of unnecessary imports in TestApplicationMasterService. 
Let's get those cleaned up before this patch goes in.
* progressCheck - the function would be better off package-private, since the 
intention is not to advertise new functionality.
* progressCheck - this function should be renamed, since "check" reads as a question and 
not as an indication that something is being modified. Perhaps progressFilter, or 
hopefully you can think of something better.


 Invalid NaN values in Hadoop REST API JSON response
 ---

 Key: YARN-1479
 URL: https://issues.apache.org/jira/browse/YARN-1479
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
 Fix For: 2.4.0

 Attachments: Yarn-1479.patch


 I've been occasionally coming across instances where Hadoop's Cluster 
 Applications REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API)
  has returned JSON that PHP's json_decode function failed to parse.  I've 
 tracked the syntax error down to the presence of the unquoted word NaN 
 appearing as a value in the JSON.  For example:
 "progress":NaN,
 NaN is not part of the JSON spec, so its presence renders the whole JSON 
 string invalid.  Hadoop needs to return something other than NaN in this case 
 -- perhaps an empty string or the quoted string "NaN".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-01-24 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881354#comment-13881354
 ] 

Cindy Li commented on YARN-1525:


[~xgong], thanks for your comments. In the current code, it just loops through 
all the available RM ids. There is already an rpcTimeoutForChecks when querying 
the HAState; do we need to set an additional MAXIMUM_waiting time here too? 

@Karthik, thanks for your comments. I can make it into HAUtil. 

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch.v1, YARN1525.patch.v2


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1617) Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate

2014-01-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881420#comment-13881420
 ] 

Karthik Kambatla commented on YARN-1617:


+1

 Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
 ---

 Key: YARN-1617
 URL: https://issues.apache.org/jira/browse/YARN-1617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1617.patch


 {code}
   synchronized private void allocate(Container container) {
 // Update consumption and track allocations
 //TODO: fixme sharad
 /* try {
 store.storeContainer(container);
   } catch (IOException ie) {
 // TODO fix this. we shouldnt ignore
   }*/
 
 LOG.debug("allocate: applicationId=" + applicationId + " container="
     + container.getId() + " host="
     + container.getNodeId().toString());
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-24 Thread Chen He (JIRA)
Chen He created YARN-1632:
-

 Summary: TestApplicationMasterServices should be under 
org.apache.hadoop.yarn.server.resourcemanager package
 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 0.23.9
Reporter: Chen He
Assignee: Chen He
Priority: Minor


ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager 
package. However, its unit test file TestApplicationMasterService is placed 
under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response

2014-01-24 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881499#comment-13881499
 ] 

Chen He commented on YARN-1479:
---

Hi [~jeagles]
Thank you for your suggestion.
I can answer your questions one by one.
{quote}There are a couple of unnecessary imports in 
TestApplicationMasterService. Let's get those cleaned up before this patch goes 
in.{quote}
I have removed those unnecessary imports;
{quote}progressCheck - the function will be better off package-private since 
the intention is not to advertise new functionality{quote}
{quote}progressCheck - this function should be renamed since check is a 
question and not an indication something is being modified. Perhaps 
progressFilter or hopefully you can think of something better.{quote}
If progressCheck is package-private, it cannot be called directly from 
TestApplicationMasterService, since the test is currently in a different package (see YARN-1632).
I will remove the progressCheck method in the yarn-1479v2.patch and migrate its 
code into the ApplicationMasterService.allocate() method. Then we only need a 
testAllocate() method in TestApplicationMasterService.
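For context, a hedged sketch of the kind of progress sanitation being discussed (names are illustrative and this is not the yarn-1479v2.patch itself): clamp the AM-reported progress into [0, 1] and replace NaN/Infinity before it can reach the JSON response.
{code}
// Illustrative only -- not the attached patch.
public class ProgressFilterSketch {
  /** Drop NaN/Infinity and clamp progress into [0, 1]. */
  static float filterProgress(float progress) {
    if (Float.isNaN(progress) || Float.isInfinite(progress)) {
      return 0f;                                  // never let NaN reach the JSON writer
    }
    return Math.min(1f, Math.max(0f, progress));
  }

  public static void main(String[] args) {
    System.out.println(filterProgress(Float.NaN));   // 0.0
    System.out.println(filterProgress(1.7f));        // 1.0
    System.out.println(filterProgress(0.42f));       // 0.42
  }
}
{code}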


 Invalid NaN values in Hadoop REST API JSON response
 ---

 Key: YARN-1479
 URL: https://issues.apache.org/jira/browse/YARN-1479
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
 Fix For: 2.4.0

 Attachments: Yarn-1479.patch


 I've been occasionally coming across instances where Hadoop's Cluster 
 Applications REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API)
  has returned JSON that PHP's json_decode function failed to parse.  I've 
 tracked the syntax error down to the presence of the unquoted word NaN 
 appearing as a value in the JSON.  For example:
 "progress":NaN,
 NaN is not part of the JSON spec, so its presence renders the whole JSON 
 string invalid.  Hadoop needs to return something other than NaN in this case 
 -- perhaps an empty string or the quoted string "NaN".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-24 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned YARN-1600:


Assignee: Haohui Mai

 RM does not startup when security is enabled without spnego configured
 --

 Key: YARN-1600
 URL: https://issues.apache.org/jira/browse/YARN-1600
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker

 We have a custom auth filter in front of our various UI pages that handles 
 user authentication.  However currently the RM assumes that if security is 
 enabled then the user must have configured spnego as well for the RM web 
 pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881507#comment-13881507
 ] 

Sandy Ryza commented on YARN-1630:
--

Can the config be added as a timeout instead of a number of polls?

Also, a couple of nits:
* The timeout should default to -1, meaning wait forever.
* No need for the message var; just call LOG.info directly.
* A (Yarn)exception should be thrown to indicate that the operation didn't 
complete in time. Otherwise clients might think it had completed successfully (sketched below).
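A rough sketch of that bounded wait, for illustration (the method, class and poll interval below are made up; only the YarnClient calls are real API, and the actual diff may look quite different):
{code}
// Sketch only -- not the attached diff.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class KillWithTimeoutSketch {
  /** Poll until KILLED; timeoutMs < 0 means wait forever, otherwise throw on expiry. */
  static void killAndWait(YarnClient client, ApplicationId appId, long timeoutMs)
      throws Exception {
    client.killApplication(appId);
    long deadline = timeoutMs < 0 ? Long.MAX_VALUE : System.currentTimeMillis() + timeoutMs;
    while (client.getApplicationReport(appId).getYarnApplicationState()
        != YarnApplicationState.KILLED) {
      if (System.currentTimeMillis() > deadline) {
        throw new YarnException("Timed out waiting for " + appId + " to be killed");
      }
      Thread.sleep(200);   // poll interval; the real client reads this from configuration
    }
  }
}
{code}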

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1633) Define the entity, entity-info and event objects

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1633:
-

 Summary: Define the entity, entity-info and event objects
 Key: YARN-1633
 URL: https://issues.apache.org/jira/browse/YARN-1633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Define the core objects of the application-timeline effort.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1633) Define the entity, entity-info and event objects

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1633:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

 Define the entity, entity-info and event objects
 

 Key: YARN-1633
 URL: https://issues.apache.org/jira/browse/YARN-1633
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 Define the core objects of the application-timeline effort.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1634) Define a ApplicationTimelineStore interface and an in-memory implementation

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1634:
-

 Summary: Define a ApplicationTimelineStore interface and an 
in-memory implementation 
 Key: YARN-1634
 URL: https://issues.apache.org/jira/browse/YARN-1634
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


As per the design doc, the store needs to be pluggable. We need a base interface, 
and an in-memory implementation for testing.
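Purely to illustrate the shape this could take (every name below is hypothetical; the real interface is whatever YARN-1633/YARN-1634 end up defining): a minimal store interface plus a trivial in-memory implementation for tests.
{code}
// Hypothetical sketch -- not the actual ApplicationTimelineStore API.
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface TimelineStoreSketch {
  void put(String entityType, String entityId, String event);    // write path
  List<String> getEvents(String entityType, String entityId);    // read path
}

class MemoryTimelineStoreSketch implements TimelineStoreSketch {
  private final Map<String, List<String>> events = new LinkedHashMap<>();

  public synchronized void put(String entityType, String entityId, String event) {
    String key = entityType + "/" + entityId;
    if (!events.containsKey(key)) {
      events.put(key, new ArrayList<String>());
    }
    events.get(key).add(event);
  }

  public synchronized List<String> getEvents(String entityType, String entityId) {
    List<String> found = events.get(entityType + "/" + entityId);
    return found == null ? Collections.<String>emptyList() : found;
  }
}
{code}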



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1635:
-

 Summary: Implement a Leveldb based ApplicationTimelineStore
 Key: YARN-1635
 URL: https://issues.apache.org/jira/browse/YARN-1635
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli


As per the design doc, we need a levelDB + local-filesystem based 
implementation to start with and for small deployments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+eventies

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1636:
-

 Summary: Implement timeline related web-services inside AHS for 
storing and retrieving entities+eventies
 Key: YARN-1636
 URL: https://issues.apache.org/jira/browse/YARN-1636
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-1635:
-

Assignee: Vinod Kumar Vavilapalli

 Implement a Leveldb based ApplicationTimelineStore
 --

 Key: YARN-1635
 URL: https://issues.apache.org/jira/browse/YARN-1635
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 As per the design doc, we need a levelDB + local-filesystem based 
 implementation to start with and for small deployments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1637) Implement a client library for java users to post entities+events

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1637:
-

 Summary: Implement a client library for java users to post 
entities+events
 Key: YARN-1637
 URL: https://issues.apache.org/jira/browse/YARN-1637
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


This is a wrapper around the web-service to facilitate easy posting of 
entity+event data to the time-line server.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1638) Add an integration test validating post, storage and retrival of entites+events

2014-01-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1638:
-

 Summary: Add an integration test validating post, storage and 
retrival of entites+events
 Key: YARN-1638
 URL: https://issues.apache.org/jira/browse/YARN-1638
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-24 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated YARN-1600:
-

Attachment: YARN-1600.000.patch

This patch ports the solution from the earlier patches of YARN-1463.

 RM does not startup when security is enabled without spnego configured
 --

 Key: YARN-1600
 URL: https://issues.apache.org/jira/browse/YARN-1600
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker
 Attachments: YARN-1600.000.patch


 We have a custom auth filter in front of our various UI pages that handles 
 user authentication.  However currently the RM assumes that if security is 
 enabled then the user must have configured spnego as well for the RM web 
 pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-24 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1632:
--

Attachment: yarn-1632.patch

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Attachments: yarn-1632.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-01-24 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1525:
---

Attachment: YARN1525.patch.v3

I've added a test case file, TestRMWebHA.java, and an HA util file named 
RMHAUtils.java. It is easier to add a new HA util file specifically for the RM than 
to add the getActiveRMId functions to HAUtils.java, which is part of the yarn conf 
package. I talked with Xuan and Vinod offline; we don't need to add a 
MAXIMUM_waiting_time here, as the scan for the active RM only happens once. If no 
active RM is found, it returns null. I've also minimized the format-related 
changes in the new patch. 
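To make that behaviour concrete, a hedged sketch of such a helper (the predicate stands in for the actual HAState query; these names are illustrative, not the RMHAUtils.java in the attached patch): scan the configured RM ids once and return the first Active one, or null if none is Active.
{code}
// Illustrative only -- not the attached RMHAUtils.java.
import java.util.List;
import java.util.function.Predicate;

public class ActiveRMLookupSketch {
  static String findActiveRMId(List<String> rmIds, Predicate<String> isActive) {
    for (String id : rmIds) {       // e.g. the ids from yarn.resourcemanager.ha.rm-ids
      if (isActive.test(id)) {
        return id;                  // first RM reporting Active wins
      }
    }
    return null;                    // no Active RM found; the caller decides what to do
  }
}
{code}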

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-24 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: (was: diff.txt)

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-24 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: diff.txt

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-24 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1639:
-

 Summary: YARM RM HA requires different configs on different RM 
hosts
 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong


We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host 
is the first or second RM.
This means we have different configs on different RM nodes. This is unlike HDFS 
HA, where the same configs are pushed to both NNs; it would be better to 
have the same setup for the RM, as this would make installation and management easier.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881568#comment-13881568
 ] 

Sandy Ryza commented on YARN-1630:
--

Thanks Aditya.  To make the naming less ambiguous, we should call the property 
client.application-client-protocol.poll-timeout-ms.

After that, LGTM.

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881576#comment-13881576
 ] 

Karthik Kambatla commented on YARN-1639:


I agree that being able to use the same configs for both RMs would be simpler to 
deploy. While working on YARN-1232, I discussed this with Bikas and Alejandro. 
I can't remember exactly why, but I think there was a reason we decided to go 
with different ha.ids/configs. [~bikassaha], [~tucu00] - do you remember why?

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host 
 is the first or second RM.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs; it would be better 
 to have the same setup for the RM, as this would make installation and management 
 easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1639:
---

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-149

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which rm you 
 want to first or second.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA where the same configs are pushed to both NN's and it would be better 
 to have the same setup for RM as this would make installation and managing 
 easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881577#comment-13881577
 ] 

Karthik Kambatla commented on YARN-1639:


In any case, if we decide to go ahead with simplifying this, can we do it such 
that explicitly specifying the ha.id also works? 

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host 
 is the first or second RM.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs; it would be better 
 to have the same setup for the RM, as this would make installation and management 
 easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1640) Manual Failover does not work in secure clusters

2014-01-24 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1640:
---

 Summary: Manual Failover does not work in secure clusters
 Key: YARN-1640
 URL: https://issues.apache.org/jira/browse/YARN-1640
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters

2014-01-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881592#comment-13881592
 ] 

Karthik Kambatla commented on YARN-1640:


[~xgong] - can you confirm that YARN-1598 is part of what you are testing? 

 Manual Failover does not work in secure clusters
 

 Key: YARN-1640
 URL: https://issues.apache.org/jira/browse/YARN-1640
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

 NodeManager gets rejected after manually making one RM as active.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1640) Manual Failover does not work in secure clusters

2014-01-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1640:


Description: NodeManager gets rejected after manually making one RM as 
active.

 Manual Failover does not work in secure clusters
 

 Key: YARN-1640
 URL: https://issues.apache.org/jira/browse/YARN-1640
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

 NodeManager gets rejected after manually making one RM as active.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters

2014-01-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881594#comment-13881594
 ] 

Xuan Gong commented on YARN-1640:
-

I think so; I am using the latest trunk code.

 Manual Failover does not work in secure clusters
 

 Key: YARN-1640
 URL: https://issues.apache.org/jira/browse/YARN-1640
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

 NodeManager gets rejected after manually making one RM as active.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1641) ZK store - periodically create a dummy file to kick in fencing

2014-01-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1641:
--

 Summary: ZK store - periodically create a dummy file to kick in 
fencing
 Key: YARN-1641
 URL: https://issues.apache.org/jira/browse/YARN-1641
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Fencing in the ZK store kicks in when the RM tries to write something to the store. 
If the RM doesn't write anything to the store, it doesn't get fenced and can 
continue to assume it is still the Active RM. 

By periodically writing a file (say, every RM_ZK_TIMEOUT_MS seconds), we can 
ensure it gets fenced.
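As a sketch of that mechanism (illustrative only; the real change would live inside ZKRMStateStore and go through its fenced write path): a scheduled task performs a no-op store write once per ZK session timeout, so a fenced RM fails fast instead of silently staying Active.
{code}
// Illustrative only -- not ZKRMStateStore code.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FencingProbeSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** writeDummyNode is expected to go through the store's fenced write path. */
  void start(Runnable writeDummyNode, long zkSessionTimeoutMs) {
    scheduler.scheduleAtFixedRate(
        writeDummyNode, zkSessionTimeoutMs, zkSessionTimeoutMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
{code}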



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-01-24 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881600#comment-13881600
 ] 

Cindy Li commented on YARN-1525:


I'll work on a new ticket for the web service redirection part.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881672#comment-13881672
 ] 

Hadoop QA commented on YARN-1632:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625145/yarn-1632.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2931//console

This message is automatically generated.

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Attachments: yarn-1632.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1573) ZK store should use a private password for root-node-acls

2014-01-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881687#comment-13881687
 ] 

Hudson commented on YARN-1573:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1573. ZK store should use a private password for root-node-acls. (kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


 ZK store should use a private password for root-node-acls
 -

 Key: YARN-1573
 URL: https://issues.apache.org/jira/browse/YARN-1573
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: yarn-1573-1.patch, yarn-1573-2.patch


 Currently, when HA is enabled, ZK store uses cluster-timestamp as the 
 password for root node ACLs to give the Active RM exclusive access to the 
 store. A more private value like a random number might be better. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource

2014-01-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881685#comment-13881685
 ] 

Hudson commented on YARN-1575:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1575. Public localizer crashes with Localized unkown resource. 
Contributed by Jason Lowe (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561110)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Public localizer crashes with Localized unkown resource
 -

 Key: YARN-1575
 URL: https://issues.apache.org/jira/browse/YARN-1575
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 3.0.0, 2.4.0, 0.23.11

 Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch


 The public localizer can crash with the error:
 {noformat}
 2014-01-08 14:11:43,212 [Thread-467] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
 2014-01-08 14:11:43,212 [Thread-467] INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Public cache exiting
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk

2014-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881699#comment-13881699
 ] 

Hadoop QA commented on YARN-1628:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12624860/YARN-1628.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2932//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2932//console

This message is automatically generated.

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-1628
 URL: https://issues.apache.org/jira/browse/YARN-1628
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1628.patch


 The Test fails with the following error
 {noformat}
 java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
   at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
   at 
 org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
   at 
 org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881714#comment-13881714
 ] 

Hadoop QA commented on YARN-1630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625151/diff.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2934//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2934//console

This message is automatically generated.

 Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
 forever
 ---

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Attachments: diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881715#comment-13881715
 ] 

Hadoop QA commented on YARN-1629:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12624937/YARN-1629-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2933//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2933//console

This message is automatically generated.

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881719#comment-13881719
 ] 

Hadoop QA commented on YARN-1480:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625013/YARN-1480-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2935//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2935//console

This message is automatically generated.

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch


 Nowadays RM web services getApps() accepts many more filters than 
 ApplicationCLI list command, which only accepts state and type. IMHO, 
 ideally, different interfaces should provide consistent functionality. Is it 
 better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)