[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880797#comment-13880797 ]

Jian He commented on YARN-1618:
-------------------------------

bq. is it still the case that RPC servers are started after recovery is complete?
It is.

bq. The START should come almost immediately after the RMAppImpl object is created in a NEW state during regular app submission. Karthik, are we sure that this happened?
Yes, it is.

bq. There is no need for history for an app that was never submitted successfully to the RM.
I agree. We don't need to save the final state of the app if the app is not even accepted by the RM.

bq. If we don't want the store to be touched until the app is SUBMITTED/ACCEPTED (X), we should probably replace the existing NEW_SAVING state with a corresponding X_SAVING state, and re-jig the transitions to directly go to KILLED/FAILED from any of the states before this X_SAVING state.
Regarding the two approaches Karthik proposed, I'm in favor of the 1st one.

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
------------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description: ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880813#comment-13880813 ]

Bikas Saha commented on YARN-1618:
----------------------------------

All we need to do is go from NEW to KILLED on the KILL event and ignore the START event in the KILLED state.

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
------------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description: ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
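To make the proposed transition concrete, here is a minimal, self-contained sketch of the behavior Bikas describes. It is a toy model, not the actual RMAppImpl state machine (the real one is built with YARN's StateMachineFactory, and all names here are illustrative): a KILL event moves NEW directly to KILLED with no state-store write, and a late START event in KILLED is dropped.

{code}
// Toy model of the proposed transitions; names are illustrative.
public class AppStateSketch {
  enum State { NEW, SUBMITTED, KILLED }
  enum Event { START, KILL }

  private State state = State.NEW;

  void handle(Event event) {
    switch (state) {
      case NEW:
        if (event == Event.KILL) {
          state = State.KILLED;    // straight to KILLED, no FINAL_SAVING
        } else if (event == Event.START) {
          state = State.SUBMITTED; // normal submission path
        }
        break;
      case KILLED:
        // A START event arriving after the kill is ignored.
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) {
    AppStateSketch app = new AppStateSketch();
    app.handle(Event.KILL);        // NEW -> KILLED, store untouched
    app.handle(Event.START);       // ignored in KILLED
    System.out.println(app.state); // KILLED
  }
}
{code}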
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenji Kikushima updated YARN-1480:
----------------------------------

Attachment: YARN-1480-3.patch

Added a test for an invalid finalStatus, modeled on the appStates test case. Thanks for your confirmation and suggestion.

RM web services getApps() accepts many more filters than ApplicationCLI list command
-------------------------------------------------------------------------------------

Key: YARN-1480
URL: https://issues.apache.org/jira/browse/YARN-1480
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch

Nowadays RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1631) Container allocation issue in LeafQueue assignContainers()
Sunil G created YARN-1631:
--------------------------

Summary: Container allocation issue in LeafQueue assignContainers()
Key: YARN-1631
URL: https://issues.apache.org/jira/browse/YARN-1631
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: SuSe 11 Linux
Reporter: Sunil G

Application1 has a demand of 8GB (its map task size is 8GB), which is more than Node_1 can now provide: Node_1 has a capacity of 8GB, of which 2GB is used by Application1's AM, so Application1 reserved the remaining 6GB on Node_1. A new job, Application2, is then submitted with a 2GB AM, 2GB tasks, and only 2 maps to run. Node_2 also has 8GB capacity, but Application2's AM cannot be launched on Node_2, and Application2 waits a long time because only 2 nodes are available in the cluster.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1631) Container allocation issue in LeafQueue assignContainers()
[ https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-1631:
--------------------------

Attachment: Yarn-1631.1.patch

As per LeafQueue.assignContainers(), applications are fetched from activeApplications. In this scenario, Application1 is fetched first when the node update for Node_2 arrives, so the check below in assignContainers() fails:

{code}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
  return NULL_ASSIGNMENT;
}
{code}

Here the queue usage was crossing its limit, but because of the return statement, the loop never tries the second application. Application2 needs only 2GB to launch, and that could have been satisfied on Node_2. So instead of returning NULL_ASSIGNMENT, it is better to break out of the inner loop; the user limit check already breaks from the inner loop in the same way. Kindly check this patch and please share your thoughts.

Container allocation issue in LeafQueue assignContainers()
----------------------------------------------------------

Key: YARN-1631
URL: https://issues.apache.org/jira/browse/YARN-1631
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: SuSe 11 Linux
Reporter: Sunil G
Attachments: Yarn-1631.1.patch

Application1 has a demand of 8GB (its map task size is 8GB), which is more than Node_1 can now provide: Node_1 has a capacity of 8GB, of which 2GB is used by Application1's AM, so Application1 reserved the remaining 6GB on Node_1. A new job, Application2, is then submitted with a 2GB AM, 2GB tasks, and only 2 maps to run. Node_2 also has 8GB capacity, but Application2's AM cannot be launched on Node_2, and Application2 waits a long time because only 2 nodes are available in the cluster.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
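The control-flow change being proposed can be illustrated with a small, self-contained toy (the class and method names below are hypothetical, not the real LeafQueue code): when one application's demand exceeds the queue headroom, skip just that application rather than abandoning the whole node update.

{code}
import java.util.List;

// Toy illustration of "skip the application" vs. "return NULL_ASSIGNMENT".
public class AssignContainersSketch {
  record App(String name, int demandGB) {}

  static final int QUEUE_HEADROOM_GB = 4;

  static String assign(List<App> activeApplications) {
    for (App app : activeApplications) {
      if (app.demandGB() > QUEUE_HEADROOM_GB) {
        // Returning here (the old NULL_ASSIGNMENT behavior) would
        // starve every later application; skipping does not.
        continue;
      }
      return app.name(); // first application that fits gets the node
    }
    return null; // nothing fit on this node update
  }

  public static void main(String[] args) {
    List<App> apps = List.of(new App("Application1", 8),
                             new App("Application2", 2));
    System.out.println(assign(apps)); // Application2
  }
}
{code}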
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881134#comment-13881134 ]

Kihwal Lee commented on YARN-1575:
----------------------------------

+1 The patch looks good. The use of synchronizedMap and double locking is well justified.

Public localizer crashes with Localized unkown resource
--------------------------------------------------------

Key: YARN-1575
URL: https://issues.apache.org/jira/browse/YARN-1575
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch

The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
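For readers curious what "synchronizedMap and double locking" buys here, the sketch below reproduces the pattern in a self-contained toy (hypothetical names, not the actual ResourceLocalizationService code): submitting the download and recording its Future in the pending map happen under one lock, so the completion side can never observe a finished Future that was not yet registered and report an unknown resource.

{code}
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PublicLocalizerSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Map<Future<Path>, String> pending =
      Collections.synchronizedMap(new HashMap<>());

  Future<Path> addResource(String resource) {
    synchronized (pending) {       // the lock spans submit + put
      Future<Path> fetch = pool.submit(() -> Path.of("/cache", resource));
      pending.put(fetch, resource);
      return fetch;
    }
  }

  String onCompleted(Future<Path> fetch) {
    String resource = pending.remove(fetch);
    if (resource == null) {
      // Without the locking above, this is the crash in the report.
      throw new IllegalStateException("Localized unknown resource");
    }
    return resource;
  }

  public static void main(String[] args) throws Exception {
    PublicLocalizerSketch localizer = new PublicLocalizerSketch();
    Future<Path> fetch = localizer.addResource("job.jar");
    fetch.get();
    System.out.println(localizer.onCompleted(fetch)); // job.jar
    localizer.pool.shutdown();
  }
}
{code}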
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881180#comment-13881180 ]

Jonathan Eagles commented on YARN-1479:
---------------------------------------

Thanks, Chen. A couple of minor things and a question for you.
* There are a couple of unnecessary imports in TestApplicationMasterService. Let's get those cleaned up before this patch goes in.
* progressCheck - the function will be better off package-private, since the intention is not to advertise new functionality.
* progressCheck - this function should be renamed, since "check" suggests a question rather than an indication that something is being modified. Perhaps progressFilter, or hopefully you can think of something better.

Invalid NaN values in Hadoop REST API JSON response
---------------------------------------------------

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Fix For: 2.4.0
Attachments: Yarn-1479.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON, for example: progress:NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881354#comment-13881354 ]

Cindy Li commented on YARN-1525:
--------------------------------

[~xgong], thanks for your comments. In the current code, it just loops through all the available RM ids. There is already an rpcTimeoutForChecks when querying the HA state; do we need to set an additional maximum waiting time here too?

@Karthik, thanks for your comments. I can move it into HAUtil.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1617) Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
[ https://issues.apache.org/jira/browse/YARN-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881420#comment-13881420 ]

Karthik Kambatla commented on YARN-1617:
----------------------------------------

+1

Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
---------------------------------------------------------------------------

Key: YARN-1617
URL: https://issues.apache.org/jira/browse/YARN-1617
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1617.patch

{code}
synchronized private void allocate(Container container) {
  // Update consumption and track allocations
  //TODO: fixme sharad
  /* try {
       store.storeContainer(container);
     } catch (IOException ie) {
       // TODO fix this. we shouldnt ignore
     }*/
  LOG.debug("allocate: applicationId=" + applicationId
      + " container=" + container.getId()
      + " host=" + container.getNodeId().toString());
}
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
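A plausible cleaned-up version of the method follows. This is a sketch of what the patch presumably does, not the committed diff: the dead storeContainer() block is removed and the debug statement is guarded so the string concatenation only runs when debug logging is enabled. The parameters stand in for the real fields of AppSchedulingInfo.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AllocateLogSketch {
  private static final Log LOG = LogFactory.getLog(AllocateLogSketch.class);

  synchronized void allocate(String applicationId, String containerId,
                             String host) {
    // Update consumption and track allocations
    if (LOG.isDebugEnabled()) {
      // Guarded: the concatenation is skipped when debug is off.
      LOG.debug("allocate: applicationId=" + applicationId
          + " container=" + containerId + " host=" + host);
    }
  }
}
{code}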
[jira] [Created] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
Chen He created YARN-1632:
--------------------------

Summary: TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0, 0.23.9
Reporter: Chen He
Assignee: Chen He
Priority: Minor

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881499#comment-13881499 ]

Chen He commented on YARN-1479:
-------------------------------

Hi [~jeagles], thank you for your suggestions. I will answer your questions one by one.

{quote}There are a couple of unnecessary imports in TestApplicationMasterService. Let's get those cleaned up before this patch goes in.{quote}
I have removed those unnecessary imports.

{quote}progressCheck - the function will be better off package-private since the intention is not to advertise new functionality{quote}
{quote}progressCheck - this function should be renamed since check is a question and not an indication something is being modified. Perhaps progressFilter or hopefully you can think of something better.{quote}
If progressCheck is package-private, it cannot be directly called in TestApplicationMasterService after YARN-1632. I will remove the progressCheck method in the yarn-1479v2.patch and migrate its code into the ApplicationMasterService.allocate() method. Then we only need the testAllocate() method in TestApplicationMasterService.

Invalid NaN values in Hadoop REST API JSON response
---------------------------------------------------

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Fix For: 2.4.0
Attachments: Yarn-1479.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON, for example: progress:NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
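The filtering logic under discussion could look like the following self-contained sketch (a hypothetical helper; per the comment above, the real patch inlines the logic into ApplicationMasterService.allocate()): clamp any non-finite or out-of-range float progress into [0, 1] so NaN can never reach the REST JSON.

{code}
public final class ProgressFilterSketch {
  static float filterProgress(float progress) {
    if (Float.isNaN(progress) || progress < 0f) {
      return 0f;   // treat NaN and negative values as "no progress"
    }
    if (progress > 1f) {
      return 1f;   // covers positive infinity as well
    }
    return progress;
  }

  public static void main(String[] args) {
    System.out.println(filterProgress(Float.NaN));               // 0.0
    System.out.println(filterProgress(Float.POSITIVE_INFINITY)); // 1.0
    System.out.println(filterProgress(0.42f));                   // 0.42
  }
}
{code}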
[jira] [Assigned] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai reassigned YARN-1600:
--------------------------------

Assignee: Haohui Mai

RM does not startup when security is enabled without spnego configured
-----------------------------------------------------------------------

Key: YARN-1600
URL: https://issues.apache.org/jira/browse/YARN-1600
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker

We have a custom auth filter in front of our various UI pages that handles user authentication. However currently the RM assumes that if security is enabled then the user must have configured spnego as well for the RM web pages, which is not true in our case.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881507#comment-13881507 ]

Sandy Ryza commented on YARN-1630:
----------------------------------

Can the config be added as a timeout instead of a number of polls? Also, a couple of nits:
* The timeout should default to -1, meaning wait forever.
* No need for the message var; just LOG.info directly.
* A (Yarn)exception should be thrown to indicate that the operation didn't complete in time. Otherwise clients might think it had completed successfully.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1633) Define the entity, entity-info and event objects
Vinod Kumar Vavilapalli created YARN-1633:
------------------------------------------

Summary: Define the entity, entity-info and event objects
Key: YARN-1633
URL: https://issues.apache.org/jira/browse/YARN-1633
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

Define the core objects of the application-timeline effort.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1633) Define the entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-1633:
------------------------------------------

Issue Type: Sub-task (was: Bug)
Parent: YARN-1530

Define the entity, entity-info and event objects
------------------------------------------------

Key: YARN-1633
URL: https://issues.apache.org/jira/browse/YARN-1633
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

Define the core objects of the application-timeline effort.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1634) Define a ApplicationTimelineStore interface and an in-memory implementation
Vinod Kumar Vavilapalli created YARN-1634:
------------------------------------------

Summary: Define a ApplicationTimelineStore interface and an in-memory implementation
Key: YARN-1634
URL: https://issues.apache.org/jira/browse/YARN-1634
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

As per the design doc, the store needs to be pluggable. We need a base interface, and an in-memory implementation for testing.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
Vinod Kumar Vavilapalli created YARN-1635:
------------------------------------------

Summary: Implement a Leveldb based ApplicationTimelineStore
Key: YARN-1635
URL: https://issues.apache.org/jira/browse/YARN-1635
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli

As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+events
Vinod Kumar Vavilapalli created YARN-1636:
------------------------------------------

Summary: Implement timeline related web-services inside AHS for storing and retrieving entities+events
Key: YARN-1636
URL: https://issues.apache.org/jira/browse/YARN-1636
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned YARN-1635:
---------------------------------------------

Assignee: Vinod Kumar Vavilapalli

Implement a Leveldb based ApplicationTimelineStore
--------------------------------------------------

Key: YARN-1635
URL: https://issues.apache.org/jira/browse/YARN-1635
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1637) Implement a client library for java users to post entities+events
Vinod Kumar Vavilapalli created YARN-1637:
------------------------------------------

Summary: Implement a client library for java users to post entities+events
Key: YARN-1637
URL: https://issues.apache.org/jira/browse/YARN-1637
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

This is a wrapper around the web-service to facilitate easy posting of entity+event data to the time-line server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1638) Add an integration test validating post, storage and retrieval of entities+events
Vinod Kumar Vavilapalli created YARN-1638:
------------------------------------------

Summary: Add an integration test validating post, storage and retrieval of entities+events
Key: YARN-1638
URL: https://issues.apache.org/jira/browse/YARN-1638
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai updated YARN-1600:
-----------------------------

Attachment: YARN-1600.000.patch

This patch ports the solution from the earlier patches of YARN-1463.

RM does not startup when security is enabled without spnego configured
-----------------------------------------------------------------------

Key: YARN-1600
URL: https://issues.apache.org/jira/browse/YARN-1600
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker
Attachments: YARN-1600.000.patch

We have a custom auth filter in front of our various UI pages that handles user authentication. However currently the RM assumes that if security is enabled then the user must have configured spnego as well for the RM web pages, which is not true in our case.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen He updated YARN-1632:
--------------------------

Attachment: yarn-1632.patch

TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
----------------------------------------------------------------------------------------------------

Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
Attachments: yarn-1632.patch

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cindy Li updated YARN-1525:
---------------------------

Attachment: YARN1525.patch.v3

I've added a test case file, TestRMWebHA.java, and an HA util file named RMHAUtils.java. It is easier to add a new HA util file specifically for the RM than to add the getActiveRMId functions to HAUtils.java, which is part of the yarn conf package. Having talked with Xuan and Vinod offline, we don't need to add a maximum waiting time here, as the scan for the active RM only happens once; if no active RM is found, it returns null. I've also minimized the format-related changes in the new patch.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
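The single-scan lookup described above might look like this self-contained sketch (the signatures are hypothetical; the real helper lives in RMHAUtils and queries each RM's HA state over RPC): check each configured RM id once and return the first one that reports ACTIVE, or null when none does.

{code}
import java.util.List;
import java.util.function.Function;

public class RMHAUtilsSketch {
  enum HAState { ACTIVE, STANDBY, UNREACHABLE }

  static String findActiveRMId(List<String> rmIds,
                               Function<String, HAState> queryState) {
    for (String rmId : rmIds) {
      if (queryState.apply(rmId) == HAState.ACTIVE) {
        return rmId; // redirect the web UI to this RM
      }
    }
    return null; // a single pass, no retries, hence no max-wait needed
  }

  public static void main(String[] args) {
    String active = findActiveRMId(List.of("rm1", "rm2"),
        id -> id.equals("rm2") ? HAState.ACTIVE : HAState.STANDBY);
    System.out.println(active); // rm2
  }
}
{code}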
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Acharya updated YARN-1630:
---------------------------------

Attachment: (was: diff.txt)

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Acharya updated YARN-1630:
---------------------------------

Attachment: diff.txt

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1639) YARN RM HA requires different configs on different RM hosts
Arpit Gupta created YARN-1639:
------------------------------

Summary: YARN RM HA requires different configs on different RM hosts
Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881568#comment-13881568 ]

Sandy Ryza commented on YARN-1630:
----------------------------------

Thanks, Aditya. To make the naming less ambiguous, we should call the property client.application-client-protocol.poll-timeout-ms. After that, LGTM.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
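Sandy's suggestions (a millisecond timeout defaulting to -1 for "wait forever", plus an exception on expiry) could be wired up roughly as below. This is a self-contained sketch with hypothetical names, not the YarnClientImpl patch; the real code would throw a YarnException.

{code}
public class KillPollSketch {
  // Stand-in for the real "fetch the application report" RPC.
  static boolean isKilled(String appId, int attempt) {
    return attempt >= 5; // pretend the kill is visible on the 5th poll
  }

  static void waitForKill(String appId, long pollTimeoutMs,
                          long pollIntervalMs) throws Exception {
    long deadline = System.currentTimeMillis() + pollTimeoutMs;
    int attempt = 0;
    while (!isKilled(appId, attempt++)) {
      if (pollTimeoutMs >= 0 && System.currentTimeMillis() >= deadline) {
        // Throwing keeps callers from mistaking a timeout for a kill.
        throw new Exception("Timed out waiting for " + appId
            + " to be killed");
      }
      Thread.sleep(pollIntervalMs); // a -1 timeout means poll forever
    }
  }

  public static void main(String[] args) throws Exception {
    waitForKill("application_1389036507624_0018", 10_000, 100);
    System.out.println("killed");
  }
}
{code}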
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881576#comment-13881576 ]

Karthik Kambatla commented on YARN-1639:
----------------------------------------

Agreed: being able to use the same configs for both RMs would be simpler to deploy. While working on YARN-1232, I discussed this with Bikas and Alejandro. I can't remember exactly why, but there was a reason we decided to go with different ha.ids/configs. [~bikassaha], [~tucu00] - do you remember why?

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1639:
-----------------------------------

Issue Type: Sub-task (was: Improvement)
Parent: YARN-149

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881577#comment-13881577 ]

Karthik Kambatla commented on YARN-1639:
----------------------------------------

In any case, if we decide to go ahead with simplifying this, can we do it such that explicitly specifying the ha.id also works?

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
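One way to reconcile identical config files with Karthik's request that an explicit ha.id still win is sketched below. This is purely illustrative, under assumed semantics, and not what was eventually committed: use the configured id when present, otherwise infer it by matching the local hostname against the per-RM addresses.

{code}
import java.net.InetAddress;
import java.util.Map;

public class RMIdResolverSketch {
  static String resolveRMId(String explicitId,
                            Map<String, String> rmAddresses) throws Exception {
    if (explicitId != null) {
      return explicitId; // an explicit yarn.resourcemanager.ha.id wins
    }
    String localHost = InetAddress.getLocalHost().getHostName();
    for (Map.Entry<String, String> entry : rmAddresses.entrySet()) {
      if (entry.getValue().startsWith(localHost)) {
        return entry.getKey(); // the same config works on both hosts
      }
    }
    throw new Exception("Cannot determine the local RM id");
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> addrs = Map.of("rm1", "rm-host-1:8032",
                                       "rm2", "rm-host-2:8032");
    System.out.println(resolveRMId("rm2", addrs)); // rm2
  }
}
{code}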
[jira] [Created] (YARN-1640) Manual Failover does not work in secure clusters
Xuan Gong created YARN-1640:
----------------------------

Summary: Manual Failover does not work in secure clusters
Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881592#comment-13881592 ]

Karthik Kambatla commented on YARN-1640:
----------------------------------------

[~xgong] - can you confirm that YARN-1598 is part of what you are testing?

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1640:
----------------------------

Description: The NodeManager gets rejected after manually making one RM active.

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881594#comment-13881594 ]

Xuan Gong commented on YARN-1640:
---------------------------------

I think so; I am using the latest trunk code.

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1641) ZK store - periodically create a dummy file to kick in fencing
Karthik Kambatla created YARN-1641:
-----------------------------------

Summary: ZK store - periodically create a dummy file to kick in fencing
Key: YARN-1641
URL: https://issues.apache.org/jira/browse/YARN-1641
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

Fencing in the ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume it is the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS milliseconds), we can ensure it gets fenced.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
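The periodic write described above could be driven by a simple timer, as in this self-contained sketch (the interface and names are hypothetical; the real store would reuse its fenced ZooKeeper write path): a fenced-out RM fails the dummy write and learns immediately that it is no longer the Active.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FencingHeartbeatSketch {
  interface FencedStore {
    void writeDummyNode() throws Exception; // fails once fenced out
  }

  static ScheduledExecutorService start(FencedStore store, long zkTimeoutMs) {
    ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    timer.scheduleAtFixedRate(() -> {
      try {
        store.writeDummyNode(); // goes through the fenced write path
      } catch (Exception fenced) {
        // Losing the write means another RM took over the store; the
        // real RM would transition itself to standby here.
        timer.shutdown();
      }
    }, zkTimeoutMs, zkTimeoutMs, TimeUnit.MILLISECONDS);
    return timer;
  }
}
{code}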
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881600#comment-13881600 ]

Cindy Li commented on YARN-1525:
--------------------------------

I'll work on a new ticket for the web service redirection part.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881672#comment-13881672 ]

Hadoop QA commented on YARN-1632:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625145/yarn-1632.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2931//console

This message is automatically generated.

TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
----------------------------------------------------------------------------------------------------

Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
Attachments: yarn-1632.patch

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1573) ZK store should use a private password for root-node-acls
[ https://issues.apache.org/jira/browse/YARN-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881687#comment-13881687 ]

Hudson commented on YARN-1573:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1573. ZK store should use a private password for root-node-acls. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java

ZK store should use a private password for root-node-acls
----------------------------------------------------------

Key: YARN-1573
URL: https://issues.apache.org/jira/browse/YARN-1573
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1573-1.patch, yarn-1573-2.patch

Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
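The change amounts to replacing a guessable secret with a random one. The snippet below is an illustration of that idea, not the committed ZKRMStateStore code: generate a random digest-auth password for the root-node ACL that only the Active RM knows.

{code}
import java.security.SecureRandom;
import java.util.Base64;

public class RootNodeAclSketch {
  static String newRootNodePassword() {
    byte[] secret = new byte[16];
    new SecureRandom().nextBytes(secret); // unpredictable, unlike a timestamp
    return Base64.getEncoder().encodeToString(secret);
  }

  public static void main(String[] args) {
    // Used as the digest-auth password that only the Active RM knows.
    System.out.println(newRootNodePassword());
  }
}
{code}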
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881685#comment-13881685 ]

Hudson commented on YARN-1575:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1575. Public localizer crashes with Localized unkown resource. Contributed by Jason Lowe (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561110)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java

Public localizer crashes with Localized unkown resource
--------------------------------------------------------

Key: YARN-1575
URL: https://issues.apache.org/jira/browse/YARN-1575
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Fix For: 3.0.0, 2.4.0, 0.23.11
Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch

The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881699#comment-13881699 ]

Hadoop QA commented on YARN-1628:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12624860/YARN-1628.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2932//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2932//console

This message is automatically generated.

TestContainerManagerSecurity fails on trunk
-------------------------------------------

Key: YARN-1628
URL: https://issues.apache.org/jira/browse/YARN-1628
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: YARN-1628.patch

The test fails with the following error:
{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881714#comment-13881714 ]

Hadoop QA commented on YARN-1630:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625151/diff.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2934//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2934//console

This message is automatically generated.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881715#comment-13881715 ]

Hadoop QA commented on YARN-1629:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12624937/YARN-1629-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2933//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2933//console

This message is automatically generated.

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881719#comment-13881719 ]

Hadoop QA commented on YARN-1480:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625013/YARN-1480-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2935//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2935//console

This message is automatically generated.

RM web services getApps() accepts many more filters than ApplicationCLI list command
-------------------------------------------------------------------------------------

Key: YARN-1480
URL: https://issues.apache.org/jira/browse/YARN-1480
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch

Nowadays RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)