[jira] [Updated] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes

2015-03-04 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3264:
-
Attachment: YARN-3264.003.patch


Updated patch as per review suggestions 

> [Storage implementation] Create a POC only file based storage implementation 
> for ATS writes
> ---
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch
>
>
> For the PoC, we need to create a backend implementation for file-based storage of entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-03-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3271:
---

Assignee: nijel

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-03-04 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346588#comment-14346588
 ] 

nijel commented on YARN-3271:
-

I would like to work on this task.
As per my initial analysis, the following test cases use the concept of runnable 
apps:

testUserAsDefaultQueue
testNotUserAsDefaultQueue
testAppAdditionAndRemoval
testPreemptionVariablesForQueueCreatedRuntime
testDontAllowUndeclaredPools
testMoveRunnableApp
testMoveNonRunnableApp
testMoveMakesAppRunnable

Can I move these tests to the new class? Please correct me if I misunderstood the task.

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346602#comment-14346602
 ] 

Jian He commented on YARN-3021:
---

bq. RM should check if the renewer is null
Actually, YARN can also provide a constant string, say "SKIP_RENEW_TOKEN"; MR 
uses this string as the renewer for tokens it doesn't want renewed. The RM detects 
whether the renewer equals the constant string and skips renewal if it does.
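
For illustration, here is a minimal sketch of such a check on the RM side, 
assuming a hypothetical constant and method name; none of this is existing YARN 
code.

{code:java}
// Hypothetical sketch only: the constant and method below do not exist in YARN.
public final class SkipRenewCheck {

  /** Sentinel renewer value an application could set for tokens the RM must not renew. */
  public static final String SKIP_RENEW_TOKEN = "SKIP_RENEW_TOKEN";

  /** Returns true if the RM should schedule automatic renewal for a token with this renewer. */
  static boolean shouldScheduleRenewal(String renewer) {
    // Skip renewal when the renewer is absent or is the opt-out sentinel.
    return renewer != null && !renewer.isEmpty()
        && !SKIP_RENEW_TOKEN.equals(renewer);
  }
}
{code}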

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-04 Thread Ryu Kobayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346638#comment-14346638
 ] 

Ryu Kobayashi commented on YARN-3249:
-

[~jianhe] I see. Sure, that looks good. I'll try it.

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, 
> killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the Web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346652#comment-14346652
 ] 

Hadoop QA commented on YARN-3264:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702408/YARN-3264.003.patch
  against trunk revision 3560180.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6835//console

This message is automatically generated.

> [Storage implementation] Create a POC only file based storage implementation 
> for ATS writes
> ---
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch
>
>
> For the PoC, we need to create a backend implementation for file-based storage of entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346710#comment-14346710
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346708#comment-14346708
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346720#comment-14346720
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/856/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346718#comment-14346718
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/856/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-04 Thread Gururaj Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gururaj Shetty updated YARN-3187:
-
Attachment: YARN-3187.3.patch

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Fix For: 2.6.0
>
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}} but it's not captured in the documentation. So in this 
> JIRA we plan to document this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-04 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346776#comment-14346776
 ] 

Gururaj Shetty commented on YARN-3187:
--

Hi [~jianhe]/[~Naganarasimha Garla]

Attached the patch for the Markdown (.md) documentation. Please review.

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Fix For: 2.6.0
>
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}} but it's not captured in the documentation. So in this 
> JIRA we plan to document this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-04 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: YARN-3249.6.patch

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the Web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-04 Thread Ryu Kobayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346792#comment-14346792
 ] 

Ryu Kobayashi commented on YARN-3249:
-

I changed it to call the RMWebService directly. Also, I changed it to be enabled 
by default.

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the Web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346835#comment-14346835
 ] 

Brahma Reddy Battula commented on YARN-3170:


[~aw] Can I go ahead as described above? Please give your inputs.

> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
>
> The marketing paragraph at the top, "NextGen MapReduce", etc. are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, especially given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346849#comment-14346849
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702452/YARN-3249.6.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6836//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6836//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6836//console

This message is automatically generated.

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the Web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-04 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346876#comment-14346876
 ] 

Varun Vasudev commented on YARN-2190:
-

[~chuanliu] here's the error I got on my mac -
{noformat}
HWx:hadoop vvasudev$ cat ~/Downloads/YARN-2190.10.patch | patch -p0 
--dry-run
patching file hadoop-common-project/hadoop-common/src/main/winutils/task.c
patching file 
hadoop-common-project/hadoop-common/src/main/winutils/win8sdk.props
patching file 
hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
Hunk #1 FAILED at 67.
1 out of 1 hunk FAILED -- saving rejects to file 
hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj.rej
patching file 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
patching file 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
patching file 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
patching file 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsContainerExecutor.java
patching file 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestWindowsContainerExecutor.java
{noformat}

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The default YARN container executor on Windows does not currently set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-04 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346889#comment-14346889
 ] 

Varun Vasudev commented on YARN-2190:
-

Also, the Java portion of the patch looks OK to me. Some comments:

1. What about WindowsSecureContainerExecutor? Does this feature not apply to 
secure environments?
2. Can you please add documentation on the new config variables to 
yarn-default.xml?

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The default YARN container executor on Windows does not currently set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-04 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346911#comment-14346911
 ] 

Varun Vasudev commented on YARN-2190:
-

[~chuanliu] one more issue with the patch -
{noformat}
 if (conf.getBoolean(YarnConfiguration.NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED,
+  YarnConfiguration.DEFAULT_NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED)) {
+int vcores = resource.getVirtualCores();
+// cap overall usage to the number of cores allocated to YARN
+float yarnProcessors = NodeManagerHardwareUtils.getContainersCores(
+ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf),
+conf);
+// CPU should be set to a percentage * 100, e.g. 20% cpu rate limit
+// should be set as 20 * 100. The following setting is equal to:
+// 100 * (100 * (vcores / Total # of cores allocated to YARN))
+cpuRate = Math.min(1, (int) ((vcores * 1) / yarnProcessors));
+  }
{noformat}

This may not behave as users expect. The 'yarnProcessors' value that you receive 
from NodeManagerHardwareUtils is the number of physical cores allocated to 
YARN containers. However, resource.getVirtualCores() returns a number that the 
user submits (and it can potentially be greater than 'yarnProcessors'). For 
example, an admin sets 'yarn.nodemanager.resource.cpu-vcores' to 24 on a node 
with 4 cores (this can be done by admins who wish to oversubscribe nodes). He 
also sets 'yarn.nodemanager.resource.percentage-physical-cpu-limit' to 50, 
indicating that only 2 physical cores are to be used for YARN containers. The 
RM allocates two containers with 12 vcores each on the node. According to your 
math both containers would get 100% CPU, when each container should only get 
25% CPU.

What you need to do is scale the container vcores to the number of physical 
cores rather than use the value as provided.
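
As a rough illustration, here is a minimal sketch of that scaling using the 
example numbers above; the class and method names are made up and are not part 
of the YARN-2190 patch.

{code:java}
// Illustrative only; names are hypothetical and not from the YARN-2190 patch.
public final class CpuRateExample {

  /**
   * Job-object CPU rate expressed as (percentage of the node) * 100,
   * e.g. a 25% cap is returned as 2500.
   */
  static int cpuRate(int containerVcores, int nodeVcores, int physicalCores,
      int percentagePhysicalCpuLimit) {
    // Physical cores YARN may use, e.g. 4 cores * 50% limit = 2 cores.
    float yarnProcessors = physicalCores * (percentagePhysicalCpuLimit / 100f);
    // Scale the container's vcores to physical cores instead of using them as-is.
    float containerCores = ((float) containerVcores / nodeVcores) * yarnProcessors;
    // Convert the container's share of the whole node to percentage * 100.
    return Math.min(10000, (int) (containerCores / physicalCores * 100 * 100));
  }

  public static void main(String[] args) {
    // 24 vcores configured on a 4-core node with a 50% limit: a 12-vcore
    // container should be capped at 25% of the node, i.e. a rate of 2500.
    System.out.println(cpuRate(12, 24, 4, 50));
  }
}
{code}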


> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The default YARN container executor on Windows does not currently set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346925#comment-14346925
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346932#comment-14346932
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346934#comment-14346934
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346923#comment-14346923
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347000#comment-14347000
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346998#comment-14346998
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/CHANGES.txt


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-04 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347026#comment-14347026
 ] 

Yongjun Zhang commented on YARN-3021:
-

Many thanks Jian.

{quote}
Change MR client to set null renewer for the token coming from a different 
cluster
{quote}
In the special case that we are dealing with in this JIRA, cluster A and 
cluster B don't trust each other. However, in other scenarios, two clusters may 
trust each other, so we can't always set a null renewer based on which cluster 
the token is from. 
Maybe we can combine our approaches: set a null renewer for the external cluster 
only when 
{{-Dmapreduce.job.delegation.tokenrenewer.for.external.cluster=null}} is 
specified for a job?

{quote}
Actually, YARN can also provide a constant string, say "SKIP_RENEW_TOKEN"; MR 
uses this string as the renewer for tokens it doesn't want renewed. The RM detects 
whether the renewer equals the constant string and skips renewal if it does.
{quote}
Maybe we can use the string "null" for SKIP_RENEW_TOKEN? We need to document 
whatever string is chosen as a special string so applications don't use it for 
tokens that need to be renewed.

There is still a chance of changing the behavior of existing applications that 
happen to set the renewer to our special string. So what about still 
introducing {{yarn.resourcemanager.validate.tokenrenewer}}, described in my last 
comment (enabling renewer validation only when the config is true)?

Thanks.
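
A minimal sketch of how that combined, config-gated check might look; the config 
key is only the one proposed in this comment and the "null" sentinel is an 
assumption, neither exists in YARN today.

{code:java}
// Sketch of the proposal above; the config key and sentinel are hypothetical.
import org.apache.hadoop.conf.Configuration;

public final class RenewerValidationSketch {

  static final String VALIDATE_RENEWER_KEY = "yarn.resourcemanager.validate.tokenrenewer";
  static final String SKIP_RENEW_SENTINEL = "null"; // special renewer string discussed above

  /** Returns true if the RM should attempt to renew/validate this token at submission. */
  static boolean shouldRenew(Configuration conf, String renewer) {
    // With the flag off (default), keep today's behavior and always attempt renewal,
    // so existing applications that happen to use the sentinel are unaffected.
    if (!conf.getBoolean(VALIDATE_RENEWER_KEY, false)) {
      return true;
    }
    return !SKIP_RENEW_SENTINEL.equals(renewer);
  }
}
{code}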



> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347035#comment-14347035
 ] 

Hudson commented on YARN-3272:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347037#comment-14347037
 ] 

Hudson commented on YARN-3222:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-03-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Attachment: 0006-YARN-2693.patch

Rebasing against trunk. Errors look unrelated

> Priority Label Manager in RM to manage application priority based on 
> configuration
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
> 0006-YARN-2693.patch
>
>
> Focus of this JIRA is to have a centralized service to handle priority labels.
> Support operations such as
> * Add/Delete priority label to a specified queue
> * Manage integer mapping associated with each priority label
> * Support managing default priority label of a given queue
> * Expose interface to RM to validate priority label
> To keep the interface simple, the Priority Manager will support only a 
> configuration file, in contrast with an admin CLI and REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3039:
---

Assignee: Naganarasimha G R  (was: Junping Du)

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-03-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Attachment: (was: 0006-YARN-2693.patch)

> Priority Label Manager in RM to manage application priority based on 
> configuration
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch
>
>
> The focus of this JIRA is to have a centralized service to handle priority labels.
> Supported operations include:
> * Add/delete a priority label to/from a specified queue
> * Manage the integer mapping associated with each priority label
> * Manage the default priority label of a given queue
> * Expose an interface to the RM to validate priority labels
> To keep the interface simple, the Priority Manager will support only a 
> configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347053#comment-14347053
 ] 

Naganarasimha G R commented on YARN-3039:
-

Hi [~djp],
 bq. for security reason it make NM to take AMRMTokens for using TimelineClient 
in future which make less sense. To get rid of rack condition you mentioned 
above, we propose to use observer pattern to make TimelineClient can listen 
aggregator address update in AM or NM (wrap with retry logic to tolerant 
connection failure).
Even if we are not able to have "AMRMClient can be wrapped into TimelineClient", 
I feel the other suggestion from Vinod was right: 
{{to add a blocking call in AMRMClient to get aggregator address directly from 
RM.}} instead of the observer pattern on the AM side. Thoughts?

bq. There are other ways (check from diagram in YARN-3033) that app aggregators 
could be deployed in a separate process or an independent container which make 
less sense to have a protocol between AUX service and RM. I think now we should 
plan to add a protocol between aggregator and NM, and then notify RM through 
NM-RM heartbeat on registering/rebind for aggregator.
Yes, I have gone through YARN-3033, but earlier I was just trying to mention that 
our current approach was based on the NM AUX service. Anyway, what I wanted was some 
kind of protocol between the app aggregators and either the NM or the RM. A protocol 
between the NM and the app aggregator should suffice for all other ways of launching 
app aggregators.

bq. app aggregator should have logic to consolidate all messages (events and 
metrics) for one application into more complex and flexible new data model. If 
each NM do aggregation separately, then it still a writer (like old timeline 
service), but not an aggregator
Well, if there is no logic/requirement to aggregate/consolidate all messages 
(events and metrics) for an app, then in my opinion it is better not to have 
additional instances of aggregators, and we can keep it similar to the old 
Timeline service.

bq. Will update proposal to reflect all these discussions (JIRA's and offline).
Thanks, it will be clearer to implement once we have the proposals documented.



> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347045#comment-14347045
 ] 

Sunil G commented on YARN-3136:
---

Hi [~jlowe] and [~jianhe],
Could you please take a look at the patch?

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.
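One way to avoid the scheduler lock here, sketched only as an idea (field and type names are illustrative, and this is not necessarily what the attached patches do), is to serve the call from a concurrent map:

{code}
// Sketch: keep transferred containers in a lock-free structure so AM registration
// does not need to synchronize on the scheduler.
private final ConcurrentMap<ApplicationId, List<Container>> transferredContainers =
    new ConcurrentHashMap<ApplicationId, List<Container>>();

public List<Container> getTransferredContainers(ApplicationAttemptId attemptId) {
  List<Container> containers =
      transferredContainers.get(attemptId.getApplicationId());
  return containers == null ? Collections.<Container>emptyList() : containers;
}
{code}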



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-3039:


Assignee: Junping Du  (was: Naganarasimha G R)

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3136:
--
Attachment: 0004-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347172#comment-14347172
 ] 

Hadoop QA commented on YARN-2693:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702487/0006-YARN-2693.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1152 javac 
compiler warnings (more than the trunk's current 1151 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationpriority.TestApplicationPriorityManager

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6837//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6837//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6837//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6837//console

This message is automatically generated.

> Priority Label Manager in RM to manage application priority based on 
> configuration
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
> 0006-YARN-2693.patch
>
>
> The focus of this JIRA is to have a centralized service to handle priority labels.
> Supported operations include:
> * Add/delete a priority label to/from a specified queue
> * Manage the integer mapping associated with each priority label
> * Manage the default priority label of a given queue
> * Expose an interface to the RM to validate priority labels
> To keep the interface simple, the Priority Manager will support only a 
> configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347199#comment-14347199
 ] 

Hadoop QA commented on YARN-3136:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699495/0003-YARN-3136.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 8 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6838//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6838//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6838//console

This message is automatically generated.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347232#comment-14347232
 ] 

Hadoop QA commented on YARN-3136:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702491/0004-YARN-3136.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 8 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6839//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6839//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6839//console

This message is automatically generated.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated

2015-03-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2854:

Attachment: YARN-2854.20150304.1.patch

Thanks [~gururaj] for helping me convert the doc to markdown.
[~jianhe], can you please review the patch?

> The document about timeline service and generic service needs to be updated
> ---
>
> Key: YARN-2854
> URL: https://issues.apache.org/jira/browse/YARN-2854
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, 
> YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, timeline_structure.jpg
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3039:
-
Attachment: YARN-3039-v2-incomplete.patch

Updating a patch (not finished yet) to reflect some of the discussions above.
It includes:
+ maintain the app aggregator info in RMApp, with event model (done)
+ aggregator update in NM-RM heartbeat (done)
+ aggregator update in AM-RM allocation request/response (done)
+ Persistent aggregator update in RMStateStore (fix previous patch)
+ a new API to ResourceTrackerService to register app aggregator to RM 
(done)
+ adding a new protocol between aggregator and NM
  - new proto file (and proto structure for request and response)  -- done.
  - interfaces: (protocol, request, response)
 - AggregatorNodemanagerProtocol (done)
 - AggregatorNodemanagerProtocolPBClientImpl (TODO)
 - NMAggregatorService (TODO, server impl)
 - AggregatorNodemanagerProtocolPB (done)
 - AggregatorNodemanagerProtocolPBServiceImpl (done)
 - ReportNewAggregatorsInfoRequest/Response (and PBs) (done)
 - ReportNewAggregatorsInfoRequestPBImpl (done)
 - ReportNewAggregatorsInfoResponse (done)
 - ReportNewAggregatorsInfoResponsePBImpl (done)
 - AppAggregatorsMap (done)
   AppAggregatorsMapPBImpl (done)

Not included yet:
+ NM hosting new protocol
+ Aggregator call new protocol client
+ aggregator info get recovered during NM restart
+ make TimelineClient Observer pattern to observe the change of aggregator 
address.

Will update the proposal afterwards.
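For readers following along, a hypothetical outline of the new protocol named in the list above could look like the following (the method name and exception list are assumptions; the authoritative definitions are in YARN-3039-v2-incomplete.patch):

{code}
// Illustrative only; see the attached patch for the real interfaces.
public interface AggregatorNodemanagerProtocol {
  ReportNewAggregatorsInfoResponse reportNewAggregatorsInfo(
      ReportNewAggregatorsInfoRequest request) throws YarnException, IOException;
}
{code}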

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-04 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.009.patch

Updated with the latest feedback.

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: (was: YARN-3242.004.patch)

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send 
> a SyncConnected event until it is disconnected and connected again.
> Then we will see all ZKRMStateStore operations fail with the IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}
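The gist of the proposed guard, sketched with assumed field names (the exact shape is in the attached patches), is to drop watcher events that did not originate from the active client:

{code}
// Sketch only: ignore events from a ZooKeeper client that is no longer current.
private synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event) {
  if (zk != activeZkClient) {              // activeZkClient: assumed field name
    LOG.info("Ignoring watch event from an old ZooKeeper client session");
    return;
  }
  // ... handle SyncConnected / Disconnected / Expired for the current session ...
}
{code}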



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347295#comment-14347295
 ] 

Hudson commented on YARN-3131:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7255 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7255/])
YARN-3131. YarnClientImpl should check FAILED and KILLED state in 
submitApplication. Contributed by Chang Li (jlowe: rev 
03cc22945e5d4e953c06a313b8158389554a6aa7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.0
>
> Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
> yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
> yarn_3131_v6.patch, yarn_3131_v7.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Though the job does get submitted 
> successfully and just fails immediately after, it would be better if 
> YarnClient could handle the immediate-failure situation like YarnRunner does.
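As a rough sketch of the client-side check being proposed (not the committed patch; yarnClient and appContext are assumed to be in scope, and the real change also waits for the report to become available):

{code}
// Surface an immediate failure instead of returning silently.
ApplicationId appId = yarnClient.submitApplication(appContext);
YarnApplicationState state =
    yarnClient.getApplicationReport(appId).getYarnApplicationState();
if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
  throw new YarnException("Application " + appId + " was rejected, state: " + state);
}
{code}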



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: YARN-3242.004.patch

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send 
> a SyncConnected event until it is disconnected and connected again.
> Then we will see all ZKRMStateStore operations fail with the IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-04 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-3294:
---

 Summary: Allow dumping of Capacity Scheduler debug logs via web UI 
for a fixed time period
 Key: YARN-3294
 URL: https://issues.apache.org/jira/browse/YARN-3294
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Varun Vasudev
Assignee: Varun Vasudev


It would be nice to have a button on the web UI that would allow dumping of 
debug logs for just the capacity scheduler for a fixed period of time (1 min, 5 
min or so) in a separate log file. It would be useful when debugging scheduler 
behavior without affecting the rest of the resourcemanager.
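A minimal sketch of the underlying mechanism, using plain log4j 1.x and a scheduled reset (the logger name and duration are illustrative; the web UI wiring itself is not shown):

{code}
final Logger csLogger = Logger.getLogger(
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity");
final Level previous = csLogger.getLevel();
csLogger.setLevel(Level.DEBUG);                       // start dumping debug logs
Executors.newSingleThreadScheduledExecutor().schedule(new Runnable() {
  @Override
  public void run() {
    csLogger.setLevel(previous);                      // restore after the window
  }
}, 5, TimeUnit.MINUTES);
{code}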



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-03-04 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-3293:
---

 Summary: Track and display capacity scheduler health metrics in 
web UI
 Key: YARN-3293
 URL: https://issues.apache.org/jira/browse/YARN-3293
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Varun Vasudev
Assignee: Varun Vasudev


It would be good to display metrics that let users know about the health of the 
capacity scheduler in the web UI. Today it is hard to get an idea if the 
capacity scheduler is functioning correctly. Metrics such as the time of the 
last allocation, etc., would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-04 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3267:
---
Attachment: YARN_3267_V1.patch

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, 
> YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch
>
>
> While fetching entities from the timeline server, the limit is applied to the 
> entities fetched from LevelDB, and the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This can mean that even if entities matching the query criteria are 
> available, we could end up not getting any results.
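Schematically, the reordering implied by this report is to apply the ACL check while scanning and only count entities the caller may see (all names below are illustrative, not the actual TimelineDataManager code):

{code}
List<TimelineEntity> visible = new ArrayList<TimelineEntity>();
for (TimelineEntity entity : candidateEntities) {    // candidateEntities: from the store
  if (!callerCanAccess(callerUgi, entity)) {         // callerCanAccess: assumed helper
    continue;                                        // filter before counting
  }
  visible.add(entity);
  if (visible.size() >= limit) {
    break;                                           // limit applied after ACL filtering
  }
}
{code}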



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-04 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347320#comment-14347320
 ] 

Chang Li commented on YARN-3267:


Have implemented a unit test for this patch.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, 
> YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch
>
>
> While fetching entities from the timeline server, the limit is applied to the 
> entities fetched from LevelDB, and the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This can mean that even if entities matching the query criteria are 
> available, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347348#comment-14347348
 ] 

Jason Lowe commented on YARN-3294:
--

Do we really need a dedicated button for a specific system/scheduler when 
there's already the logLevel applet that lets us control log levels of 
arbitrary loggers in the process?

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time (1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-03-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3031.
---
Resolution: Duplicate

Since the patch there covers the code of the writer interface, let's resolve 
this one as a duplicate of YARN-3264.

> [Storage abstraction] Create backing storage write interface for ATS writers
> 
>
> Key: YARN-3031
> URL: https://issues.apache.org/jira/browse/YARN-3031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: Sequence_diagram_write_interaction.2.png, 
> Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
> YARN-3031.02.patch, YARN-3031.03.patch
>
>
> Per design in YARN-2928, come up with the interface for the ATS writer to 
> write to various backing storages. The interface should be created to capture 
> the right level of abstractions so that it will enable all backing storage 
> implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3264:
--
Summary: [Storage implementation] Create backing storage write interface 
and  a POC only file based storage implementation  (was: [Storage 
implementation] Create a POC only file based storage implementation for ATS 
writes)

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347387#comment-14347387
 ] 

Li Lu commented on YARN-3264:
-

Hi [~vrushalic], thanks for the patch! In general it looks good to me. I have a 
few quick questions about it:

# In the following lines:
{code}
+String tmpRoot = 
FileSystemTimelineServiceWriterImpl.TIMELINE_SERVICE_STORAGE_DIR_ROOT;
+if (tmpRoot == null || tmpRoot.isEmpty()) {
+  tmpRoot = "/tmp/";
+}
{code}
TIMELINE_SERVICE_STORAGE_DIR_ROOT is defined as final in 
FileSystemTimelineServiceWriterImpl (with a not-null initial value), why are we 
still checking if it's null here? (Am I missing anything? )

# Why are we removing the abstract keyword from the TimelineAggregator class? I 
thought this class was supposed to be abstract. 
{code}
-public abstract class TimelineAggregator extends CompositeService {
+public class TimelineAggregator extends CompositeService {
{code}

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347384#comment-14347384
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~Naganarasimha] for comments!
bq. Even if we are not able to have "AMRMClient can be wrapped into 
TimelineClient" i feel other suggestion from vinod was right
to add a blocking call in AMRMClient to get aggregator address directly from 
RM. instead of observer pattern @ the AM side. thoughts?
I am open to this approach. However, I would treat it more as an optimization (so we 
don't have to wait for the heartbeat interval). Within this JIRA's scope, I think we 
should have the heartbeat in ApplicationMasterService as the basic mechanism, because 
some applications (like MR) don't use AMRMClient for now. We can have a separate 
JIRA to address this optimization if necessary. BTW, what is your concern with the 
observer (listener) pattern in the AM?

bq. Yes i have gone through 3033, but earlier was trying to mention as our 
current approach was with NM AUX service. But anyway what i wanted was some 
kind of protocol between appAggregators with either NM or RM. Protocol between 
NM and appAgregator should suffice all other ways to launch AppAgregators.
Yes. I agree there is not much difference between the aggregator talking to the NM 
or the RM. As the demo patch shows, I would slightly prefer the NM because the RM 
already hosts too many RPC services today.

bq. Well if there is no logic/requirement to aggregate/consolidate all messages 
(events and metrics) for an App, then in my opinion it better not to have 
additional instances of aggregators and we can keep it similar to old Timeline 
service.
I am not sure about this, but I assume this is part of the motivation for needing 
the new Timeline Service (not only for performance reasons)? 

bq. Thanks it will be more clear to implement if we have the proposals 
documented.
No problem. I will upload a new one after figuring out the demo patch, which 
forces me to address more details.
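For reference, the observer (listener) pattern being discussed could take roughly this shape (all names are assumptions, not the actual TimelineClient API):

{code}
// Illustrative listener that a TimelineClient could expose for address updates.
public interface AggregatorAddressListener {
  void onAggregatorAddressUpdated(ApplicationId appId, String newAddress);
}

// The AM/NM side would register a listener and re-target subsequent writes:
timelineClient.registerAddressListener(new AggregatorAddressListener() {   // assumed method
  @Override
  public void onAggregatorAddressUpdated(ApplicationId appId, String newAddress) {
    timelineClient.setTimelineServiceAddress(newAddress);                  // assumed setter
  }
});
{code}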

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347420#comment-14347420
 ] 

Hadoop QA commented on YARN-2854:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702513/YARN-2854.20150304.1.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6843//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6843//console

This message is automatically generated.

> The document about timeline service and generic service needs to be updated
> ---
>
> Key: YARN-2854
> URL: https://issues.apache.org/jira/browse/YARN-2854
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, 
> YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, timeline_structure.jpg
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: YARN-3242.004.patch

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send 
> a SyncConnected event until it is disconnected and connected again.
> Then we will see all ZKRMStateStore operations fail with the IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: (was: YARN-3242.004.patch)

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send 
> a SyncConnected event until it is disconnected and connected again.
> Then we will see all ZKRMStateStore operations fail with the IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347431#comment-14347431
 ] 

Remus Rusanu commented on YARN-2190:


From my experience, patches containing .sln or .vcxproj changes need to have 
Windows-style CRLF line terminators *for the lines in the .sln/.vcxproj hunks*. 
The rest of the patch should use normal Unix-style terminators. If this is not 
the case, the patch will apply fine on Windows but fail on Linux/Mac. 

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects and thus provides resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347457#comment-14347457
 ] 

Varun Saxena commented on YARN-2928:


What is meant by manual reader ?

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2928:
--
Attachment: Timeline Service Next Gen - Planning - ppt.pptx

I made up some notes (attached) on how we collectively work on this - to help 
surface some clarity of project execution for everyone involved. Divided the 
effort into phases. Feedback welcome. I'll keep this updated as things progress.

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347464#comment-14347464
 ] 

Vinod Kumar Vavilapalli commented on YARN-2786:
---

Looks good. Checking this in.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347460#comment-14347460
 ] 

Varun Saxena commented on YARN-3264:


We can probably use the try-with-resources construct here as well.
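
For illustration only, a minimal sketch of what try-with-resources buys in a file based writer. The class and file names below are hypothetical and not taken from the YARN-3264 patch; the point is just that the writer gets closed even when the write fails.

{code}
// Hypothetical sketch, not the writer class from the patch.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileEntityWriterSketch {
  public static void writeEntity(String outputDir, String entityId, String json)
      throws IOException {
    Path file = Paths.get(outputDir, entityId + ".thist");
    // The writer is closed automatically, even if println() throws.
    try (PrintWriter out = new PrintWriter(
        Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
      out.println(json);
    }
  }
}
{code}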

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.006.patch

Fixed by using the constant CpuTimeTracker.UNAVAILABLE instead of a hard-coded -1.
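
As a side note, a rough sketch of the named-sentinel pattern being referred to; this is an illustration only, not the actual CpuTimeTracker from the patch.

{code}
// Illustrative only: return a named UNAVAILABLE constant instead of a bare -1,
// so call sites can check for "no data yet" without magic numbers.
public class CpuTimeTrackerSketch {
  public static final int UNAVAILABLE = -1;

  private long lastCpuTimeMs = -1;     // cumulative CPU time at the last sample
  private long lastSampleTimeMs = -1;  // wall clock time of the last sample

  /**
   * @return CPU usage in percent since the previous sample, or
   *         {@link #UNAVAILABLE} if it cannot be calculated yet.
   */
  public float updateAndGetCpuUsagePercent(long cpuTimeMs, long sampleTimeMs) {
    float percent = UNAVAILABLE;
    if (lastCpuTimeMs >= 0 && sampleTimeMs > lastSampleTimeMs) {
      percent = 100f * (cpuTimeMs - lastCpuTimeMs)
          / (sampleTimeMs - lastSampleTimeMs);
    }
    lastCpuTimeMs = cpuTimeMs;
    lastSampleTimeMs = sampleTimeMs;
    return percent;
  }
}
{code}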

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347481#comment-14347481
 ] 

Vinod Kumar Vavilapalli commented on YARN-2786:
---

Actually, this won't apply to branch-2. Can you upload the branch-2 patch too? 
Tx.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347437#comment-14347437
 ] 

Vinod Kumar Vavilapalli commented on YARN-2928:
---

bq. Are there any plans to include intermediate routing/forwarding systems for 
ATS v2?
We have a storage/forwarder interface that can definitely be plugged into for 
something like this.
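
Just to illustrate what "plugged into" could look like, a hypothetical sketch of such a plug-in point; the interface and class names here are made up and are not the ATS v2 API.

{code}
// Hypothetical plug-in point; not the real ATS v2 writer/forwarder interface.
public interface TimelineSinkSketch {
  void write(String entityJson) throws Exception;
}

// A forwarding implementation could relay entities to an intermediate routing
// system before (or instead of) handing them to the real backing store.
class ForwardingSinkSketch implements TimelineSinkSketch {
  private final TimelineSinkSketch downstream;

  ForwardingSinkSketch(TimelineSinkSketch downstream) {
    this.downstream = downstream;
  }

  @Override
  public void write(String entityJson) throws Exception {
    // e.g. publish entityJson to a message bus here, then delegate
    downstream.write(entityJson);
  }
}
{code}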

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3264:
-
Attachment: YARN-3264.004.patch

updating as per [~gtCarrera9] 's suggestions

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347490#comment-14347490
 ] 

Hadoop QA commented on YARN-3264:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702599/YARN-3264.004.patch
  against trunk revision ed70fa1.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6846//console

This message is automatically generated.

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization

2015-03-04 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347491#comment-14347491
 ] 

Chen He commented on YARN-3289:
---

Thanks [~jlowe] for the comments. IMHO, we can move the docker image 
localization into a preparation task when we are using DCE to run applications. 
For example, if we have 10 tasks in a job, we create 1 extra "task" for each 
real task, i.e. start an extra dummy task that can heartbeat and do the image 
downloading work. Once it is done, the real task can start to run.

The benefit is that we can control the placement of those dummy tasks and 
achieve "data locality" for docker image localization. 
For example:
   node1 has already downloaded the docker image and the AM starts to run on 
it. If possible, the RM scheduler should put the other dummy and real tasks on 
this node, since node1 already has the image. Compared with job input data 
(a block, maybe), docker image "locality" (more than 10 minutes to download an 
image of more than 2GB) may be more important. 

> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the 
> image size is sufficiently big, the task will time out. We should download the 
> image we want to run during localization (if possible) to prevent this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization

2015-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347524#comment-14347524
 ] 

Jason Lowe commented on YARN-3289:
--

Regarding a separate prepping task, localization already is a separate 
preparation task for non-public resources.  See ContainerLocalizer.  I don't 
think docker image download and localization as is done today is fundamentally 
different at a high level -- in both cases we are prepping the node to be able 
to run the container.  No need to complicate the process with a specialized 
extra step just for docker.  What we're missing here is progress reporting 
during localization so AMs can properly monitor progress of container launch 
requests before their code starts running, and that's useful for non-docker 
localization scenarios as well.

Adjusting locality based on the cost of localization is an interesting idea, 
and applies to the non-docker case as well.  However the docker case can be a 
bit tricky.  One node may take tens of minutes to localize a docker image, but 
another node might only take a few seconds.  Docker images are often derived 
from other images, and docker only downloads the deltas.  So it will be 
difficult for YARN that is not aware of the docker contents of a node or image 
deltas to predict how long any node will take to localize a given docker image.

> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the 
> image size is sufficiently big, the task will time out. We should download the 
> image we want to run during localization (if possible) to prevent this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347550#comment-14347550
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702598/YARN-3122.006.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6845//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6845//console

This message is automatically generated.

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347548#comment-14347548
 ] 

Hadoop QA commented on YARN-3242:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702592/YARN-3242.004.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.http.TestHttpServerLifecycle

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6844//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6844//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6844//console

This message is automatically generated.

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This will cause a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher 
> event is from the current session. So the watcher event from the old ZK client 
> session which was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until RM shutdown.
> The following code from zookeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   

[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347561#comment-14347561
 ] 

zhihai xu commented on YARN-3242:
-

Hi Rohith, thanks for the review and verifying the patch.
I restarted the test. The TestHttpServerLifecycle failure is not related to my 
change; it passed in my latest local run.
{code}
---
 T E S T S
---
Running org.apache.hadoop.http.TestHttpServerLifecycle
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.24 sec - in 
org.apache.hadoop.http.TestHttpServerLifecycle
Results :
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
{code}
Also, the findbugs warning is not related to my change.
Many thanks,
zhihai


> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This will cause a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher 
> event is from the current session. So the watcher event from the old ZK client 
> session which was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after the 
> new session is connected, the zkClient will be set to null
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until RM shutdown.
> The following code from zookeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, EventThread will still process all the events until the 
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, 

[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347576#comment-14347576
 ] 

Hadoop QA commented on YARN-3267:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702573/YARN_3267_V1.patch
  against trunk revision 03cc229.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  
org.apache.hadoop.yarn.server.timeline.TestLeveldbTimelineStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6842//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6842//console

This message is automatically generated.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, 
> YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization

2015-03-04 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347583#comment-14347583
 ] 

Chen He commented on YARN-3289:
---

Thank you for the quick feedback, [~jlowe].

{quote} What we're missing here is progress reporting during localization so 
AMs can properly monitor progress of container launch requests before their 
code starts running, and that's useful for non-docker localization scenarios as 
well.{quote}

I agree. That would be great. The idea I proposed is based on the assumption 
that we do not change the localization part.

{quote} One node may take tens of minutes to localize a docker image, but 
another node might only take a few seconds. Docker images are often derived 
from other images, and docker only downloads the deltas. So it will be 
difficult for YARN that is not aware of the docker contents of a node or image 
deltas to predict how long any node will take to localize a given docker 
image.{quote}

That is true. Docker image localization is a little different from other 
app localization (from HDFS to the local FS): all NMs pull from the docker 
registry. The network bandwidth from the docker registry to each NM could be a 
bottleneck no matter whether the docker image deltas are large or small (we may 
need higher bandwidth, say 30G InfiniBand, but for a larger Hadoop cluster with 
more than 10 thousand tasks running, it may still be a problem). This is 
another reason that we need to consider docker image locality. 


> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the 
> image size is sufficiently big, the task will time out. We should download the 
> image we want to run during localization (if possible) to prevent this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150304-1-branch2.patch
YARN-2786-20150304-1-trunk.patch

Uploaded the latest patches for trunk/branch-2; verified that everything works 
properly in a local standalone cluster.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347598#comment-14347598
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702610/YARN-2786-20150304-1-branch2.patch
  against trunk revision ed70fa1.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6847//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch

Jenkins failed because it tried to apply the branch-2 patch against trunk; 
uploading the trunk patch again just to re-kick Jenkins.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-04 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3267:
---
Attachment: YARN_3267_V2.patch

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, 
> YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, 
> YARN_3267_WIP3.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.
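
To make the described ordering concrete, here is a hypothetical sketch of the fix direction: apply the ACL check while scanning and only count entities the caller may see toward the limit. The names are illustrative, not the actual TimelineDataManager code.

{code}
// Illustrative only: filter first, then count toward the limit.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class AclAwareLimitSketch {
  public static <E> List<E> fetch(Iterator<E> store, Predicate<E> aclAllows,
      int limit) {
    List<E> result = new ArrayList<>();
    while (store.hasNext() && result.size() < limit) {
      E entity = store.next();
      if (aclAllows.test(entity)) {   // ACL check first...
        result.add(entity);           // ...then count toward the limit
      }
    }
    return result;
  }
}
{code}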



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization

2015-03-04 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347619#comment-14347619
 ] 

Chen He commented on YARN-3289:
---

Or maybe we add a module on the NM that can automatically pull deltas from the 
registry; the user can configure the frequency and schedule.
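
A rough sketch of what such a periodic pre-pull module could look like, assuming the docker CLI is on the NM's PATH; the class, interval and image list here are illustrative, not a proposed implementation.

{code}
// Illustrative sketch only: periodically run "docker pull" so that only the
// missing layers (deltas) need to be fetched at container launch time.
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DockerImagePrePullerSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(List<String> images, long intervalMinutes) {
    scheduler.scheduleWithFixedDelay(() -> {
      for (String image : images) {
        try {
          // docker pull only downloads layers not already present locally
          new ProcessBuilder("docker", "pull", image)
              .inheritIO()
              .start()
              .waitFor();
        } catch (Exception e) {
          // log and retry on the next scheduled run
        }
      }
    }, 0, intervalMinutes, TimeUnit.MINUTES);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}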

> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the 
> image size if sufficiently big, the task will timeout. We should download the 
> image we want to run during localization (if possible) to prevent this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347626#comment-14347626
 ] 

Li Lu commented on YARN-3264:
-

Hi [~vrushalic], thanks for the update. Unfortunately the 
TestTimelineAggregator part failed to compile on my machine, due to the 
abstract TimelineAggregator class. To test the basic features of 
TimelineAggregators, maybe we'd like to set up a SimpleTimelineAggregator class 
that only extends TimelineAggregator but does nothing else, and use it in 
TestTimelineAggregator?

Also, I briefly skimmed through the patch and there are some unused imports. 
Maybe we would like to do a final cleanup before it's committed? (It's quite 
simple with an IDE, so let's leave that to the final round.) Thanks! 
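
To illustrate the stub idea only: the abstract class below is a stand-in, since the real TimelineAggregator API is defined by the patch under review and may look different.

{code}
// Stand-in abstract class, NOT the real TimelineAggregator from the patch.
abstract class TimelineAggregatorStandIn {
  abstract void putEntities(String entitiesJson) throws Exception;
}

// A "simple" subclass that overrides the abstract method with a no-op, so
// tests can exercise the parent class without any real storage backend.
class SimpleTimelineAggregatorStub extends TimelineAggregatorStandIn {
  @Override
  void putEntities(String entitiesJson) {
    // intentionally a no-op for unit tests
  }
}
{code}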

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347633#comment-14347633
 ] 

Karthik Kambatla commented on YARN-3122:


One last nit: the javadoc for ResourceCalculatorProcessTree#getCpuUsagePercent 
still says "return 0 if it cannot be calculated". 

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347641#comment-14347641
 ] 

Vrushali C commented on YARN-3264:
--

[~gtCarrera] thanks! Will fix that test and remove the unused imports in the 
next update.

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues

2015-03-04 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-3275:
-
Attachment: YARN-3275.v2.txt

[~jlowe] and [~leftnoteasy], thank you for the reviews.

Attached is an updated patch (v2) with your suggested changes.

> CapacityScheduler: Preemption happening on non-preemptable queues
> -
>
> Key: YARN-3275
> URL: https://issues.apache.org/jira/browse/YARN-3275
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: capacity-scheduler
> Attachments: YARN-3275.v1.txt, YARN-3275.v2.txt
>
>
> YARN-2056 introduced the ability to turn preemption on and off at the queue 
> level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
> for example), containers can be preempted from that queue, even though the 
> queue is marked as non-preemptable.
> We are using this feature in large, busy clusters and seeing this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347672#comment-14347672
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702617/YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6848//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6848//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6848//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1310) Get rid of MR settings in YARN configuration

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347684#comment-14347684
 ] 

Brahma Reddy Battula commented on YARN-1310:


[~kasha], [~djp] and [~hitesh], can I go ahead as I mentioned? Can you please 
give your inputs? Thanks!!

> Get rid of MR settings in YARN configuration
> 
>
> Key: YARN-1310
> URL: https://issues.apache.org/jira/browse/YARN-1310
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Junping Du
>Assignee: Brahma Reddy Battula
>
> Per discussion in YARN-1289, we should get rid of MR settings (like below) 
> and default values in YARN configuration which put unnecessary dependency for 
> YARN on MR. 
> {code}
>   
>   
> yarn.nodemanager.aux-services.mapreduce_shuffle.class
> org.apache.hadoop.mapred.ShuffleHandler
>   
>   
> mapreduce.job.jar
> 
>   
>   
> mapreduce.job.hdfs-servers
> ${fs.defaultFS}
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150304-2-trunk.patch

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3292) [Umbrella] Tests/documentation and/or tools for YARN rolling upgrades backwards/forward compatibility verification

2015-03-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347710#comment-14347710
 ] 

Li Lu commented on YARN-3292:
-

So far as we can see, YARN requires the following components to be compatible 
in a rolling upgrade (please feel free to add more in the discussion):

- Protocols: both public protocols and private wiring protocols
- RM/NM/ATS state stores: RM/NM/ATS data version numbers and the data 
store/read schema for each state store.
- APIs 
- Security tokens
- Configurations

We may want to provide a suite of tools and/or unit tests that can verify whether an 
incoming YARN patch will break compatibility with the previous version. In 
the very first stage, we may want to finish the following tasks:

# Implement a protobuf compatibility checker to check whether a patch breaks 
compatibility with existing client and internal protocols (a rough sketch of 
this idea is given below)
# Extend the protobuf compatibility checker in step 1 to check the RM state store
# Look into the possibility of further extending the protobuf checker to the 
NM/ATS(v1) state stores (I'm not very sure now; we can merge this with step 2 if 
a simple extension is possible)
# Implement a diff-based Java API compatibility checker
# Wire up the implemented tools to Jenkins test runs
# Finish formal write-ups for the YARN rolling upgrade standard

Please feel free to discuss more about our first step goal. Thanks! 
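
To make step 1 a bit more concrete, here is a very rough sketch of a field-level check using the protobuf-java Descriptors API; a real checker would need many more rules (label changes, nested messages, enums, services, etc.), so treat this purely as an illustration.

{code}
// Illustrative only: flag fields that were removed or changed type/number
// between an old and a new version of the same protobuf message.
import java.util.ArrayList;
import java.util.List;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;

public class ProtoCompatCheckSketch {
  public static List<String> findBreaks(Descriptor oldMsg, Descriptor newMsg) {
    List<String> problems = new ArrayList<>();
    for (FieldDescriptor oldField : oldMsg.getFields()) {
      FieldDescriptor newField = newMsg.findFieldByNumber(oldField.getNumber());
      if (newField == null) {
        problems.add("field #" + oldField.getNumber() + " ("
            + oldField.getName() + ") was removed");
      } else if (newField.getType() != oldField.getType()) {
        problems.add("field #" + oldField.getNumber() + " changed type from "
            + oldField.getType() + " to " + newField.getType());
      }
    }
    return problems;
  }
}
{code}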

> [Umbrella] Tests/documentation and/or tools for YARN rolling upgrades 
> backwards/forward compatibility verification
> --
>
> Key: YARN-3292
> URL: https://issues.apache.org/jira/browse/YARN-3292
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: compatibility, rolling_upgrade, test, tools
>
> YARN-666 added the support to YARN rolling upgrade. In order to support this 
> feature, we made changes from many perspectives. There were many assumptions 
> made together with these existing changes. Future code changes may break 
> these assumptions by accident, and hence break the YARN rolling upgrades 
> feature. 
> To simplify YARN RU regression tests, maybe we would like to create a set of 
> tools/tests that can verify YARN RU backward compatibility. 
> On the very first step, we may want to have a compatibility checker for 
> important protocols and APIs. We may also want to incorporate these tools 
> into our test Jenkins runs, if necessary. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150304-2-branch2.patch

Attached both trunk and branch-2 patches, and fixed the findbugs warning. 

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, 
> YARN-2786-20150304-2-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347728#comment-14347728
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702629/YARN-2786-20150304-2-branch2.patch
  against trunk revision c66c3ac.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6851//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, 
> YARN-2786-20150304-2-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.007.patch

Fixed javadoc

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page

2015-03-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347735#comment-14347735
 ] 

Karthik Kambatla commented on YARN-3111:


When cluster capacity is 0, do we want to show the ratio as 1? Also, instead of 
showing the shares as a single percentage, would it make sense to show it as % 
mem, %cpu? 

[~ashwinshankar77], [~peng.zhang] - thoughts? 
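
For what it's worth, the zero-capacity guard and the split mem/cpu display could look roughly like this (illustrative only, not the actual web UI code):

{code}
// Illustrative sketch: avoid the "NaN% used" output when cluster capacity is 0,
// and report memory and CPU shares separately instead of a single percentage.
public class QueueShareFormatSketch {
  static String percentOf(long used, long capacity) {
    if (capacity <= 0) {
      return "0.0%";   // or "N/A"; this is exactly the open question in this JIRA
    }
    return String.format("%.1f%%", 100.0 * used / capacity);
  }

  public static String format(long usedMem, long clusterMem,
                              long usedVcores, long clusterVcores) {
    return percentOf(usedMem, clusterMem) + " mem, "
        + percentOf(usedVcores, clusterVcores) + " cpu";
  }
}
{code}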

> Fix ratio problem on FairScheduler page
> ---
>
> Key: YARN-3111
> URL: https://issues.apache.org/jira/browse/YARN-3111
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3111.1.patch, YARN-3111.png
>
>
> Found 3 problems on FairScheduler page:
> 1. Only memory is computed for the ratio, even when the queue schedulingPolicy is DRF.
> 2. When min resources are configured larger than the real resources, the steady 
> fair share ratio is so large that it goes off the page.
> 3. When cluster resources are 0 (no nodemanager started), the ratio is displayed as 
> "NaN% used".
> The attached image shows a snapshot of the above problems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347739#comment-14347739
 ] 

Karthik Kambatla commented on YARN-3122:


+1, pending Jenkins. 

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2015-03-04 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash reassigned YARN-1964:
--

Assignee: Ravi Prakash  (was: Abin Shahab)

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Ravi Prakash
> Fix For: 2.6.0
>
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2015-03-04 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated YARN-1964:
---
Assignee: Abin Shahab  (was: Ravi Prakash)

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Fix For: 2.6.0
>
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347799#comment-14347799
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702629/YARN-2786-20150304-2-branch2.patch
  against trunk revision 722b479.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6852//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, 
> YARN-2786-20150304-2-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough; we should be able to: 
> 1) list the node labels collection
> The command should start with "yarn cluster ..."; in the future, we can add 
> more functionality to the "yarnClusterCLI".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347803#comment-14347803
 ] 

Hadoop QA commented on YARN-3275:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702623/YARN-3275.v2.txt
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6850//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6850//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6850//console

This message is automatically generated.

> CapacityScheduler: Preemption happening on non-preemptable queues
> -
>
> Key: YARN-3275
> URL: https://issues.apache.org/jira/browse/YARN-3275
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: capacity-scheduler
> Attachments: YARN-3275.v1.txt, YARN-3275.v2.txt
>
>
> YARN-2056 introduced the ability to turn preemption on and off at the queue 
> level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
> for example), containers can be preempted from that queue, even though the 
> queue is marked as non-preemptable.
> We are using this feature in large, busy clusters and seeing this behavior.
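
A minimal sketch, with hypothetical types rather than the actual 
ProportionalCapacityPreemptionPolicy code, of candidate selection that honours 
a per-queue "preemption disabled" flag even for queues that are over capacity:
{code}
import java.util.List;
import java.util.stream.Collectors;

final class PreemptionCandidates {
  // Hypothetical view of a queue; the names are illustrative only.
  interface QueueView {
    boolean isPreemptionDisabled();
    boolean isOverCapacity();
  }

  /** Only queues that are over capacity AND have not opted out are eligible. */
  static List<QueueView> selectPreemptable(List<QueueView> queues) {
    return queues.stream()
        .filter(QueueView::isOverCapacity)
        .filter(q -> !q.isPreemptionDisabled())  // never preempt from opted-out queues
        .collect(Collectors.toList());
  }
}
{code}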



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347824#comment-14347824
 ] 

Hadoop QA commented on YARN-3267:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702618/YARN_3267_V2.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6849//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6849//console

This message is automatically generated.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, 
> YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, 
> YARN_3267_WIP3.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> this could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.
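
A minimal sketch (hypothetical helper, not the TimelineDataManager API) of the 
alternative ordering: apply the ACL predicate while scanning so that the limit 
only counts entities the caller is allowed to see.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class VisibleLimiter {
  /** Collects up to 'limit' entities that pass the ACL check, filtering before counting. */
  static <T> List<T> firstVisible(Iterator<T> scan, Predicate<T> aclAllows, int limit) {
    List<T> result = new ArrayList<>();
    while (scan.hasNext() && result.size() < limit) {
      T entity = scan.next();
      if (aclAllows.test(entity)) {   // ACL applied before the entity counts toward the limit
        result.add(entity);
      }
    }
    return result;
  }
}
{code}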



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2436) yarn application help doesn't work

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2436:
---
Release Note:   (was: test)

> yarn application help doesn't work
> --
>
> Key: YARN-2436
> URL: https://issues.apache.org/jira/browse/YARN-2436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: YARN-2436.patch
>
>
> The previous version of the yarn command plays games with the command stack 
> for some commands. The new code needs to duplicate this wackiness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-03-04 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347858#comment-14347858
 ] 

Vrushali C commented on YARN-3134:
--


There is a draft of some flow (and user and queue) based queries to be 
supported, posted on JIRA YARN-3050, which could help us with the schema design: 
  
https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx

Sharing the schema of some of the HBase tables in hRaven (detailed schema at 
https://github.com/twitter/hraven/blob/master/bin/create_schema.rb):

{code}
create 'job_history', {NAME => 'i', COMPRESSION => 'LZO'}
create 'job_history_task', {NAME => 'i', COMPRESSION => 'LZO'}
# job_history (indexed) by jobId table contains 1 column family:
# i: job-level information, specifically the rowkey into the job_history table
create 'job_history-by_jobId', {NAME => 'i', COMPRESSION => 'LZO'}
# job_history_app_version - stores all version numbers seen for a single app ID
# i: "info" -- version information
create 'job_history_app_version', {NAME => 'i', COMPRESSION => 'LZO'}
# job_history_agg_daily - stores daily aggregated job info
# the s column family has a TTL of 30 days; it's used as a scratch col family
# it stores the run ids that are seen for that day
# we assume that a flow will not run for more than 30 days, hence it's fine to "expire" that data
create 'job_history_agg_daily', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}
# job_history_agg_weekly - stores weekly aggregated job info
# the s column family has a TTL of 30 days
# it stores the run ids that are seen for that week
# we assume that a flow will not run for more than 30 days, hence it's fine to "expire" that data
create 'job_history_agg_weekly', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}

{code}

job_history is the main table. 
Its row key is: cluster!user!application!timestamp!jobID 
cluster, user and application are stored as Strings; timestamp and jobID are 
stored as longs. 
cluster - unique cluster name (e.g. “cluster1@dc1”) 
user - user running the application (“edgar”) 
application - application ID (aka flow name) derived from the job configuration: 
uses the “batch.desc” property if set, otherwise parses a consistent ID from 
“mapreduce.job.name” 
timestamp - inverted (Long.MAX_VALUE - value) value of the submission time. 
Storing the value as an inverted timestamp ensures the latest jobs are stored 
first for that cluster!user!app, which enables faster retrieval of the most 
recent jobs for the flow.
jobID - stored as the Job Tracker/Resource Manager start time (long) 
concatenated with the job sequence number, e.g. job_201306271100_0001 -> 
[1372352073732L][1L] 
How the columns are named in hRaven:
- each key in the job history file becomes the column name. For example, for 
finishedMaps, it would be stored as

{code}
column=i:finished_maps,
timestamp= 1425515902000, 
value=\x00\x00\x00\x00\x00\x00\x00\x05
{code}

In the output above, the timestamp is the HBase cell timestamp. 

- we store the configuration information with a column name prefix of "c!"
{code}
column=i:c!yarn.sharedcache.manager.client.thread-count, 
timestamp= 1425515902000,
value=50
{code}

- each counter is stored with a prefix of "g!" or "gr!" or "gm!" 
{code}
For reducer counters, there is a prefix of gr! 
 column=i:gr!org.apache.hadoop.mapreduce.TaskCounter!SPILLED_RECORDS, 
timestamp= 1425515902000
value=\x00\x00\x00\x00\x00\x00\x00\x02

For mapper counters, there is a prefix of gm! 
column=i:gm!org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter!BYTES_READ,
timestamp= 1425515902000, 
value=\x00\x00\x00\x00\x00\x00\x00\x02
{code} 
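
A small sketch (hypothetical helper, not hRaven code) of how a counter's column 
qualifier could be composed from the prefixes described above -- "gr!" for 
reducer counters, "gm!" for mapper counters, and (assuming) "g!" for job-level 
totals:
{code}
enum CounterScope {
  JOB("g!"), MAP("gm!"), REDUCE("gr!");

  final String prefix;
  CounterScope(String prefix) { this.prefix = prefix; }
}

final class CounterColumns {
  /** e.g. qualifier(CounterScope.REDUCE, "org.apache.hadoop.mapreduce.TaskCounter",
      "SPILLED_RECORDS") returns "gr!org.apache.hadoop.mapreduce.TaskCounter!SPILLED_RECORDS". */
  static String qualifier(CounterScope scope, String group, String counterName) {
    return scope.prefix + group + "!" + counterName;
  }
}
{code}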


> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the ord

[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347866#comment-14347866
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702634/YARN-3122.007.patch
  against trunk revision 722b479.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6853//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6853//console

This message is automatically generated.

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-04 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3264:
-
Attachment: YARN-3264.005.patch


Uploading a new patch with the review suggestions addressed:
- updated the unit test for FileSystemTimelineServiceWriterImpl
- updated FileSystemTimelineServiceWriterImpl#serviceInit to initialize the 
local file system output directory (see the sketch below)
- ensured the directory is read from config
- fixed unused imports
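
A minimal sketch of the serviceInit approach from the list above; the class 
name, config key and default path are hypothetical, not the names used in the 
patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class FileSystemTimelineWriterSketch extends AbstractService {
  // Hypothetical config key and default; the patch defines its own constants.
  static final String OUTPUT_DIR_CONF = "yarn.timeline-service.fs-writer.root-dir";
  static final String DEFAULT_OUTPUT_DIR = "/tmp/timeline-service-data";

  private String outputRoot;

  public FileSystemTimelineWriterSketch() {
    super(FileSystemTimelineWriterSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Read the local output directory from configuration and create it up front.
    outputRoot = conf.get(OUTPUT_DIR_CONF, DEFAULT_OUTPUT_DIR);
    java.nio.file.Files.createDirectories(java.nio.file.Paths.get(outputRoot));
    super.serviceInit(conf);
  }
}
{code}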

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

