[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3836:

Attachment: YARN-3836-YARN-2928.002.patch

Hi [~sjlee0], thanks for the prompt feedback! I updated the patch according to 
your comments. Specifically:

bq. What I would prefer is to override equals() and hashCode() for Identifier 
instead, and have simple equals() and hashCode() implementations for 
TimelineEntity that mostly delegate to Identifier. The rationale is that 
Identifier can be useful as keys to collections in its own right, and thus 
should override those methods.
That's a nice suggestion! Fixed. 

bq. One related question for your use case of putting entities into a map: I 
notice that you're using the TimelineEntity instances directly as keys to maps. 
Wouldn't it be better to use their Identifier instances as keys instead? 
Identifier instances are easier and cheaper to construct and compare.
I think I used an inappropriate example here. I meant to say HashSet, not 
HashMap.

bq. We should make isValid() a proper javadoc hyperlink
Fixed. 

bq. Since we're checking the entity type and the id, wouldn't it be sufficient 
to check whether the object is an instance of TimelineEntity?
I agree. Fixed all related ones. 


> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch, 
> YARN-3836-YARN-2928.002.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.





[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED

2015-07-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619997#comment-14619997
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~zxu] Sorry for the delay. I missed your comment. Agreed; fixing it shortly.

> ZKRMStateStore shouldn't create new session without occurrence of 
> SESSIONEXPIRED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM goes down with a NoNode exception during creation of the znode for an app attempt.
> *Please find the exception logs below:*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateSt

[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619994#comment-14619994
 ] 

Hudson commented on YARN-2194:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8138 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8138/])
YARN-2194. Addendum patch to fix failing unit test in 
TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: 
rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java


> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.
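
For illustration, here is a minimal, self-contained sketch (not the actual NodeManager code; the class and method names are invented) of how a controller such as "cpu" can be located even when it is co-mounted as "cpu,cpuacct": the mount options are split on commas and matched token by token instead of being compared as a whole string.

{code}
// Hypothetical sketch, not the NodeManager implementation: locate the cgroup
// mount point for a controller (e.g. "cpu") when controllers are co-mounted
// as "cpu,cpuacct", as on RHEL7.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class CgroupMountFinder {

  public static String findControllerMount(String controller) throws IOException {
    List<String> mounts = Files.readAllLines(Paths.get("/proc/mounts"));
    for (String line : mounts) {
      // /proc/mounts format: <device> <mountpoint> <fstype> <options> <dump> <pass>
      String[] fields = line.split("\\s+");
      if (fields.length < 4 || !"cgroup".equals(fields[2])) {
        continue;
      }
      // Options look like "rw,nosuid,...,cpu,cpuacct": match individual tokens
      // rather than expecting an option equal to "cpu".
      if (Arrays.asList(fields[3].split(",")).contains(controller)) {
        return fields[1];
      }
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    // On RHEL7 this would typically print /sys/fs/cgroup/cpu,cpuacct.
    System.out.println(findControllerMount("cpu"));
  }
}
{code}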





[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-07-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619992#comment-14619992
 ] 

Brahma Reddy Battula commented on YARN-3381:


[~ajisakaa] Thanks a lot for taking a look at this issue. Updated the patch 
based on your comment. Kindly review.

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: Xiaoshuang LU
>Assignee: Brahma Reddy Battula
>  Labels: BB2015-05-TBR
> Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
> YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, 
> YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.





[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-07-08 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3381:
---
Attachment: YARN-3381-005.patch

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: Xiaoshuang LU
>Assignee: Brahma Reddy Battula
>  Labels: BB2015-05-TBR
> Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
> YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, 
> YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.





[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619989#comment-14619989
 ] 

Ajith S commented on YARN-3885:
---

{code}
    root
     |
     A
    / \
   C   B
      / \
     D   E
{code}


+*Before fix:*+
NAME: queueA CUR:  PEN:  GAR: 
 NORM: NaN IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*{color:red}UNTOUCHABLE:  PREEMPTABLE: {color}*
NAME: queueB CUR:  PEN:  GAR: 
 NORM: NaN IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueC CUR:  PEN:  GAR: 
 NORM: 1.0 IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueD CUR:  PEN:  GAR: 
 NORM: NaN IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueE CUR:  PEN:  GAR: 
 NORM: 1.0 IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *

+*After:*+
NAME: queueA CUR:  PEN:  GAR: 
 NORM: 1.0 IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*{color:green}UNTOUCHABLE:  PREEMPTABLE: {color}*
NAME: queueB CUR:  PEN:  GAR: 
 NORM: NaN IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueC CUR:  PEN:  GAR: 
 NORM: 1.0 IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueD CUR:  PEN:  GAR: 
 NORM: NaN IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *
NAME: queueE CUR:  PEN:  GAR: 
 NORM: 1.0 IDEAL_ASSIGNED:  
IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
*UNTOUCHABLE:  PREEMPTABLE: *

> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 
> level
> --
>
> Key: YARN-3885
> URL: https://issues.apache.org/jira/browse/YARN-3885
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ajith S
>Priority: Critical
> Attachments: YARN-3885.02.patch, YARN-3885.03.patch, 
> YARN-3885.04.patch, YARN-3885.patch
>
>
> In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that 
> calculates {{untouchable}} doesn't consider all the children; it considers only 
> the immediate children.
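
To make the distinction concrete, here is a generic, self-contained sketch (a hypothetical {{Queue}} type and {{untouchable}} field, not the actual {{ProportionalCapacityPreemptionPolicy}} code) showing how aggregating over only the immediate children misses values contributed by deeper descendants, while a recursive walk over the subtree does not:

{code}
// Generic illustration only: summing a per-queue quantity over immediate
// children vs. over the entire subtree.
import java.util.ArrayList;
import java.util.List;

class Queue {
  final String name;
  final long untouchable;                 // some per-queue quantity
  final List<Queue> children = new ArrayList<>();

  Queue(String name, long untouchable) {
    this.name = name;
    this.untouchable = untouchable;
  }
}

public class SubtreeAggregation {

  // Pattern that only looks one level down: grandchildren are ignored.
  static long immediateChildrenOnly(Queue parent) {
    long sum = 0;
    for (Queue child : parent.children) {
      sum += child.untouchable;
    }
    return sum;
  }

  // Recursive pattern: every descendant in the hierarchy is counted.
  static long wholeSubtree(Queue parent) {
    long sum = 0;
    for (Queue child : parent.children) {
      sum += child.untouchable + wholeSubtree(child);
    }
    return sum;
  }

  public static void main(String[] args) {
    Queue root = new Queue("root", 0);
    Queue a = new Queue("A", 0);
    Queue b = new Queue("B", 0);
    Queue d = new Queue("D", 5);          // grandchild with a non-zero value
    root.children.add(a);
    a.children.add(b);
    b.children.add(d);
    System.out.println(immediateChildrenOnly(root)); // 0, misses D
    System.out.println(wholeSubtree(root));          // 5
  }
}
{code}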





[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619988#comment-14619988
 ] 

Varun Vasudev commented on YARN-2194:
-

My apologies for missing the failing unit test, [~sidharta-s]. I've committed 
the fix for it.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.





[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619984#comment-14619984
 ] 

nijel commented on YARN-3813:
-

Thanks [~sunilg] and [~devaraj.k] for the comments.

bq. How frequently are you going to check this condition for each application?
The plan is to have a configurable interval defaulting to 30 seconds 
({{yarn.app.timeout.monitor.interval}}).

bq. Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we 
may not need a flag.
bq. I feel having a TIMEOUT state for RMAppImpl would be proper here. 

OK. We will add a TIMEOUT state and handle the changes. This will require a few 
changes in the app transitions, the client package, and the web UI.

bq. I have a suggestion here. We can have a BasicAppMonitoringManager which can 
keep an entry of .
bq. when the application gets submitted to RM then we can register the 
application with RMAppTimeOutMonitor using the user specified timeout.

Yes, good suggestion. We will update this to use a registration mechanism, but 
since each application can have its own timeout period, the scope for code 
reuse looks minimal.

{code}
RMAppTimeOutMonitor
  local map (appId -> timeout)
  add/register(appId, timeout)  --> called from RMAppImpl
  run() --> if the app is submitted/running and has exceeded its timeout, kill it;
            if it has already completed, remove it from the map.
  no delete/unregister method   --> applications are removed from the map by run()
{code}
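
To make the pseudocode above more tangible, here is a small, self-contained Java sketch of such a monitor. It is only an illustration of the idea under the assumptions discussed in this thread (a periodic scan, a per-application timeout, removal from the map inside the scan); the class and method names are invented and this is not RM code.

{code}
// Hypothetical sketch of the proposed RMAppTimeOutMonitor; not RM code.
// It keeps a local appId -> (submitTime, timeout) map, scans it periodically,
// and fires a callback for applications that have exceeded their timeout.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RMAppTimeOutMonitor {

  /** Callback fired when an application times out (e.g. dispatch a TIMEOUT/KILL event). */
  public interface TimedOutHandler {
    void onTimeout(String appId);
  }

  private static class TimeoutEntry {
    final long submitTimeMs;
    final long timeoutMs;
    TimeoutEntry(long submitTimeMs, long timeoutMs) {
      this.submitTimeMs = submitTimeMs;
      this.timeoutMs = timeoutMs;
    }
  }

  private final Map<String, TimeoutEntry> apps = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final TimedOutHandler handler;

  public RMAppTimeOutMonitor(TimedOutHandler handler, long monitorIntervalMs) {
    this.handler = handler;
    // The interval corresponds to the configurable value discussed above
    // (yarn.app.timeout.monitor.interval, defaulting to 30 seconds).
    scheduler.scheduleWithFixedDelay(this::run, monitorIntervalMs, monitorIntervalMs,
        TimeUnit.MILLISECONDS);
  }

  /** Register an application on submission; apps without a timeout are never registered. */
  public void register(String appId, long submitTimeMs, long timeoutMs) {
    apps.put(appId, new TimeoutEntry(submitTimeMs, timeoutMs));
  }

  private void run() {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, TimeoutEntry> e : apps.entrySet()) {
      if (now - e.getValue().submitTimeMs >= e.getValue().timeoutMs) {
        handler.onTimeout(e.getKey());
        apps.remove(e.getKey());
      }
    }
    // In the proposal, run() would also drop entries for applications that have
    // already completed; that check needs an RMApp lookup and is omitted here.
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}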

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say every 
> 5 minutes. The same job runs continuously with different datasets, so one job is 
> started every 5 minutes. The estimated time for this task is 2 minutes or less.
> If the application does not complete in the given time, its output is not useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job.
> Here, the user expects the application to be finished (completed or killed) 
> within the given time.
> One option is to move this logic to the application client (which submits the 
> job), but it would be nicer to have generic logic, which would be more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and a prototype patch.





[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-3885:
--
Attachment: YARN-3885.04.patch

> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 
> level
> --
>
> Key: YARN-3885
> URL: https://issues.apache.org/jira/browse/YARN-3885
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ajith S
>Priority: Critical
> Attachments: YARN-3885.02.patch, YARN-3885.03.patch, 
> YARN-3885.04.patch, YARN-3885.patch
>
>
> In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that 
> calculates {{untouchable}} doesn't consider all the children; it considers only 
> the immediate children.





[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619971#comment-14619971
 ] 

Ajith S commented on YARN-3885:
---

Hi [~sunilg] 
Sorry for the delay, i have added the testcase

> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 
> level
> --
>
> Key: YARN-3885
> URL: https://issues.apache.org/jira/browse/YARN-3885
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ajith S
>Priority: Critical
> Attachments: YARN-3885.02.patch, YARN-3885.03.patch, 
> YARN-3885.04.patch, YARN-3885.patch
>
>
> In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that 
> calculates {{untouchable}} doesn't consider all the children; it considers only 
> the immediate children.





[jira] [Created] (YARN-3903) Disable preemption at Queue level for Fair Scheduler

2015-07-08 Thread He Tianyi (JIRA)
He Tianyi created YARN-3903:
---

 Summary: Disable preemption at Queue level for Fair Scheduler
 Key: YARN-3903
 URL: https://issues.apache.org/jira/browse/YARN-3903
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.3.0
 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 
(2014-12-08) x86_64
Reporter: He Tianyi
Priority: Trivial


YARN-2056 supports disabling preemption at the queue level for the CapacityScheduler.
We recently encountered the same need for the FairScheduler.





[jira] [Created] (YARN-3902) Fair scheduler preempts ApplicationMaster

2015-07-08 Thread He Tianyi (JIRA)
He Tianyi created YARN-3902:
---

 Summary: Fair scheduler preempts ApplicationMaster
 Key: YARN-3902
 URL: https://issues.apache.org/jira/browse/YARN-3902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 
(2014-12-08) x86_64
Reporter: He Tianyi


YARN-2022 fixed a similar issue for the CapacityScheduler.
However, the FairScheduler still suffers from it, preempting the AM while other 
normal containers are running.

I think we should take the same approach and avoid preempting the AM unless no 
container other than the AM is running.
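
As a generic illustration of the proposed selection rule (hypothetical types only; this is not FairScheduler code), non-AM containers would be preferred for preemption, and the AM container becomes a candidate only when it is the last container the application has running:

{code}
// Hypothetical sketch of the selection rule; not FairScheduler internals.
import java.util.ArrayList;
import java.util.List;

public class PreemptionCandidateSelector {

  static class ContainerInfo {
    final String id;
    final boolean isAM;
    ContainerInfo(String id, boolean isAM) {
      this.id = id;
      this.isAM = isAM;
    }
  }

  static List<ContainerInfo> selectPreemptable(List<ContainerInfo> running) {
    List<ContainerInfo> nonAm = new ArrayList<>();
    for (ContainerInfo c : running) {
      if (!c.isAM) {
        nonAm.add(c);
      }
    }
    // Only when nothing but the AM is left does the AM become a candidate.
    return nonAm.isEmpty() ? running : nonAm;
  }

  public static void main(String[] args) {
    List<ContainerInfo> running = new ArrayList<>();
    running.add(new ContainerInfo("container_01", true));   // AM
    running.add(new ContainerInfo("container_02", false));
    System.out.println(selectPreemptable(running).get(0).id); // container_02
  }
}
{code}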





[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-07-08 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619951#comment-14619951
 ] 

Akira AJISAKA commented on YARN-3381:
-

Would you modify the old class {{InvalidStateTransitonException}} to extend the 
new class {{InvalidStateTransitionException}}? That way we can simply remove 
the old class as an incompatible change after fixing this issue.

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: Xiaoshuang LU
>Assignee: Brahma Reddy Battula
>  Labels: BB2015-05-TBR
> Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
> YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.





[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619885#comment-14619885
 ] 

Hadoop QA commented on YARN-3116:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 36s | The applied patch generated  1 
new checkstyle issues (total was 9, now 10). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 15s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  27m 51s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 29s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744380/YARN-3116.v8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b8832fc |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8467/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8467/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 

[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-07-08 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619829#comment-14619829
 ] 

Inigo Goiri commented on YARN-1012:
---

Thank you [~kasha]!
Once it's committed, I'll move on to YARN-3534 to reuse ResourceUtilization.


> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
> YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
> YARN-1012-9.patch
>
>






[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-07-08 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619767#comment-14619767
 ] 

Anubhav Dhoot commented on YARN-3800:
-

The test failure seems unrelated and has been filed as a flaky test in YARN-3342.
The checkstyle issue is pre-existing (number of parameters > 7).

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
> YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, 
> YARN-3800.005.patch
>
>
> Instead of storing the ReservationRequest, we store the Resource for 
> allocations, as that's the only thing we need. Ultimately we convert 
> everything to resources anyway.





[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619761#comment-14619761
 ] 

Hadoop QA commented on YARN-3800:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated  1 
new checkstyle issues (total was 55, now 50). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m  5s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744375/YARN-3800.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2e3d83f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8466/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8466/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8466/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8466/console |


This message was automatically generated.

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
> YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, 
> YARN-3800.005.patch
>
>
> Instead of storing the ReservationRequest, we store the Resource for 
> allocations, as that's the only thing we need. Ultimately we convert 
> everything to resources anyway.





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619759#comment-14619759
 ] 

Zhijie Shen commented on YARN-3901:
---

[~vrushalic], just want to confirm with you that this JIRA won't cover the 
app_flow table, right?

I need the flow mapping for implementing the reader APIs against the HBase 
backend. If it's not covered here, I can help implement it in the scope of 
YARN-3049.

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf,
> filing this JIRA to track the creation and population of data in the flow run 
> table. Some points being considered:
> - Stores per-flow-run information aggregated across applications, per flow version.
> - The RM's collector writes to it on app creation and app completion.
> - The per-app collector writes to it for metric updates, at a slower frequency 
> than the metric updates to the application table.
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of the flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation and decremented 
> on app completion.
> - For min_start_time the RM writer will simply write a value tagged with the 
> applicationId. A coprocessor will return the min value of all written values.
> - Upon flush and compactions, the min value among all the cells of this column 
> will be written to a cell without any tag (empty tag) and all the other cells 
> will be discarded.
> - Ditto for max_end_time, but there the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don't 
> want to re-aggregate for those upon replay.
> 
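
As a rough illustration of the row key and the min_start_time collapse described above (the class, method names, and layout are assumptions made for this example, not the actual YARN-3815 schema code or an HBase coprocessor):

{code}
// Illustrative sketch only: build the flow_run row key
// (cluster ! user ! flow ! flow run id) and collapse tagged min_start_time
// cells to the single minimum kept after flush/compaction.
import java.util.List;

public class FlowRunSketch {

  static final String SEPARATOR = "!";

  static String rowKey(String cluster, String user, String flow, long flowRunId) {
    return String.join(SEPARATOR, cluster, user, flow, Long.toString(flowRunId));
  }

  // Each application writes its own min_start_time cell tagged with its app id;
  // the value kept after compaction is simply the minimum of all written values.
  static long collapseMinStartTime(List<Long> taggedCellValues) {
    long min = Long.MAX_VALUE;
    for (long value : taggedCellValues) {
      min = Math.min(min, value);
    }
    return min;
  }

  public static void main(String[] args) {
    System.out.println(rowKey("clusterA", "userA", "flowA", 1436300000000L));
    System.out.println(collapseMinStartTime(List.of(1436300001000L, 1436300000500L)));
  }
}
{code}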





[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619755#comment-14619755
 ] 

Devaraj K commented on YARN-3813:
-

Thanks [~nijel] and [~rohithsharma] for the design proposal.

{quote}
New auxillary service : RMAppTimeOutService
Responsibility is to track the running application. Simple logic

// if the job is still running and its time has elapsed, kill it
if ((RMAppState == SUBMITTED/ACCEPTED/RUNNING)
    && (currentTime - app.getSubmitTime()) >= timeout)
{quote}

How frequently are you going to check this condition for each application?

Can we have a monitor, something like an RMAppTimeOutMonitor extending 
AbstractLivelinessMonitor? When the application is submitted to the RM, we can 
register it with RMAppTimeOutMonitor using the user-specified timeout, and when 
the timeout is reached, RMAppTimeOutMonitor can trigger an event to take 
further action.

bq. Yes, having a separate TIMEOUT event and TIMEOUT state is good approach and 
other option. Initially we consider to have new state TIMEOUT which require 
very huge changes across all the modules.
I feel having a TIMEOUT state for RMAppImpl would be appropriate here. When 
RMAppTimeOutMonitor triggers a timeout event for an application, RMAppImpl can 
move to the TIMEOUT state from any of the non-final states, and during the 
transition it can handle stopping the running attempt and its containers. I 
don't see that there would be so many changes required to achieve it.


> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say every 
> 5 minutes. The same job runs continuously with different datasets, so one job is 
> started every 5 minutes. The estimated time for this task is 2 minutes or less.
> If the application does not complete in the given time, its output is not useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job.
> Here, the user expects the application to be finished (completed or killed) 
> within the given time.
> One option is to move this logic to the application client (which submits the 
> job), but it would be nicer to have generic logic, which would be more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and a prototype patch.





[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619729#comment-14619729
 ] 

Zhijie Shen commented on YARN-3836:
---

bq. I see that we're implementing the Comparable interface for all 3 types. I'm 
wondering if it makes sense for them. What would it mean to order 
TimelineEntity instances? Does it mean much? Where would it be useful? Do we 
need to implement it? The same questions go for the other 2 types...

For example, {{compareTo}} of {{TimelineEntity}} is used to order the entities 
in the result set of a getEntities query. It would be better to return the 
entities ordered by timestamp rather than in arbitrary order.
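
For illustration only (the class, field, and accessor names below are assumptions, not the actual YARN-2928 data model), such an ordering could compare by creation time first, newest entities first, and fall back to the id so the ordering stays consistent with equality:

{code}
// Illustrative sketch only: ordering entities by creation time, newest first,
// with the id as a tie-breaker.
public class OrderedEntity implements Comparable<OrderedEntity> {

  private final String id;          // assumed identifier field
  private final long createdTime;   // assumed creation timestamp in milliseconds

  public OrderedEntity(String id, long createdTime) {
    this.id = id;
    this.createdTime = createdTime;
  }

  @Override
  public int compareTo(OrderedEntity other) {
    // Newer entities sort before older ones.
    int byTime = Long.compare(other.createdTime, this.createdTime);
    return byTime != 0 ? byTime : this.id.compareTo(other.id);
  }
}
{code}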

bq. This is an open question. Is the id alone the identity, or does the 
timestamp together with it form the identity? Do we expect users of 
TimelineEvent to always be able to provide the timestamp? Honestly I'm not 100% 
sure what the contract is, and we probably want to make it explicit (and add it 
to the javadoc). Thoughts?

In ATS v1, we actually use id + timestamp to uniquely identify an event. One 
merit of doing this is to let the app put the same event multiple times. For 
example, a job can request resources many times; each time it can put a 
RESOURCE_REQUEST event with a unique timestamp and fill in the resource 
information.

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.





[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3116:
--
Attachment: YARN-3116.v8.patch

Fixed TestAppRunnability as well in the new patch.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine from the context in the NM whether the 
> container is an AM container (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine whether a container is an AM 
> container on the NM. We can add a flag to the container object or create an 
> API to make the judgement. Perhaps the distributed AM information may also be 
> useful to YARN-2877.





[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619702#comment-14619702
 ] 

Sangjin Lee commented on YARN-3836:
---

Thanks [~gtCarrera9] for your quick patch! I agree mostly with your 2 points 
above.

I also did take a quick look at the patch, and here are my initial comments.

I see that we're implementing the {{Comparable}} interface for all 3 types. I'm 
wondering if it makes sense for them. What would it mean to order 
{{TimelineEntity}} instances? Does it mean much? Where would it be useful? Do 
we need to implement it? The same questions go for the other 2 types...

(TimelineEntity.java)
What I would prefer is to override {{equals()}} and {{hashCode()}} for 
{{Identifier}} instead, and have simple {{equals()}} and {{hashCode()}} 
implementations for {{TimelineEntity}} that mostly delegate to {{Identifier}}. 
The rationale is that {{Identifier}} can be useful as keys to collections in 
its own right, and thus should override those methods.

One related question for your use case of putting entities into a map: I notice 
that you're using the {{TimelineEntity}} instances directly as keys to maps. 
Wouldn't it be better to use their {{Identifier}} instances as keys instead? 
{{Identifier}} instances are easier and cheaper to construct and compare.

We still need {{equals()}} and {{hashCode()}} on {{TimelineEntity}} itself 
because they can be used in sets too.
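
To make the suggested delegation concrete, here is a minimal, self-contained sketch. It only illustrates the shape of the idea; the field names and constructors are assumptions, not the actual YARN-2928 {{TimelineEntity}}/{{Identifier}} classes.

{code}
// Illustrative sketch only: Identifier owns equals()/hashCode(), and
// TimelineEntity delegates to it.
import java.util.Objects;

class Identifier {
  private final String type;
  private final String id;

  Identifier(String type, String id) {
    this.type = type;
    this.id = id;
  }

  // Identifier overrides equality so it can be used directly as a map/set key.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Identifier)) {      // instanceof, not an exact class match
      return false;
    }
    Identifier other = (Identifier) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, id);
  }
}

class TimelineEntity {
  private final Identifier identifier;

  TimelineEntity(String type, String id) {
    this.identifier = new Identifier(type, id);
  }

  Identifier getIdentifier() {
    return identifier;
  }

  // Still overridden (entities may be used in sets), but simply delegates.
  @Override
  public boolean equals(Object o) {
    return o instanceof TimelineEntity
        && identifier.equals(((TimelineEntity) o).getIdentifier());
  }

  @Override
  public int hashCode() {
    return identifier.hashCode();
  }
}
{code}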

- l.42: We should make {{isValid()}} a proper javadoc hyperlink
- l.510: Although this is probably going to be true for the most part, this 
check is a little bit stronger than I expected. We're essentially saying the 
actual class types of two objects must match precisely. People might extend 
classes further. Since we're checking the entity type and the id, wouldn't it 
be sufficient to check whether the object is an instance of {{TimelineEntity}}?

(TimelineEvent.java)
This is an open question. Is the id alone the identity, or does the timestamp 
together with it form the identity? Do we expect users of {{TimelineEvent}} to 
always be able to provide the timestamp? Honestly I'm not 100% sure what the 
contract is, and we probably want to make it explicit (and add it to the 
javadoc). Thoughts?

- l.100: same comment on the class as above

(TimelineMetric.java)
- l.144: same comment on the class as above

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.





[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619689#comment-14619689
 ] 

Hadoop QA commented on YARN-2194:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 38s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 15s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  24m 42s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744373/YARN-2194-7.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 2e3d83f |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8465/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8465/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8465/console |


This message was automatically generated.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.





[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619678#comment-14619678
 ] 

Hadoop QA commented on YARN-3836:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 30s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 49s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  6s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 32s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744367/YARN-3836-YARN-2928.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 4c5f88f |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8464/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8464/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8464/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8464/console |


This message was automatically generated.

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.





[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619673#comment-14619673
 ] 

Karthik Kambatla commented on YARN-1012:


Both the test result and findbugs warnings look unrelated.

+1. Checking this in. 

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
> YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
> YARN-1012-9.patch
>
>






[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619669#comment-14619669
 ] 

Rohith Sharma K S commented on YARN-3813:
-

Thanks [~sunilg] for going through the design doc and for the feedback.
bq. BasicAppMonitoringManager which can keep an entry of . 
Basically, we mean that the auxiliary service is a separate service that starts 
a new thread monitoring running applications, i.e. very similar to any other 
service in the RM such as ZKRMStateStore/ClientRMService.

bq. Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we 
may not need a flag.
Yes, having a separate TIMEOUT event and TIMEOUT state is a good approach and 
another option. Initially we considered having a new TIMEOUT state, which would 
require very large changes across all the modules. To keep it simple, we can 
manage with the KILLED state, a proper diagnostic message, and a new flag. The 
new flag is for identifying whether the app timed out or not, which is required 
for calculating metrics and for the RM restart feature.



> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time.
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say every 
> 5 minutes. The same job runs continuously with different datasets, so one job is 
> started every 5 minutes. The estimated time for this task is 2 minutes or less.
> If the application does not complete in the given time, its output is not useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job.
> Here, the user expects the application to be finished (completed or killed) 
> within the given time.
> One option is to move this logic to the application client (which submits the 
> job), but it would be nicer to have generic logic, which would be more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and a prototype patch.





[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-07-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3800:

Attachment: YARN-3800.005.patch

Addressed the feedback.
I feel the types are already on the left side, so it should be OK to leave them 
out on the right side. But it's not a big deal, so I removed them.

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
> YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, 
> YARN-3800.005.patch
>
>
> Instead of storing the ReservationRequest we store the Resource for 
> allocations, as thats the only thing we need. Ultimately we convert 
> everything to resources anyway



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3852) Add docker container support to container-executor

2015-07-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619660#comment-14619660
 ] 

Sidharta Seethana commented on YARN-3852:
-

One of the test failures ({{TestPrivilegedOperationExecutor}}) is unrelated 
to this patch. Please see the update to YARN-2194.


> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
> Attachments: YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers is restricted to privileged users ( i.e 
> users running applications should not have direct access to docker). In order 
> to ensure the node manager can run docker commands, we need to add docker 
> support to the container-executor binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619656#comment-14619656
 ] 

Sidharta Seethana commented on YARN-2194:
-

Submitted to Jenkins. [~vinodkv], please take a quick look and commit?

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-2194:

Attachment: YARN-2194-7.patch

Attaching a patch with a fix for the unit test issue.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reassigned YARN-2194:
---

Assignee: Sidharta Seethana  (was: Wei Yan)

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Sidharta Seethana
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reopened YARN-2194:
-

[~ywskycn] , [~vvasudev]

So, it looks like the final version of the patch that was eventually committed 
didn't actually go through Jenkins (it wasn't submitted to Jenkins, or something 
else went wrong during submission). There is a failing test that needs to be 
fixed (see below).

{code}
testSquashCGroupOperationsWithValidOperations(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor)
  Time elapsed: 0.178 sec  <<< FAILURE!
org.junit.ComparisonFailure: 
expected:<...n/container_01/tasks[,net_cls/hadoop_yarn/container_01/tasks,]blkio/hadoop_yarn/co...>
 but 
was:<...n/container_01/tasks[%net_cls/hadoop_yarn/container_01/tasks%]blkio/hadoop_yarn/co...>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor.testSquashCGroupOperationsWithValidOperations(TestPrivilegedOperationExecutor.java:225)
{code}
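
For context on the failure above: the assertion shows the code now joining the 
cgroup task-file paths with '%' while the test still expected ','. The small, 
purely hypothetical example below (not code from the patch or the test) shows 
why a plain comma is a poor separator once the RHEL7 controller name 
"cpu,cpuacct" can appear inside the paths:
{code}
// Hypothetical illustration only; not from YARN-2194 or its tests.
public class CgroupSeparatorSketch {
  public static void main(String[] args) {
    String cpuTasks = "/sys/fs/cgroup/cpu,cpuacct/hadoop_yarn/container_01/tasks";
    String blkioTasks = "/sys/fs/cgroup/blkio/hadoop_yarn/container_01/tasks";

    // Joining with ',' is ambiguous: splitting on ',' breaks the first path
    // apart at the controller name "cpu,cpuacct".
    String commaJoined = cpuTasks + "," + blkioTasks;
    System.out.println(commaJoined.split(",").length); // 3, not 2

    // Joining with a character that cannot occur in the paths keeps the list
    // recoverable.
    String percentJoined = cpuTasks + "%" + blkioTasks;
    System.out.println(percentJoined.split("%").length); // 2
  }
}
{code}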

thanks,
-Sidharta

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619637#comment-14619637
 ] 

Hadoop QA commented on YARN-3116:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 26s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 25s | The applied patch generated  1 
new checkstyle issues (total was 9, now 10). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 17s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  51m  6s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 107m 48s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability 
|
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744340/YARN-3116.v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2e3d83f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8461/console |


This message was automatically generated.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3852) Add docker container support to container-executor

2015-07-08 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619628#comment-14619628
 ] 

Abin Shahab commented on YARN-3852:
---

[~vvasudev] I'm looking at the test failures. In the meantime, could you review the patch?

> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
> Attachments: YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers is restricted to privileged users ( i.e 
> users running applications should not have direct access to docker). In order 
> to ensure the node manager can run docker commands, we need to add docker 
> support to the container-executor binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619607#comment-14619607
 ] 

Sangjin Lee commented on YARN-3047:
---

Sounds good. I'll commit the patch shortly.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3836:

Attachment: YARN-3836-YARN-2928.001.patch

In this patch I added equals and hashCode methods to the timeline entity and 
related classes, and added some javadoc describing the issues raised by 
[~jrottinghuis]. There are two things that I think are worth discussing here:

# Possible definitions of equivalence: I had an offline discussion with 
[~zjshen] and we thought it would be fine to say that two timeline entities are 
equal if their type and id are equal. As raised in this JIRA, oftentimes we'd 
like to put timeline entities in a hashmap (e.g. for aggregations). Our current 
design is sufficient to support use cases like {{aggregatedEntity = 
map.get(incomingEntity); aggregatedEntity.aggregate(incomingEntity); }}. Of 
course, users can always implement a deep comparison afterwards (a small sketch 
of this equivalence rule follows below). 
# Checking the validity of objects: due to the requirements of the RESTful 
interface, we have to expose default constructors. However, this leaves several 
member variables of a timeline data object as nulls, which is quite error 
prone. I'm adding the isValid method to help users check whether an object is 
valid (with all required fields set). 
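
To illustrate point 1, here is a minimal, self-contained sketch of the 
(type, id) equivalence rule; the class and field names are placeholders, not 
the actual data model classes or the patch code:
{code}
// Hypothetical sketch only; not the patch code.
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EntityEqualitySketch {

  private final String type;
  private final String id;

  public EntityEqualitySketch(String type, String id) {
    this.type = type;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof EntityEqualitySketch)) {
      return false;
    }
    EntityEqualitySketch other = (EntityEqualitySketch) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, id);
  }

  public static void main(String[] args) {
    Set<EntityEqualitySketch> set = new HashSet<EntityEqualitySketch>();
    set.add(new EntityEqualitySketch("YARN_APPLICATION", "app_1"));
    // Same type and id => treated as the same entity, even if other fields
    // (events, metrics, configs) differ.
    System.out.println(set.contains(new EntityEqualitySketch("YARN_APPLICATION", "app_1"))); // true
  }
}
{code}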


> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3852) Add docker container support to container-executor

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619591#comment-14619591
 ] 

Hadoop QA commented on YARN-3852:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m 32s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | whitespace |   0m  4s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 19s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | yarn tests |   5m 59s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  21m 31s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService |
|   | 
hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744358/YARN-3852.patch |
| Optional Tests | javac unit |
| git revision | trunk / 2e3d83f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8463/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8463/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8463/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8463/console |


This message was automatically generated.

> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
> Attachments: YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers is restricted to privileged users ( i.e 
> users running applications should not have direct access to docker). In order 
> to ensure the node manager can run docker commands, we need to add docker 
> support to the container-executor binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing

2015-07-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619590#comment-14619590
 ] 

Jian He commented on YARN-3866:
---

[~mding], thanks for the work! Some comments on the patch:

- Mark all getters/setters unstable for now.
- DecreasedContainer.java/IncreasedContainer.java - how about reusing the 
Container.java object?
- increaseRequests/decreaseRequests - could we just pass one list of 
changeResourceRequests instead of differentiating between increase and 
decrease, since the underlying implementations are the same? IMO, this also 
saves application writers from having to differentiate them programmatically 
(a rough sketch of this single-list idea follows after the code snippet below). 
{code}
List increaseRequests,

List decreaseRequests) 
{code}
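
As a rough, purely hypothetical sketch of the single-list alternative (none of 
these names are a proposal for the actual AM-RM API), the direction of a change 
can be implied by the target capability, so the AM only ever passes one list:
{code}
// Hypothetical sketch only; not the YARN AM-RM protocol.
import java.util.ArrayList;
import java.util.List;

public class ResourceChangeSketch {

  // Stand-in for a Resource capability.
  static class Capability {
    final int memoryMb;
    final int vcores;
    Capability(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  // One request type covers both increase and decrease: the direction is
  // implied by comparing the target capability with the current one.
  static class ContainerChangeRequest {
    final String containerId;
    final Capability target;
    ContainerChangeRequest(String containerId, Capability target) {
      this.containerId = containerId;
      this.target = target;
    }
  }

  // The AM would pass a single list in its allocate call.
  static void allocate(List<ContainerChangeRequest> changeRequests) {
    for (ContainerChangeRequest r : changeRequests) {
      System.out.println("change " + r.containerId + " to "
          + r.target.memoryMb + "MB, " + r.target.vcores + " vcores");
    }
  }

  public static void main(String[] args) {
    List<ContainerChangeRequest> changes = new ArrayList<ContainerChangeRequest>();
    changes.add(new ContainerChangeRequest("container_01", new Capability(4096, 2))); // an increase
    changes.add(new ContainerChangeRequest("container_02", new Capability(1024, 1))); // a decrease
    allocate(changes);
  }
}
{code}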


> AM-RM protocol changes to support container resizing
> 
>
> Key: YARN-3866
> URL: https://issues.apache.org/jira/browse/YARN-3866
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-3866.1.patch, YARN-3866.2.patch
>
>
> YARN-1447 and YARN-1448 are outdated. 
> This ticket deals with AM-RM Protocol changes to support container resize 
> according to the latest design in YARN-1197.
> 1) Add increase/decrease requests in AllocateRequest
> 2) Get approved increase/decrease requests from RM in AllocateResponse
> 3) Add relevant test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619586#comment-14619586
 ] 

Giovanni Matteo Fumarola commented on YARN-3116:


[~xgong] thanks for the comment; it's an accurate observation. 
[~zjshen], I think it is a good idea. I can start removing the flag and adding 
a new enum as you suggested. 

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619564#comment-14619564
 ] 

Zhijie Shen commented on YARN-3116:
---

Xuan, thanks for your comment. I think this is a good point. To be forward 
compatible, it's better to use an enum here instead of the boolean flag. That 
way, we can add more enum values, such as SystemContainer and so on, in the 
future without adding new flags or breaking compatibility. 
[~giovanni.fumarola], [~subru], what do you think?
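
As a minimal illustration of the enum-over-boolean idea (the names below are 
placeholders, not the enum that would actually be added by the patch):
{code}
// Hypothetical sketch; not the actual patch code.
public enum ContainerType {
  // What the boolean "is AM" flag expresses today.
  APPLICATION_MASTER,
  // Ordinary work containers.
  TASK
  // Further values (e.g. a post-application cleanup container per YARN-2261,
  // or a system container) can be added later without new flags and without
  // breaking compatibility.
}
{code}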

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619551#comment-14619551
 ] 

Hadoop QA commented on YARN-3878:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 33s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 28s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| | |  39m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744343/YARN-3878.08.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2e3d83f |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8462/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8462/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8462/console |


This message was automatically generated.

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
> YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
> YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RM

[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619552#comment-14619552
 ] 

Giovanni Matteo Fumarola commented on YARN-3116:


TestAppRunnability::testNotUserAsDefaultQueue is related to this patch.
The fix for the issue in TestFairScheduler::testQueueMaxAMShare causes the 
former test to fail. 

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-07-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619550#comment-14619550
 ] 

Carlo Curino commented on YARN-3800:


Patch generally looks good. I spoke with [~subru], who explained why you are 
making these changes; overall it makes sense. 

A couple of nits, and then I am OK to commit this:
1) I think it is nicer to have explicit types in the HashMap<> and TreeMap<> 
initializations (see the small illustration after this comment).
2) You already made this change in other places, but in 
TestRLESparseResourceAllocation you have a generateAllocation that still 
produces ReservationRequests which are then immediately converted to Resource. 
It is probably easier to change generateAllocation.

Thanks for the work on this patch.
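
A tiny illustration of nit 1, using a generic example rather than a line from 
the patch:
{code}
// Generic illustration only; not from YARN-3800.
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class InitializerStyleSketch {
  public static void main(String[] args) {
    // Explicit type arguments on the right-hand side, as suggested in nit 1.
    Map<Long, Integer> explicit = new TreeMap<Long, Integer>();

    // Diamond operator: the compiler infers the arguments from the declared
    // type on the left, which is the alternative style being discussed.
    Map<Long, Integer> inferred = new HashMap<>();

    explicit.put(0L, 1);
    inferred.put(0L, 1);
    System.out.println(explicit + " " + inferred);
  }
}
{code}
Both forms compile to the same thing; the nit is purely about the readability 
of the initializations.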

> Simplify inmemory state for ReservationAllocation
> -
>
> Key: YARN-3800
> URL: https://issues.apache.org/jira/browse/YARN-3800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
> YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch
>
>
> Instead of storing the ReservationRequest we store the Resource for 
> allocations, as thats the only thing we need. Ultimately we convert 
> everything to resources anyway



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3852) Add docker container support to container-executor

2015-07-08 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3852:
--
Attachment: YARN-3852.patch

Changes to the container-executor for running Docker from the LCE.

> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
> Attachments: YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers is restricted to privileged users ( i.e 
> users running applications should not have direct access to docker). In order 
> to ensure the node manager can run docker commands, we need to add docker 
> support to the container-executor binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619526#comment-14619526
 ] 

Xuan Gong commented on YARN-3116:
-

The patch looks fine overall. Only one comment:
Instead of just specifying a boolean flag to indicate the AM container, how 
about adding a containerType enum for future extensibility? For example, per 
https://issues.apache.org/jira/browse/YARN-2261, we will have a post-application 
cleanup container.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619518#comment-14619518
 ] 

Zhijie Shen commented on YARN-3116:
---

Is the TestAppRunnability failure related to this patch? The normal practice is 
to check whether the test failure is related to the code change in this jira. 
If not, you can go ahead and file a separate jira to tackle it.

Thanks for fixing TestPrivilegedOperationExecutor. It seems to be 
straightforward, so let's keep it here.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container,  we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> such that we worked around to considered the container with ID "_01" as 
> the AM container. Unfortunately, this is neither necessary or sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add flag to the container object or create an API to 
> do the judgement. Perhaps the distributed AM information may also be useful 
> to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619500#comment-14619500
 ] 

Hadoop QA commented on YARN-3047:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m  7s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:blue}0{color} | shellcheck |   1m 23s | Shellcheck was not available. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 56s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 21s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   1m 23s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  46m 31s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744326/YARN-3047-YARN-2928.13.patch
 |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 499ce52 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8460/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8460/console |


This message was automatically generated.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619496#comment-14619496
 ] 

Hadoop QA commented on YARN-3900:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 10s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:red}-1{color} | yarn tests |  50m 56s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m 29s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.reservation.TestFairReservationSystem |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744322/YARN-3900.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2e3d83f |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8458/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8458/console |


This message was automatically generated.

> Protobuf layout  of yarn_security_token causes errors in other protos that 
> include it
> -
>
> Key: YARN-3900
> URL: https://issues.apache.org/jira/browse/YARN-3900
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3900.001.patch
>
>
> Because of the subdirectory server used in 
> {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}}
>  there are errors in other protos that include them.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619478#comment-14619478
 ] 

Karthik Kambatla commented on YARN-3878:


+1, pending Jenkins.

Will go ahead and commit this once Jenkins is okay. 

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
> YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
> YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
> [0x7fb989851000]
>  

[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619445#comment-14619445
 ] 

Hadoop QA commented on YARN-1012:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 39s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 15s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m  1s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   6m  4s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  55m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | 
hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744329/YARN-1012-11.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 2e3d83f |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8459/console |


This message was automatically generated.

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
> YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
> YARN-1012-9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.08.patch

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
> YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
> YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
> [0x7fb989851000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method

[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619440#comment-14619440
 ] 

Zhijie Shen commented on YARN-3047:
---

Thanks for kicking off another Jenkins build. IAC, the patch looks good to me.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619434#comment-14619434
 ] 

Giovanni Matteo Fumarola commented on YARN-3116:


Thanks [~zjshen] for fixing the test failure. 
For the TestPrivilegedOperationExecutor I just applied a new patch with the 
fix. 
I got the same problem with TestAppRunnability when I was working on 
TestFairScheduler::testQueueMaxAMShare. 

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container from 
> the context in the NM (we can do it on the RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, that is neither a necessary nor a sufficient 
> condition. We need a way to determine whether a container is an AM 
> container on the NM. We can add a flag to the container object or create an 
> API to make the judgement. Perhaps the distributed AM information may also be 
> useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-3116:
---
Attachment: YARN-3116.v7.patch

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether a container is an AM container from 
> the context in the NM (we can do it on the RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, that is neither a necessary nor a sufficient 
> condition. We need a way to determine whether a container is an AM 
> container on the NM. We can add a flag to the container object or create an 
> API to make the judgement. Perhaps the distributed AM information may also be 
> useful to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619428#comment-14619428
 ] 

Sangjin Lee commented on YARN-3047:
---

The build seems to be horked. Kicked off another Jenkins run to see if it 
clears up.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3901) Populate flow run data in the flow_run table

2015-07-08 Thread Vrushali C (JIRA)
Vrushali C created YARN-3901:


 Summary: Populate flow run data in the flow_run table
 Key: YARN-3901
 URL: https://issues.apache.org/jira/browse/YARN-3901
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C
Assignee: Vrushali C



As per the schema proposed in YARN-3815 in 
https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf

Filing this jira to track the creation and population of data in the flow run 
table.

Some points that are being considered:
- Stores per-flow-run information aggregated across applications, plus the flow 
version.
- The RM's collector writes to it on app creation and app completion.
- The per-app collector writes to it for metric updates, at a slower frequency 
than the metric updates to the application table.
- Primary key: cluster ! user ! flow ! flow run id.
- Only the latest version of the flow-level aggregated metrics will be kept, 
even if the entity and application levels keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented 
on app completion.
- For min_start_time, the RM writer will simply write a value with a tag for 
the applicationId. A coprocessor will return the min value of all written 
values.
- Upon flush and compactions, the min value among all the cells of this column 
will be written to a cell without any tag (empty tag) and all the other cells 
will be discarded.
- Ditto for max_end_time, but there the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can 
indicate running (1) or complete (2). In those cases (for metrics) only 
complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are 
completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow 
numbers are retained in a separate column for historical tracking: we don't 
want to re-aggregate them upon replay.
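
To make the write path above a bit more concrete, here is a minimal, 
illustrative sketch of the row key and the running_apps / min_start_time 
writes. The table, column family, and qualifier names (flow_run, info, 
running_apps, min_start_time) are assumptions for this example only, and the 
appId-in-qualifier trick stands in for the cell tag plus coprocessor described 
above; this is not the actual implementation.

{code}
// Illustrative sketch only: table/family/qualifier names are assumptions,
// not the actual YARN-3901 schema.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FlowRunWriterSketch {
  private static final byte[] INFO = Bytes.toBytes("info");
  private static final byte[] RUNNING_APPS = Bytes.toBytes("running_apps");
  private static final byte[] MIN_START_TIME = Bytes.toBytes("min_start_time");

  // Row key: cluster ! user ! flow ! flow run id
  static byte[] rowKey(String cluster, String user, String flow, long runId) {
    return Bytes.toBytes(cluster + "!" + user + "!" + flow + "!" + runId);
  }

  // On app creation: bump running_apps and record a candidate min_start_time.
  // The coprocessor would keep only the min across apps on flush/compaction.
  static void onAppCreated(Table flowRunTable, byte[] row, String appId,
      long startTime) throws IOException {
    Increment inc = new Increment(row);
    inc.addColumn(INFO, RUNNING_APPS, 1L);
    flowRunTable.increment(inc);

    Put put = new Put(row);
    // The real design attaches the appId as a cell tag; appending it to the
    // qualifier here is just for illustration.
    put.addColumn(INFO, Bytes.add(MIN_START_TIME, Bytes.toBytes("!" + appId)),
        Bytes.toBytes(startTime));
    flowRunTable.put(put);
  }

  // On app completion: decrement running_apps.
  static void onAppCompleted(Table flowRunTable, byte[] row)
      throws IOException {
    Increment dec = new Increment(row);
    dec.addColumn(INFO, RUNNING_APPS, -1L);
    flowRunTable.increment(dec);
  }
}
{code}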






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619408#comment-14619408
 ] 

Hadoop QA commented on YARN-3047:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 38s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 23s | The applied patch generated  1 
new checkstyle issues (total was 214, now 214). |
| {color:blue}0{color} | shellcheck |   1m 44s | Shellcheck was not available. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:red}-1{color} | eclipse:eclipse |   0m 14s | The patch failed to build 
with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 54s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. |
| {color:red}-1{color} | findbugs |   2m 12s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 compilation is broken. |
| {color:green}+1{color} | findbugs |   2m 12s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | yarn tests |   0m 18s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   0m 19s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   0m 12s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | |  42m 50s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-yarn-api |
|   | hadoop-yarn-common |
|   | hadoop-yarn-server-timelineservice |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744326/YARN-3047-YARN-2928.13.patch
 |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 499ce52 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8457/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8457/console |


This message was automatically generated.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619399#comment-14619399
 ] 

Karthik Kambatla commented on YARN-2962:


Oh, sorry. 

For changing the split index after an upgrade to 3.x.y, it would be nice to 
make the change seamless. If that is not possible, requiring a format should be 
okay as long as we document it clearly. 

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-07-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619400#comment-14619400
 ] 

Naganarasimha G R commented on YARN-3045:
-

+1, this seems to be a good idea for having priorities in events. 

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3045-YARN-2928.002.patch, 
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
> YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch
>
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-07-08 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-1012:
--
Attachment: YARN-1012-11.patch

Fixed the checkstyle issues (I hope the package-info is done properly).
I could not find a reason for the FindBugs warning; in another patch it just 
disappeared, so let's hope that is the case here too.

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
> YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
> YARN-1012-9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619347#comment-14619347
 ] 

Zhijie Shen commented on YARN-3049:
---

Updated the title accordingly to describe the scope of this jira more 
accurately.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Summary: [Storage Implementation] Implement storage reader interface to 
fetch raw data from HBase backend  (was: [Storage Implementation] Implement the 
storage reader interface to fetch raw data)

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3049) [Storage Implementation] Implement the storage reader interface to fetch raw data

2015-07-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Summary: [Storage Implementation] Implement the storage reader interface to 
fetch raw data  (was: [Compatiblity] Implement existing ATS queries in the new 
ATS design)

> [Storage Implementation] Implement the storage reader interface to fetch raw 
> data
> -
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3047:
---
Attachment: YARN-3047-YARN-2928.13.patch

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it

2015-07-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3900:

Attachment: YARN-3900.001.patch

> Protobuf layout  of yarn_security_token causes errors in other protos that 
> include it
> -
>
> Key: YARN-3900
> URL: https://issues.apache.org/jira/browse/YARN-3900
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3900.001.patch
>
>
> Because of the subdirectory {{server}} used in 
> {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}},
>  there are errors in other protos that include it.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619317#comment-14619317
 ] 

Wei Yan commented on YARN-2194:
---

[~vinodkv], thanks for pointing it out. IMO, we don't need additional 
documentation, as the patch doesn't introduce any new configuration or a new 
implementation mechanism. We will need new documentation when we bring in systemd.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619306#comment-14619306
 ] 

Vinod Kumar Vavilapalli commented on YARN-2194:
---

[~ywskycn] / [~vvasudev], do we need any additional documentation for this? Say 
at 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html
 ?

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it

2015-07-08 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619307#comment-14619307
 ] 

Anubhav Dhoot commented on YARN-3900:
-

Simply printing EpochProto.toString() causes the following error, which on 
debugging reveals the culprit.
The exception is thrown in Descriptors:
{noformat}
 for (int i = 0; i < proto.getDependencyCount(); i++) {
if (!dependencies[i].getName().equals(proto.getDependency(i))) {
  throw new DescriptorValidationException(result,
"Dependencies passed to FileDescriptor.buildFrom() don't match " +
"those listed in the FileDescriptorProto.");
{noformat}
Looking at the variables, the mismatch is:
{noformat}
dependencies[i].getName() = {java.lang.String@856} 
"server/yarn_security_token.proto"
proto.getDependency(i) = {java.lang.String@857} "yarn_security_token.proto"
{noformat}
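
A minimal sketch of the kind of test that trips over this (assuming a plain 
JUnit 4 test; this is not the exact TestProtos source):

{code}
// Minimal sketch: touching the generated EpochProto class forces descriptor
// initialization, which fails when the generated dependency name
// ("server/yarn_security_token.proto") does not match the import path
// ("yarn_security_token.proto").
import org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos.EpochProto;
import org.junit.Test;

public class TestEpochProtoSketch {
  @Test
  public void testEpochProtoToString() {
    // toString() on the default instance is enough to surface the
    // ExceptionInInitializerError shown below.
    System.out.println(EpochProto.getDefaultInstance().toString());
  }
}
{code}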

Here is the error
{noformat}
java.lang.ExceptionInInitializerError
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$EpochProto.internalGetFieldAccessorTable(YarnServerResourceManagerRecoveryProtos.java:3522)
at 
com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:105)
at 
com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:153)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:272)
at 
com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248)
at com.google.protobuf.TextFormat.print(TextFormat.java:71)
at com.google.protobuf.TextFormat.printToString(TextFormat.java:118)
at 
com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestProtos.testResourceProto(TestProtos.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.IllegalArgumentException: Invalid embedded descriptor for 
"yarn_server_resourcemanager_recovery.proto".
at 
com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:301)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos.<clinit>(YarnServerResourceManagerRecoveryProtos.java:5370)
... 35 more
Caused by: com.google.protobuf.Descriptors$DescriptorValidationException: 
yarn_server_resourcemanager_recovery.proto: Dependencies passed to 
FileDescriptor.buildFrom() don't match those listed in the FileDescriptorProto.
at 
com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:246)
at 
com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:299)
... 36 more
{noformat}



> Protobuf layout  of yarn_security_token causes errors in other protos that 
> include it
> -
>
> Key: YARN-3900
> URL: https://issues.apache.org/jira/browse/YARN-3900
> Project:

[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619291#comment-14619291
 ] 

Vinod Kumar Vavilapalli commented on YARN-3813:
---

A few years ago when this came up, I recommended doing this on top of YARN. But 
I've seen this enough in the wild to yield now.

It's a useful feature to have out of the box in YARN, and it is small enough 
that I think we should go ahead with the implementation - there aren't a lot of 
design dimensions.

> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It will be useful to support an application timeout in YARN. Some use cases do 
> not care about the output of an application if it does not complete within a 
> specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job will run continuously with a different dataset, 
> so one job will be started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) within 
> the given time.
> One option is to move this logic to the application client (which submits the 
> job), 
> but it would be nicer as generic logic, which would also be more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619270#comment-14619270
 ] 

Karthik Kambatla commented on YARN-1012:


[~elgoiri] - could you look into the checkstyle and findbugs warnings please? 

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, YARN-1012-5.patch, 
> YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, YARN-1012-9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3643) Provide a way to store only running applications in the state store

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619264#comment-14619264
 ] 

Karthik Kambatla commented on YARN-3643:


That looks sufficient. Thanks for checking, Varun. 

> Provide a way to store only running applications in the state store
> ---
>
> Key: YARN-3643
> URL: https://issues.apache.org/jira/browse/YARN-3643
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>
> Today, we have a config that determines the number of applications that can 
> be stored in the state-store. Since there is no easy way to figure out the 
> maximum number of running applications at any point in time, users are forced 
> to use a conservative estimate. Our default ends up being even more 
> conservative.
> It would be nice to allow storing all running applications with a 
> conservative upper bound for it. This should allow for shorter recovery times 
> in most deployments. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-07-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619262#comment-14619262
 ] 

Varun Saxena commented on YARN-2962:


[~kasha], I mean, let us say somebody configures the split index as 3 
initially but later wants to change it to 2. In such a case, do we assume the 
state store will have to be formatted?
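
(For context, a tiny, purely hypothetical illustration of what the split index 
means for the znode layout; the path format and helper below are invented for 
this sketch and are not the actual ZKRMStateStore code.)

{code}
// Hypothetical sketch: with a split index of N, the last N characters of the
// application id are moved into a child znode so that fewer children sit
// directly under one parent znode.
public class SplitIndexSketch {
  static String appIdPath(String parent, String appId, int splitIndex) {
    if (splitIndex <= 0 || splitIndex >= appId.length()) {
      return parent + "/" + appId;                 // no split
    }
    int cut = appId.length() - splitIndex;         // split off the last N chars
    return parent + "/" + appId.substring(0, cut) + "/" + appId.substring(cut);
  }

  public static void main(String[] args) {
    // Prints .../application_1436191509558_00/03 for a split index of 2,
    // which is why data written with one index cannot simply be read back
    // with another.
    System.out.println(appIdPath("/rmstore/ZKRMStateRoot/RMAppRoot",
        "application_1436191509558_0003", 2));
  }
}
{code}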

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619254#comment-14619254
 ] 

Karthik Kambatla commented on YARN-2962:


Since we don't support rolling upgrades across major versions, it should be 
okay to require a state-store format. 

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3886) Add cumulative wait times of apps at Queue level

2015-07-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3886:
---
Component/s: (was: yarn)
 scheduler
 resourcemanager

> Add cumulative wait times of apps at Queue level
> 
>
> Key: YARN-3886
> URL: https://issues.apache.org/jira/browse/YARN-3886
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager, scheduler
>Reporter: Raju Bairishetti
>Assignee: Raju Bairishetti
>
> Right now, we are having number of apps submitted/failed/killed/running at 
> queue level. We don't have any way to find on which queue apps are waiting 
> more time. 
> I hope adding wait times of apps at queue level will be helpful in viewing 
> the overall queue status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619245#comment-14619245
 ] 

Li Lu commented on YARN-3836:
-

Thanks [~vrushalic]!

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it

2015-07-08 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3900:
---

 Summary: Protobuf layout  of yarn_security_token causes errors in 
other protos that include it
 Key: YARN-3900
 URL: https://issues.apache.org/jira/browse/YARN-3900
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Because of the subdirectory {{server}} used in 
{{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}},
there are errors in other protos that include it.
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619154#comment-14619154
 ] 

Hadoop QA commented on YARN-313:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  9s | The applied patch generated  4 
new checkstyle issues (total was 229, now 232). |
| {color:green}+1{color} | whitespace |   0m  6s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 51s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  51m 24s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 109m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.cli.TestRMAdminCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744257/YARN-313-v6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4119ad3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8456/console |


This message was automatically generated.

> Add Admin API for supporting node resource configuration in command line
> 
>
> Key: YARN-313
> URL: https://issues.apache.org/jira/browse/YARN-313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
> YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
> YARN-313-v6.patch
>
>
> We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" 
> to support changes of node's resource specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3836:
-
Assignee: Li Lu  (was: Vrushali C)

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619145#comment-14619145
 ] 

Vrushali C commented on YARN-3836:
--

Hi [~gtCarrera]

Sounds good, will reassign to you. 
thanks
Vrushali


> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3899) Add equals and hashCode to TimelineEntity

2015-07-08 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu resolved YARN-3899.
-
Resolution: Duplicate

Duplicate of YARN-3836. 

> Add equals and hashCode to TimelineEntity
> -
>
> Key: YARN-3899
> URL: https://issues.apache.org/jira/browse/YARN-3899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>
> We need to add equals and hashCode methods for timeline entity so that we can 
> easily tell if two timeline entities are equal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619140#comment-14619140
 ] 

Li Lu commented on YARN-3836:
-

Hi [~vrushalic], I'd like to check on the progress of this JIRA. I'm currently 
blocked by it while building time-based aggregations. If you are short on 
bandwidth, maybe I can take this over? Thanks! 

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3899) Add equals and hashCode to TimelineEntity

2015-07-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619125#comment-14619125
 ] 

Varun Saxena commented on YARN-3899:


[~gtCarrera9], YARN-3836 is meant for the same thing

> Add equals and hashCode to TimelineEntity
> -
>
> Key: YARN-3899
> URL: https://issues.apache.org/jira/browse/YARN-3899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>
> We need to add equals and hashCode methods for timeline entity so that we can 
> easily tell if two timeline entities are equal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3899) Add equals and hashCode to TimelineEntity

2015-07-08 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3899:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-2928

> Add equals and hashCode to TimelineEntity
> -
>
> Key: YARN-3899
> URL: https://issues.apache.org/jira/browse/YARN-3899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>
> We need to add equals and hashCode methods for timeline entity so that we can 
> easily tell if two timeline entities are equal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3899) Add equals and hashCode to TimelineEntity

2015-07-08 Thread Li Lu (JIRA)
Li Lu created YARN-3899:
---

 Summary: Add equals and hashCode to TimelineEntity
 Key: YARN-3899
 URL: https://issues.apache.org/jira/browse/YARN-3899
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Li Lu
Assignee: Li Lu


We need to add equals and hashCode methods for timeline entity so that we can 
easily tell if two timeline entities are equal.
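
For illustration, a minimal sketch of the usual equals/hashCode pattern keyed 
on an entity's type and id (the class and field names here are hypothetical; 
this is not the actual TimelineEntity code):

{code}
// Hypothetical sketch: two entities are considered equal when their type and
// id match, which is what makes membership checks in a HashSet work.
import java.util.Objects;

public class EntitySketch {
  private final String type;
  private final String id;

  public EntitySketch(String type, String id) {
    this.type = type;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof EntitySketch)) {
      return false;
    }
    EntitySketch other = (EntitySketch) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, id);
  }
}
{code}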



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode

2015-07-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619072#comment-14619072
 ] 

Karthik Kambatla commented on YARN-3838:


I don't know my way around in this neck of the woods. [~vinodkv], [~xgong] know 
a thing or two. 

> Rest API failing when ip configured in RM address in secure https mode
> --
>
> Key: YARN-3838
> URL: https://issues.apache.org/jira/browse/YARN-3838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 
> 0001-YARN-3838.patch, 0002-YARN-3810.patch, 0002-YARN-3838.patch
>
>
> Steps to reproduce
> ===
> 1.Configure hadoop.http.authentication.kerberos.principal as below
> {code:xml}
>   
> hadoop.http.authentication.kerberos.principal
> HTTP/_h...@hadoop.com
>   
> {code}
> 2. In RM web address also configure IP 
> 3. Startup RM 
> Call Rest API for RM  {{ curl -i -k  --insecure --negotiate -u : https IP 
> /ws/v1/cluster/info"}}
> *Actual*
> Rest API  failing
> {code}
> 2015-06-16 19:03:49,845 DEBUG 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
> Authentication exception: GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos credentails)
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos credentails)
>   at 
> org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-07-08 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-313:
-
Attachment: YARN-313-v6.patch

Trying to fix the broken unit test for refreshNodes (I still don't understand 
how it breaks). No success so far. Any ideas?

> Add Admin API for supporting node resource configuration in command line
> 
>
> Key: YARN-313
> URL: https://issues.apache.org/jira/browse/YARN-313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
> YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
> YARN-313-v6.patch
>
>
> We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" 
> to support changes of node's resource specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3658) Federation "Capacity Allocation" across sub-cluster

2015-07-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618902#comment-14618902
 ] 

Carlo Curino commented on YARN-3658:


See the presentation attached to the umbrella jira YARN-2915.

> Federation "Capacity Allocation" across sub-cluster
> ---
>
> Key: YARN-3658
> URL: https://issues.apache.org/jira/browse/YARN-3658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> This JIRA will track mechanisms to map federation level capacity allocations 
> to sub-cluster level ones. (Possibly via reservation mechanisms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

2015-07-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618894#comment-14618894
 ] 

Carlo Curino commented on YARN-2915:


Lei, first let me make sure we are on the same page regarding router. The 
router is "soft-state" and a rather lightweight components, so we envision 
multiple routers to run in each data-center, and definitely agreed that we will 
have at least one router per DC if/when we run a federation cross-DC.


Lei, regarding the (good) question you asked about ARMMProxy. 

The comment is derived from some early experimentation we did with the 
AMRMProxy from YARN-2884. The idea is that you could use the mux/demux 
mechanics that the AMRMproxy provides to hide multiple standalone YARN clusters 
(not part of a federation), behind a single AMRMProxy. The scenarios goes as 
follows, you have a (possibly small) cluster that I will call the "launchpad" 
running one or more AMRMProxy(s), and say 2 standalone YARN clusters (C1, C2) 
that are not federation enabled. Jobs can be submitted to C1, C2 directly as 
always, and jobs that want to span, could be submitted to the "launchpad" 
cluster. By customizing the policy in the AMRMProxy that determines how we 
forward requests to clusters, you can have an AM running on the launchpad 
cluster to forward the requests to both C1 and C2. For C1 and C2 this will look 
like as if you submitted an unmanaged AM in each cluster. The job on the other 
hand thinks he is talking with a single RM that happens to run somewhere in the 
"launchpad" cluster (typically on the same node), but this is just the 
AMRMProxy impersonating an RM.

To make this even clearer: we don't strictly need an AMRMProxy on each node 
for the story to work. However, given our current thinking/experimentation we 
see advantages in running the AMRMProxy on each node, such as: we avoid 2 
network hops, we get a better AM-to-AMRMProxy ratio so we are more resilient to 
DDOS on the AMRMProtocol, there are fewer partitioning scenarios to consider, 
etc., so this is what we are advocating for in federation.
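
For illustration only, here is a minimal sketch of the kind of forwarding policy 
described above; {{SplitPolicy}} and the string sub-cluster ids are hypothetical 
names, not the actual YARN-2884 interceptor interfaces.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: round-robin an AM's requests across standalone clusters. */
public class SplitPolicy<T> {
  private final List<String> subClusters; // e.g. ["C1", "C2"]
  private int next = 0;

  public SplitPolicy(List<String> subClusters) {
    this.subClusters = subClusters;
  }

  /** Decide, per request, which standalone cluster the proxy should forward to. */
  public Map<String, List<T>> split(List<T> requests) {
    Map<String, List<T>> perCluster = new HashMap<String, List<T>>();
    for (String cluster : subClusters) {
      perCluster.put(cluster, new ArrayList<T>());
    }
    for (T request : requests) {
      String target = subClusters.get(next++ % subClusters.size());
      perCluster.get(target).add(request);
    }
    return perCluster;
  }
}
{code}

A policy along these lines is what would let a single AM on the launchpad appear 
as an unmanaged AM in each of C1 and C2.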

In federation, we go a step further and ask C1 and C2 to commit to sharing 
resources in the federation (by heartbeating to the StateStore), and we provide 
a lot more mechanics around it (e.g., UIs that show the overall use of resources 
across clusters, rebalancing mechanisms, fault-tolerance mechanics, etc.), 
which makes for a tighter overall experience. 
Overall, I think running the entire federation code will be better, but I was 
pointing out that some of the pieces we are building could be leveraged in 
isolation for more lightweight / ad-hoc forms of cross-cluster interaction. The 
rule-based global router that [~subru] mentioned above falls in the same 
category. 



> Enable YARN RM scale out via federation using multiple RM's
> ---
>
> Key: YARN-2915
> URL: https://issues.apache.org/jira/browse/YARN-2915
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>Assignee: Subru Krishnan
> Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, 
> Yarn_federation_design_v1.pdf, federation-prototype.patch
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising of tens of thousands of nodes.   That is, rather than 
> limiting a YARN managed cluster to about 4k in size, the proposal is to 
> enable the YARN managed cluster to be elastically scalable.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW

2015-07-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618861#comment-14618861
 ] 

Bibin A Chundatt commented on YARN-3888:


Please review the patch attached.

> ApplicationMaster link is broken in RM WebUI when appstate is NEW 
> --
>
> Key: YARN-3888
> URL: https://issues.apache.org/jira/browse/YARN-3888
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch
>
>
> When the application state is NEW in RM Web UI  *Application Master* link is 
> broken.
> {code}
> 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not 
> finished, submitted application application_1436191509558_0003 is still in NEW
> 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not 
> finished, submitted application application_1436191509558_0003 is still in NEW
> 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not 
> finished, submitted application application_1436191509558_0003 is still in NEW
> {code}
> *URL formed* 
> http://:45020/cluster/app/application_1436191509558_0003
> The above link is broken



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3894:
---
Description: 
Currently in the Capacity Scheduler, when the capacity configuration is wrong
the RM will shut down, but this does not happen in the case of a NodeLabels capacity mismatch.


In {{CapacityScheduler#initializeQueues}}

{code}
  private void initializeQueues(CapacitySchedulerConfiguration conf)
      throws IOException {
    root =
        parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
            queues, queues, noop);
    labelManager.reinitializeQueueLabels(getQueueToLabels());
    root =
        parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
            queues, queues, noop);
    LOG.info("Initialized root queue " + root);
    initializeQueueMappings();
    setQueueAcls(authorizer, queues);
  }
{code}

{{labelManager}} is initialized from the queues, and the calculation that detects a 
label-level capacity mismatch happens in {{parseQueue}}. So when {{parseQueue}} runs 
during initialization, the label list is still empty and the mismatch is not detected 
(see the illustration below).
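
For illustration only (hypothetical stand-alone code, not the scheduler itself): when 
the label set is empty, the per-label validation loop simply never runs, so a bad 
per-label capacity slips through.

{code}
import java.util.Collections;
import java.util.Set;

public class EmptyLabelCheckDemo {
  public static void main(String[] args) {
    // During serviceInit the recovered labels are not loaded yet,
    // so the label manager reports an empty set.
    Set<String> clusterLabels = Collections.emptySet();

    for (String label : clusterLabels) {
      // The per-label capacity validation would run here, but the loop
      // body is never entered, so a wrong capacity goes undetected.
      throw new IllegalArgumentException("Illegal capacity for label=" + label);
    }
    System.out.println("No labels known yet -> validation silently skipped");
  }
}
{code}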

*Steps to reproduce*
# Configure the RM with the capacity scheduler
# Add one or two node labels via rmadmin
# Configure the capacity scheduler xml with the node label, but with a wrong capacity 
configuration for the already added label
# Restart both RMs
# Check whether the node label list is populated during the capacity scheduler's service init

*Expected*

The RM should not start


*Current exception on reinitialize check*

{code}
2015-07-07 19:18:25,655 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, 
numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
queues.
java.io.IOException: Failed to re-init queues
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
children of queue root for label=node2
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
... 8 more
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
DESCRIPTION=Exception refresh queues.   PERMISSIONS=
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=transitionToActiveTARGET=RMHAProtocolService  RESULT=FAILURE  
DESCRIPTION=Exception transitioning to active   PERMISSIONS=
2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
transitioning to Active mode
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
at 
org.apache.hadoop.yarn.server.resourcemanager.

[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Description: 
Cases that can cause this:

# Capacity scheduler xml is wrongly configured during the switch
# Refresh ACL failure due to configuration
# Refresh user group failure due to configuration

Both RMs will continuously try to become active

{code}

dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
 ./yarn rmadmin  -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
active

{code}

# Both Web UIs show as active
# Status shown as active for both RMs


  was:
Cases that can cause failure

# Capacity scheduler xml is wrongly configured in switch
# Refresh ACL failure due to configuration
# Refresh User group failure due to configuration


Logs for the capacity failure condition are given below.
Both RMs will continuously try to become active

{code}
2015-07-07 19:18:25,655 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, 
usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, 
numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh 
queues.
java.io.IOException: Failed to re-init queues
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for 
children of queue root for label=node2
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
... 8 more
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE  
DESCRIPTION=Exception refresh queues.   PERMISSIONS=
2015-07-07 19:18:25,656 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
OPERATION=transitionToActiveTARGET=RMHAProtocolService  RESULT=FAILURE  
DESCRIPTION=Exception transitioning to active   PERMISSIONS=
2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
transitioning to Active mode
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedExcepti

[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618831#comment-14618831
 ] 

Sangjin Lee commented on YARN-3047:
---

I need to double check the current state of the code, but in principle the 
writer will not depend on the configuration for the end point. The writer 
(collector) is created dynamically on a per-app basis, and its end point is 
registered on the RM. That's how timeline clients discover the end points. So I 
doubt that the configuration is used any longer to define the writer end point 
(correct me if I'm wrong [~zjshen]).
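
To illustrate the discovery flow described above (hypothetical names only, not the 
actual ATS v2 classes): the RM keeps a per-app mapping from application to collector 
address, and clients look the address up instead of reading it from configuration.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-app collector discovery; not the real ATS v2 API. */
public class CollectorRegistry {
  // appId -> "host:port" of the dynamically started per-app collector
  private final Map<String, String> collectors =
      new ConcurrentHashMap<String, String>();

  // Conceptually invoked when a per-app collector comes up; the RM learns
  // the endpoint at runtime rather than from a static configuration key.
  public void registerCollector(String appId, String address) {
    collectors.put(appId, address);
  }

  // Timeline clients ask for the endpoint of the app they are writing for.
  public String getCollectorAddress(String appId) {
    return collectors.get(appId);
  }
}
{code}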

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, 
> YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, 
> YARN-3047.02.patch, YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618812#comment-14618812
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 13s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 59s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 46s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744222/YARN-3896.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bd4e109 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/console |


This message was automatically generated.

> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset
> ---
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3896.01.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618776#comment-14618776
 ] 

Sunil G commented on YARN-3894:
---

Thanks [~bibinchundatt] for reporting and providing analysis.

During the {{initScheduler}} call from *CapacityScheduler#serviceInit*, we 
initialize the queues. In the same call flow, we also validate the per-node-label 
capacity against the queue capacity in {{ParentQueue#setChildQueues}}.
{code}
// check label capacities
for (String nodeLabel : labelManager.getClusterNodeLabelNames()) {
  float capacityByLabel = queueCapacities.getCapacity(nodeLabel);
  // check children's labels
  float sum = 0;
  for (CSQueue queue : childQueues) {
    sum += queue.getQueueCapacities().getCapacity(nodeLabel);
  }
  if ((capacityByLabel > 0 && Math.abs(1.0f - sum) > PRECISION)
      || (capacityByLabel == 0) && (sum > 0)) {
    throw new IllegalArgumentException("Illegal" + " capacity of "
        + sum + " for children of queue " + queueName
        + " for label=" + nodeLabel);
  }
}
{code}

As per this code, if there is a mismatch between the node label capacity and the 
queue capacity, it should throw an *IllegalArgumentException*. But this will not 
happen in the case where we configure a wrong capacity for a label in the CS xml 
and restart the RM.

*Issue:*
During {{CommonNodeLabelsManager#serviceStart}}, labels are re-populated from 
the old mirror file. But {{initScheduler}} and the call flow above happen from 
*serviceInit*, not *serviceStart*.
This makes the {{labelManager.getClusterNodeLabelNames()}} call in the code above 
return an empty list, so the desired exception won't be thrown.

IMO we can move the node label init and recovery to serviceInit rather than 
serviceStart (a rough sketch is below). [~leftnoteasy], could you please share your thoughts?
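
A rough sketch of that ordering change, assuming a service modeled on 
{{AbstractService}}; the class and the {{recoverLabelsFromMirror}} hook are 
hypothetical, not the actual {{CommonNodeLabelsManager}} code.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class LabelRecoveringService extends AbstractService {

  public LabelRecoveringService() {
    super("LabelRecoveringService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Recover labels here, before CapacityScheduler#serviceInit parses the
    // queues, so getClusterNodeLabelNames() is already populated when the
    // per-label capacity check in ParentQueue#setChildQueues runs.
    recoverLabelsFromMirror(conf);
    super.serviceInit(conf);
  }

  private void recoverLabelsFromMirror(Configuration conf) {
    // placeholder: read the node label mirror file / store here
  }
}
{code}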

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in capacity Scheduler when capacity configuration is wrong
> RM shutdown is the current behaviour, but not incase of NodeLabels capacity 
> mismatch
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from queues and calculation for Label level 
> capacity mismatch happens in {{parseQueue}} . So during initialization 
> {{parseQueue}} the labels will be empty . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3898) YARN web console only proxies GET to application master but doesn't provide any feedback for other HTTP methods

2015-07-08 Thread Kam Kasravi (JIRA)
Kam Kasravi created YARN-3898:
-

 Summary: YARN web console only proxies GET to application master 
but doesn't provide any feedback for other HTTP methods
 Key: YARN-3898
 URL: https://issues.apache.org/jira/browse/YARN-3898
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Kam Kasravi
Priority: Minor


The YARN web console should provide some feedback when filtering (and preventing) 
DELETE, POST, PUT, etc. (see the illustrative sketch below).
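
For illustration, one possible shape of that feedback (hypothetical 
{{MethodFeedbackFilter}}, not part of the current web proxy) is to answer non-GET 
calls with 405 and an Allow header instead of dropping them silently:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MethodFeedbackFilter implements Filter {

  @Override
  public void init(FilterConfig filterConfig) {
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    if (!"GET".equals(httpReq.getMethod())) {
      // Tell the caller explicitly that only GET is proxied to the AM.
      httpRes.setHeader("Allow", "GET");
      httpRes.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED,
          "Only GET requests are proxied to the ApplicationMaster");
      return;
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}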



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3896:

Target Version/s: 2.8.0

> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset
> ---
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3896.01.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618750#comment-14618750
 ] 

Devaraj K commented on YARN-3896:
-

Thanks [~hex108] for delivering the patch quickly.

Can you also add a test to simulate the scenario as part of the patch?


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset
> ---
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3896.01.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration

2015-07-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618743#comment-14618743
 ] 

Bibin A Chundatt commented on YARN-3894:


Detailed analysis and root cause:
# Capacity scheduler queue initialization happens in CapacityScheduler#serviceInit
# RMNodeLabelsManager#addToClusterNodeLabels is invoked from the store recovery, which 
happens in serviceStart

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -
>
> Key: YARN-3894
> URL: https://issues.apache.org/jira/browse/YARN-3894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: capacity-scheduler.xml
>
>
> Currently in capacity Scheduler when capacity configuration is wrong
> RM shutdown is the current behaviour, but not incase of NodeLabels capacity 
> mismatch
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
> throws IOException {   
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> labelManager.reinitializeQueueLabels(getQueueToLabels());
> root = 
> parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
> queues, queues, noop);
> LOG.info("Initialized root queue " + root);
> initializeQueueMappings();
> setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from queues and calculation for Label level 
> capacity mismatch happens in {{parseQueue}} . So during initialization 
> {{parseQueue}} the labels will be empty . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618738#comment-14618738
 ] 

Varun Saxena commented on YARN-3878:


Yeah, but we still need to check the thread state. Let me then add something in 
DrainDispatcher (a subclass of AsyncDispatcher) to expose the thread state and wait on 
it; a rough sketch of such a helper is below.
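
Something along these lines, as a test-only helper (hypothetical name, standard 
{{Thread}} APIs only); the actual DrainDispatcher change may look different:

{code}
/** Hypothetical test utility: poll until a thread reaches the expected state. */
public final class ThreadStateTestUtil {

  private ThreadStateTestUtil() {
  }

  public static void waitForThreadState(Thread t, Thread.State expected,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (t.getState() != expected) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("Thread did not reach state " + expected
            + ", current state: " + t.getState());
      }
      Thread.sleep(10);
    }
  }
}
{code}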

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
> YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
> YARN-3878.06.patch, YARN-3878.07.patch
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x7fb980

[jira] [Commented] (YARN-3892) NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails

2015-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618697#comment-14618697
 ] 

Hudson commented on YARN-3892:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #238 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/238/])
YARN-3892. Fixed NPE on RMStateStore#serviceStop when 
CapacityScheduler#serviceInit fails. Contributed by Bibin A Chundatt (jianhe: 
rev c9dd2cada055c0beffd04bad0ded8324f66ad1b7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails
> 
>
> Key: YARN-3892
> URL: https://issues.apache.org/jira/browse/YARN-3892
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3892.patch
>
>
> NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeInternal(ZKRMStateStore.java:315)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:516)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:598)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:254)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1184)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

