[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3836: Attachment: YARN-3836-YARN-2928.002.patch Hi [~sjlee0], thanks for the prompt feedback! I updated the patch according to your comments. Specifically: bq. What I would prefer is to override equals() and hashCode() for Identifier instead, and have simple equals() and hashCode() implementations for TimelineEntity that mostly delegate to Identifier. The rationale is that Identifier can be useful as keys to collections in its own right, and thus should override those methods. That's a nice suggestion! Fixed. bq. One related question for your use case of putting entities into a map: I notice that you're using the TimelineEntity instances directly as keys to maps. Wouldn't it be better to use their Identifier instances as keys instead? Identifier instances are easier and cheaper to construct and compare. I used an inappropriate example here; I meant HashSet, not HashMap. bq. We should make isValid() a proper javadoc hyperlink Fixed. bq. Since we're checking the entity type and the id, wouldn't it be sufficient to check whether the object is an instance of TimelineEntity? I agree. Fixed all related ones. > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch, > YARN-3836-YARN-2928.002.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
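The delegation pattern agreed on above can be sketched as follows. This is only an illustration of the approach (Identifier owns equals()/hashCode(), the entity delegates); the class and field names here are simplified stand-ins, not the actual YARN-3836 patch.

```java
import java.util.Objects;

// Illustrative sketch, not the real TimelineEntity: the nested Identifier
// implements equals()/hashCode() on (type, id), and the enclosing entity
// delegates to it, so both work as keys/elements in hashed collections.
public class TimelineEntitySketch {
    public static class Identifier {
        private final String type;
        private final String id;

        public Identifier(String type, String id) {
            this.type = type;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Identifier)) return false;
            Identifier other = (Identifier) o;
            return Objects.equals(type, other.type) && Objects.equals(id, other.id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, id);
        }
    }

    private final Identifier identifier;

    public TimelineEntitySketch(String type, String id) {
        this.identifier = new Identifier(type, id);
    }

    public Identifier getIdentifier() {
        return identifier;
    }

    // The entity simply delegates, so a HashSet<TimelineEntitySketch>
    // deduplicates entities by (type, id).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TimelineEntitySketch)) return false;
        return identifier.equals(((TimelineEntitySketch) o).identifier);
    }

    @Override
    public int hashCode() {
        return identifier.hashCode();
    }
}
```

Because the entity's equality is exactly its Identifier's equality, either the entity or its cheaper-to-construct Identifier can serve as the set element, which is the rationale quoted in the comment above.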
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619997#comment-14619997 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] Sorry for the delay. I missed your comment. Agreed, fixing it shortly. > ZKRMStateStore shouldn't create new session without occurrence of > SESSIONEXPIRED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.patch > > > RM goes down with a NoNode exception during creation of a znode for an app attempt.
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
> at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateSt
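The retry-policy distinction at the heart of this issue (only SESSIONEXPIRED justifies creating a fresh ZooKeeper session; transient errors should retry on the existing one, and a NoNode error should surface rather than retry) can be sketched without the ZooKeeper dependency. The enum and method names below are illustrative stand-ins, not the ZKRMStateStore patch itself.

```java
// Dependency-free sketch of the retry decision discussed in YARN-3798.
// ZkError mirrors a few ZooKeeper KeeperException codes; the real store
// works with org.apache.zookeeper.KeeperException instead.
public class ZkRetrySketch {
    public enum ZkError { CONNECTIONLOSS, OPERATIONTIMEOUT, SESSIONEXPIRED, NONODE }

    public enum Action { RETRY_SAME_SESSION, CREATE_NEW_SESSION, FAIL }

    public static Action onError(ZkError error) {
        switch (error) {
            case CONNECTIONLOSS:
            case OPERATIONTIMEOUT:
                // Recoverable: the session may still be alive on the server,
                // so retry on the SAME session rather than opening a new one.
                return Action.RETRY_SAME_SESSION;
            case SESSIONEXPIRED:
                // Only an expired session justifies creating a new one.
                return Action.CREATE_NEW_SESSION;
            default:
                // e.g. NONODE above: retrying will not help; surface the error.
                return Action.FAIL;
        }
    }
}
```

Creating a new session on any error is exactly the behavior the issue title objects to; tying session re-creation to SESSIONEXPIRED alone keeps ephemeral-node and fencing semantics intact.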
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619994#comment-14619994 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-trunk-Commit #8138 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8138/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
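The RHEL7 failure mode described above comes from the co-mounted controller name "cpu,cpuacct": any code that matches a mount's controller list as a single string misses "cpu". A minimal sketch of comma-aware matching is below; this is only an illustration of the parsing pitfall, with hypothetical method names, not the actual YARN-2194 patch.

```java
// Illustrative sketch: on RHEL7 the cpu controller is co-mounted as
// "cpu,cpuacct" (e.g. /sys/fs/cgroup/cpu,cpuacct), so a mount entry's
// controller field must be split on commas before matching a controller.
public class CgroupControllerSketch {
    public static boolean mountsController(String mountControllerField, String controller) {
        for (String name : mountControllerField.split(",")) {
            if (name.equals(controller)) {
                return true;
            }
        }
        return false;
    }
}
```

Naive string equality ("cpu,cpuacct".equals("cpu")) fails on RHEL7 while working on older distributions where cpu is mounted alone, which matches the symptom in this issue.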
[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619992#comment-14619992 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks a lot for taking a look at this issue. Updated the patch based on your comment. Kindly review. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Xiaoshuang LU >Assignee: Brahma Reddy Battula > Labels: BB2015-05-TBR > Attachments: YARN-3381-002.patch, YARN-3381-003.patch, > YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, > YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-005.patch > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Xiaoshuang LU >Assignee: Brahma Reddy Battula > Labels: BB2015-05-TBR > Attachments: YARN-3381-002.patch, YARN-3381-003.patch, > YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, > YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619989#comment-14619989 ] Ajith S commented on YARN-3885: ---
/root A /\ C B /\ D E
+*Before fix:*+
NAME: queueA CUR: PEN: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *{color:red}UNTOUCHABLE: PREEMPTABLE: {color}*
NAME: queueB CUR: PEN: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueC CUR: PEN: GAR: NORM: 1.0 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueD CUR: PEN: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueE CUR: PEN: GAR: NORM: 1.0 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
+*After:*+
NAME: queueA CUR: PEN: GAR: NORM: 1.0 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *{color:green}UNTOUCHABLE: PREEMPTABLE: {color}*
NAME: queueB CUR: PEN: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueC CUR: PEN: GAR: NORM: 1.0 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueD CUR: PEN: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
NAME: queueE CUR: PEN: GAR: NORM: 1.0 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: *UNTOUCHABLE: PREEMPTABLE: *
> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 > level > -- > > Key: YARN-3885 > URL: https://issues.apache.org/jira/browse/YARN-3885 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: Ajith S >Priority: Critical > Attachments: YARN-3885.02.patch, YARN-3885.03.patch, > YARN-3885.04.patch, YARN-3885.patch > > > when the preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, > the piece of code that calculates {{untouchable}} doesn't consider all the > children; it considers only immediate children -- This message was sent by Atlassian JIRA (v6.3.4#6332)
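The bug described above (only immediate children are considered, so queues three or more levels deep are ignored) can be illustrated with a small shallow-vs-recursive walk. Queue structure, field names, and the double-valued "untouchable" amount are all simplified stand-ins, not the actual ProportionalCapacityPreemptionPolicy code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix discussed in YARN-3885: sum the protected
// ("untouchable") resources over ALL descendants, not only the
// immediate children of a parent queue.
public class QueueWalkSketch {
    public static class Queue {
        final String name;
        final double untouchable;           // resources this queue itself protects
        final List<Queue> children = new ArrayList<Queue>();

        Queue(String name, double untouchable) {
            this.name = name;
            this.untouchable = untouchable;
        }

        Queue addChild(Queue child) {
            children.add(child);
            return this;
        }
    }

    // Buggy variant: only immediate children are considered, so any
    // protection declared at the 3rd level or below is silently dropped.
    public static double untouchableShallow(Queue q) {
        double total = 0;
        for (Queue c : q.children) {
            total += c.untouchable;
        }
        return total;
    }

    // Fixed variant: recurse so grandchildren count as well.
    public static double untouchableDeep(Queue q) {
        double total = q.untouchable;
        for (Queue c : q.children) {
            total += untouchableDeep(c);
        }
        return total;
    }
}
```

With a hierarchy root -> A -> C where only leaf C protects resources, the shallow walk reports zero from root's perspective while the recursive walk sees C's share, which is exactly why preemption decisions go wrong past two levels.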
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619988#comment-14619988 ] Varun Vasudev commented on YARN-2194: - My apologies for missing the failing unit test [~sidharta-s]. I've committed the fix for the failing unit test. > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619984#comment-14619984 ] nijel commented on YARN-3813: - Thanks [~sunilg] and [~devaraj.k] for the comments. bq. How frequently are you going to check this condition for each application? The plan is to have a configurable interval defaulting to 30 seconds (yarn.app.timeout.monitor.interval). bq. Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we may not need a flag. bq. I feel having a TIMEOUT state for RMAppImpl would be proper here. OK. We will add a TIMEOUT state and handle the changes. Due to this there will be a few changes in app transitions, the client package and the web UI. bq. I have a suggestion here. We can have a BasicAppMonitoringManager which can keep an entry of . bq. when the application gets submitted to RM then we can register the application with RMAppTimeOutMonitor using the user specified timeout. Yes, good suggestion. We will update this as a registration mechanism. But since each application can have its own timeout period, the code reusability looks minimal.
{code}
RMAppTimeOutMonitor
  local map (appid, timeout)
  add/register(appid, timeout) --> from RMAppImpl
  run() --> if the app is running/submitted and the time has elapsed, kill it.
            If already completed, remove it from the map.
  No delete/unregister method --> the application will be removed from the map by the run method
{code}
> Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases are > not worried about the output of the applications if the application is not > completed in a specific time. 
> *Background:* > The requirement is to show the CDR statistics of the last few minutes, say > every 5 minutes. The same job will run continuously with different datasets, > so one job will be started every 5 minutes. The estimated time for this > task is 2 minutes or less. > If the application does not complete in the given time, the output is not > useful. > *Proposal* > The idea is to support an application timeout, where a timeout parameter is > given while submitting the job. > Here, the user expects the application to finish (complete or be killed) in the > given time. > One option is to move this logic to the application client (which submits the > job), but it would be nicer to make it generic and more robust in YARN itself. > Kindly provide your suggestions/opinions on this feature. If it sounds good, I > will update the design doc and prototype patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
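The registration-and-sweep scheme sketched in the comment above can be written down as a tiny monitor. This is a minimal illustration of that pseudocode only; class and method names are hypothetical, and wiring to RMAppImpl events and real clocks is out of scope.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the registration-based timeout monitor discussed in
// YARN-3813. Apps register a deadline at submission; a periodic sweep
// (e.g. every yarn.app.timeout.monitor.interval) asks whether each app
// has expired. Expired entries are removed as part of the check, so no
// explicit unregister call is needed.
public class AppTimeoutMonitorSketch {
    private final Map<String, Long> deadlines = new ConcurrentHashMap<String, Long>();

    // Called when an app is submitted with a user-specified timeout.
    public void register(String appId, long submitTimeMs, long timeoutMs) {
        deadlines.put(appId, submitTimeMs + timeoutMs);
    }

    // Called from the periodic sweep; returns true exactly once, when the
    // app should receive its TIMEOUT event (and then kill/transition).
    public boolean checkExpired(String appId, long nowMs) {
        Long deadline = deadlines.get(appId);
        if (deadline == null) {
            return false;               // unknown, completed, or already timed out
        }
        if (nowMs >= deadline) {
            deadlines.remove(appId);    // one-shot: fire the timeout once
            return true;
        }
        return false;
    }
}
```

Per-app deadlines (rather than one global timeout) match the observation in the thread that each application can carry its own timeout period.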
[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-3885: -- Attachment: YARN-3885.04.patch > ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 > level > -- > > Key: YARN-3885 > URL: https://issues.apache.org/jira/browse/YARN-3885 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: Ajith S >Priority: Critical > Attachments: YARN-3885.02.patch, YARN-3885.03.patch, > YARN-3885.04.patch, YARN-3885.patch > > > when the preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, > the piece of code that calculates {{untouchable}} doesn't consider all the > children; it considers only immediate children -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619971#comment-14619971 ] Ajith S commented on YARN-3885: --- Hi [~sunilg], sorry for the delay, I have added the test case. > ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 > level > -- > > Key: YARN-3885 > URL: https://issues.apache.org/jira/browse/YARN-3885 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: Ajith S >Priority: Critical > Attachments: YARN-3885.02.patch, YARN-3885.03.patch, > YARN-3885.04.patch, YARN-3885.patch > > > when the preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, > the piece of code that calculates {{untouchable}} doesn't consider all the > children; it considers only immediate children -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
He Tianyi created YARN-3903: --- Summary: Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Priority: Trivial YARN-2056 supports disabling preemption at queue level for CapacityScheduler. As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3902) Fair scheduler preempts ApplicationMaster
He Tianyi created YARN-3902: --- Summary: Fair scheduler preempts ApplicationMaster Key: YARN-3902 URL: https://issues.apache.org/jira/browse/YARN-3902 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi YARN-2022 fixed a similar issue for CapacityScheduler. However, FairScheduler still suffers from it, preempting the AM while other normal containers are running. I think we should take the same approach: avoid preempting the AM unless no container other than the AM is running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
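The candidate-selection order proposed above (prefer non-AM containers; touch the AM only as a last resort) can be sketched as a two-pass selection. This mirrors the idea from YARN-2022 in simplified form; the class and method names are illustrative, not FairScheduler code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of AM-aware preemption candidate selection for YARN-3902:
// pass 1 picks ordinary containers, pass 2 resorts to AM containers
// only if the demand still is not met.
public class AmAwarePreemptionSketch {
    public static class Container {
        final String id;
        final boolean isAm;

        public Container(String id, boolean isAm) {
            this.id = id;
            this.isAm = isAm;
        }
    }

    public static List<Container> selectCandidates(List<Container> running, int needed) {
        List<Container> picked = new ArrayList<Container>();
        for (Container c : running) {           // pass 1: non-AM containers first
            if (picked.size() == needed) {
                return picked;
            }
            if (!c.isAm) {
                picked.add(c);
            }
        }
        for (Container c : running) {           // pass 2: AMs only as a last resort
            if (picked.size() == needed) {
                return picked;
            }
            if (c.isAm) {
                picked.add(c);
            }
        }
        return picked;
    }
}
```

With this ordering an AM is preempted only when every other container of the app set has already been selected, which is the behavior the issue asks FairScheduler to adopt.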
[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619951#comment-14619951 ] Akira AJISAKA commented on YARN-3381: - Would you modify the old class {{InvalidStateTransitonException}} to extend the new class {{InvalidStateTransitionException}}? That way we can simply remove the old class as an incompatible change after fixing this issue. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Xiaoshuang LU >Assignee: Brahma Reddy Battula > Labels: BB2015-05-TBR > Attachments: YARN-3381-002.patch, YARN-3381-003.patch, > YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
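The compatibility approach suggested in the comment above, keeping the misspelled class as a deprecated subclass of the corrected one, looks roughly like this. The "Sketch" class names and the plain Exception superclass are stand-ins; the real classes live in the YARN state machine package.

```java
// Corrected name: new code should use this class.
public class InvalidStateTransitionExceptionSketch extends Exception {
    public InvalidStateTransitionExceptionSketch(String message) {
        super(message);
    }
}

/**
 * Old, misspelled name kept for source/binary compatibility. Because it
 * extends the corrected class, existing catch blocks for the new type
 * also catch the old one, and the old class can later be deleted as a
 * single incompatible change.
 *
 * @deprecated Use {@link InvalidStateTransitionExceptionSketch} instead.
 */
@Deprecated
class InvalidStateTransitonExceptionSketch extends InvalidStateTransitionExceptionSketch {
    public InvalidStateTransitonExceptionSketch(String message) {
        super(message);
    }
}
```

Throw sites can switch to the new name immediately while downstream callers that still reference the old name keep compiling until the deprecated class is removed.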
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619885#comment-14619885 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 36s | The applied patch generated 1 new checkstyle issues (total was 9, now 10). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 15s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 27m 51s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 87m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceManager | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | org.apache.hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744380/YARN-3116.v8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b8832fc | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8467/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8467/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log |
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619829#comment-14619829 ] Inigo Goiri commented on YARN-1012: --- Thank you [~kasha]! Once committed, I'm moving to YARN-3534 to reuse the ResourceUtilization. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619767#comment-14619767 ] Anubhav Dhoot commented on YARN-3800: - The test failure seems unrelated and is filed as a flaky test in YARN-3342. The checkstyle issue is preexisting (number of parameters > 7). > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as that's the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619761#comment-14619761 ] Hadoop QA commented on YARN-3800: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 1 new checkstyle issues (total was 55, now 50). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 5s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 89m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744375/YARN-3800.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8466/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8466/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8466/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8466/console | This message was automatically generated. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619759#comment-14619759 ] Zhijie Shen commented on YARN-3901: --- [~vrushalic], just want to confirm with you that this jira won't cover the app_flow table, right? I need the flow mapping for implementing the reader APIs against the HBase backend. If it's not covered here, I can help to implement it in the scope of YARN-3049. > Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. 
The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
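The min_start_time behavior described above can be illustrated with a small, self-contained sketch; this is a hypothetical stand-in in plain Java, not the actual HBase coprocessor API, and the class and method names are illustrative only:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified sketch of the min_start_time column behavior described above.
 * Each app writes its start time as a cell tagged with its application id;
 * reads return the min across all written cells, and a flush/compaction
 * collapses everything into a single untagged cell holding the min.
 * Hypothetical stand-in for the real HBase coprocessor logic.
 */
public class MinStartTimeColumn {
  // tag (e.g. application id) -> written value; "" models the untagged cell
  private final Map<String, Long> cells = new HashMap<>();

  public void write(String appIdTag, long startTime) {
    cells.put(appIdTag, startTime);
  }

  /** A read returns the min of all written values, as the coprocessor would. */
  public long read() {
    return Collections.min(cells.values());
  }

  /** On flush/compaction, keep only one untagged cell with the min value. */
  public void flush() {
    long min = read();
    cells.clear();
    cells.put("", min);
  }

  public int cellCount() {
    return cells.size();
  }
}
```

The max_end_time column would be the same sketch with `Collections.max`.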
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619755#comment-14619755 ] Devaraj K commented on YARN-3813: - Thanks [~nijel] and [~rohithsharma] for the design proposal. {quote} New auxiliary service : RMAppTimeOutService. Responsibility is to track running applications. Simple logic: // if the job is running and the time has elapsed, kill it if ((RMAppState == SUBMITTED/ACCEPTED/RUNNING) && (currentTime - app.getSubmitTime()) >= timeout) {quote} How frequently are you going to check this condition for each application? Can we have a monitor, something like RMAppTimeOutMonitor, which extends AbstractLivelinessMonitor? When an application gets submitted to the RM, we can register it with RMAppTimeOutMonitor using the user-specified timeout. When the timeout is reached, RMAppTimeOutMonitor can trigger an event to take further action. bq. Yes, having a separate TIMEOUT event and TIMEOUT state is a good approach and another option. Initially we considered having a new TIMEOUT state, which would require huge changes across all the modules. I feel having a TIMEOUT state for RMAppImpl would be proper here. When RMAppTimeOutMonitor triggers a timeout event for an application, RMAppImpl can move to the TIMEOUT state from any of the non-final states, and during the transition it can handle stopping the running attempt and the containers. I don't see that there would be so many changes required to achieve it. > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support an application timeout in YARN. Some use cases are > not interested in the output of an application if it does not > complete within a specific time. 
> *Background:* > The requirement is to show the CDR statistics of the last few minutes, say for > every 5 minutes. The same job will run continuously with different datasets, > so one job will be started every 5 minutes. The estimated time for this > task is 2 minutes or less. > If the application does not complete in the given time, the output is not > useful. > *Proposal* > The idea is to support an application timeout, where a timeout parameter is > given while submitting the job. > Here, the user expects the application to finish (complete or be killed) within the > given time. > One option for us is to move this logic to the application client (who submits the > job). > But it would be nice if this could be generic logic and made more robust. > Kindly provide your suggestions/opinions on this feature. If it sounds good, I > will update the design doc and prototype patch
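The monitor idea discussed in the comment above can be sketched with a minimal, self-contained version. The names and API here are hypothetical; the real approach would extend Hadoop's AbstractLivelinessMonitor, which manages the clock and the monitoring thread itself:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Minimal sketch of the RMAppTimeOutMonitor idea: register an application
 * with its submit time and timeout, and fire an expiry callback once the
 * timeout elapses. Hypothetical stand-in for a class extending Hadoop's
 * AbstractLivelinessMonitor; names and API are illustrative only.
 */
public class AppTimeoutMonitor {
  private static class Entry {
    final long submitTimeMs;
    final long timeoutMs;
    Entry(long submitTimeMs, long timeoutMs) {
      this.submitTimeMs = submitTimeMs;
      this.timeoutMs = timeoutMs;
    }
  }

  private final Map<String, Entry> apps = new ConcurrentHashMap<>();
  private final Consumer<String> onExpire; // e.g. dispatch a TIMEOUT event

  public AppTimeoutMonitor(Consumer<String> onExpire) {
    this.onExpire = onExpire;
  }

  /** Called when the application is submitted to the RM. */
  public void register(String appId, long submitTimeMs, long timeoutMs) {
    apps.put(appId, new Entry(submitTimeMs, timeoutMs));
  }

  /** Called when the application finishes normally. */
  public void unregister(String appId) {
    apps.remove(appId);
  }

  /** One monitor tick: expire every app whose timeout has elapsed. */
  public void check(long nowMs) {
    apps.forEach((appId, e) -> {
      if (nowMs - e.submitTimeMs >= e.timeoutMs) {
        apps.remove(appId);
        onExpire.accept(appId); // would trigger the TIMEOUT transition
      }
    });
  }
}
```

In the real implementation the expiry callback would dispatch the event that moves RMAppImpl out of its non-final state, rather than acting directly.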
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619729#comment-14619729 ] Zhijie Shen commented on YARN-3836: --- bq. I see that we're implementing the Comparable interface for all 3 types. I'm wondering if it makes sense for them. What would it mean to order TimelineEntity instances? Does it mean much? Where would it be useful? Do we need to implement it? The same questions go for the other 2 types... For example, compareTo of TimelineEntity is used to order the entities in the result set of a getEntities query. It would be better to return the entities ordered by timestamp instead of randomly. bq. This is an open question. Is the id alone the identity, or do the id and timestamp together form the identity? Do we expect users of TimelineEvent to always be able to provide the timestamp? Honestly I'm not 100% sure what the contract is, and we probably want to make it explicit (and add it to the javadoc). Thoughts? In ATS v1, we actually use id + timestamp to uniquely identify an event. One merit of doing this is to let the app put the same event multiple times. For example, a job can request resources many times. Every time it can put a RESOURCE_REQUEST event with a unique timestamp and fill in the resource information. > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. 
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3116: -- Attachment: YARN-3116.v8.patch Fixed TestAppRunnability as well in the new patch. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine whether the container is an AM container > from the context in the NM (we can do it on the RM). This information is missing, > so we worked around it by considering the container with ID "_01" to be > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine whether a container is an AM > container on the NM. We can add a flag to the container object or create an API to > make the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877.
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619702#comment-14619702 ] Sangjin Lee commented on YARN-3836: --- Thanks [~gtCarrera9] for your quick patch! I agree mostly with your 2 points above. I also did take a quick look at the patch, and here are my initial comments. I see that we're implementing the {{Comparable}} interface for all 3 types. I'm wondering if it makes sense for them. What would it mean to order {{TimelineEntity}} instances? Does it mean much? Where would it be useful? Do we need to implement it? The same questions go for the other 2 types... (TimelineEntity.java) What I would prefer is to override {{equals()}} and {{hashCode()}} for {{Identifier}} instead, and have simple {{equals()}} and {{hashCode()}} implementations for {{TimelineEntity}} that mostly delegate to {{Identifier}}. The rationale is that {{Identifier}} can be useful as keys to collections in its own right, and thus should override those methods. One related question for your use case of putting entities into a map: I notice that you're using the {{TimelineEntity}} instances directly as keys to maps. Wouldn't it be better to use their {{Identifier}} instances as keys instead? {{Identifier}} instances are easier and cheaper to construct and compare. We still need {{equals()}} and {{hashCode()}} on {{TimelineEntity}} itself because they can be used in sets too. - l.42: We should make {{isValid()}} a proper javadoc hyperlink - l.510: Although this is probably going to be true for the most part, this check is a little bit stronger than I expected. We're essentially saying the actual class types of two objects must match precisely. People might extend classes further. Since we're checking the entity type and the id, wouldn't it be sufficient to check whether the object is an instance of {{TimelineEntity}}? (TimelineEvent.java) This is an open question. 
Is the id alone the identity, or do the id and timestamp together form the identity? Do we expect users of {{TimelineEvent}} to always be able to provide the timestamp? Honestly I'm not 100% sure what the contract is, and we probably want to make it explicit (and add it to the javadoc). Thoughts? - l.100: same comment on the class as above (TimelineMetric.java) - l.144: same comment on the class as above > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate.
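The delegation approach proposed in the review above — Identifier owning equals()/hashCode(), the entity delegating to it, and instanceof checks rather than exact class comparison — can be sketched as follows. This is a simplified, hypothetical stand-in; the real TimelineEntity carries much more state:

```java
import java.util.Objects;

/**
 * Sketch of the approach discussed above: Identifier owns equals()/hashCode(),
 * and the enclosing entity delegates to it. Simplified stand-in for the real
 * TimelineEntity; only the identity-relevant fields are shown.
 */
public class Entity {
  /** Usable on its own as a cheap key for maps and sets. */
  public static final class Identifier {
    private final String type;
    private final String id;

    public Identifier(String type, String id) {
      this.type = type;
      this.id = id;
    }

    @Override
    public boolean equals(Object o) {
      // instanceof (not getClass()) so further subclasses still compare equal
      if (!(o instanceof Identifier)) {
        return false;
      }
      Identifier other = (Identifier) o;
      return Objects.equals(type, other.type) && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
      return Objects.hash(type, id);
    }
  }

  private final Identifier identifier;

  public Entity(String type, String id) {
    this.identifier = new Identifier(type, id);
  }

  public Identifier getIdentifier() {
    return identifier;
  }

  @Override
  public boolean equals(Object o) {
    // delegate identity entirely to the Identifier
    if (!(o instanceof Entity)) {
      return false;
    }
    return identifier.equals(((Entity) o).identifier);
  }

  @Override
  public int hashCode() {
    return identifier.hashCode();
  }
}
```

With this in place, either the entity or its identifier can serve as a HashSet/HashMap key, and two entities with the same type and id collapse to one set element.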
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619689#comment-14619689 ] Hadoop QA commented on YARN-2194: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 15s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 24m 42s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744373/YARN-2194-7.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8465/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8465/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8465/console | This message was automatically generated. > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the user of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619678#comment-14619678 ] Hadoop QA commented on YARN-3836: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 30s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 49s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 45s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 6s | Tests passed in hadoop-yarn-common. 
| | | | 46m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744367/YARN-3836-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 4c5f88f | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8464/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8464/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8464/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8464/console | This message was automatically generated. > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619673#comment-14619673 ] Karthik Kambatla commented on YARN-1012: Both the test result and findbugs warnings look unrelated. +1. Checking this in. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619669#comment-14619669 ] Rohith Sharma K S commented on YARN-3813: - Thanks [~sunilg] for going through the design doc and for the feedback. bq. BasicAppMonitoringManager which can keep an entry of . Basically, we mean the auxiliary service is a separate service that starts a new thread to monitor running applications, i.e. very similar to any other service in the RM like ZKRMStateStore/ClientRMService. bq. Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we may not need a flag. Yes, having a separate TIMEOUT event and TIMEOUT state is a good approach and another option. Initially we considered having a new TIMEOUT state, which would require huge changes across all the modules. To keep it simple, we can manage it in the KILLED state with a proper diagnostic message and a new flag. The new flag identifies whether the app timed out, which is required for calculating metrics and for the RM restart feature. > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support an application timeout in YARN. Some use cases are > not interested in the output of an application if it does not > complete within a specific time. > *Background:* > The requirement is to show the CDR statistics of the last few minutes, say for > every 5 minutes. The same job will run continuously with different datasets, > so one job will be started every 5 minutes. The estimated time for this > task is 2 minutes or less. > If the application does not complete in the given time, the output is not > useful. > *Proposal* > The idea is to support an application timeout, where a timeout parameter is > given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) within the > given time. > One option for us is to move this logic to the application client (who submits the > job). > But it would be nice if this could be generic logic and made more robust. > Kindly provide your suggestions/opinions on this feature. If it sounds good, I > will update the design doc and prototype patch
[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3800: Attachment: YARN-3800.005.patch Addressed feedback. I feel the types are already on the left side, so it should be ok to leave them out on the right side. But it's not a big deal, so I removed them. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest, we store the Resource for > allocations, as that's the only thing we need. Ultimately we convert > everything to resources anyway
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619660#comment-14619660 ] Sidharta Seethana commented on YARN-3852: - One of the test failures ( {{TestPrivilegedOperationExecutor}} ) is unrelated to this patch. Please see the update to YARN-2194. > Add docker container support to container-executor > --- > > Key: YARN-3852 > URL: https://issues.apache.org/jira/browse/YARN-3852 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Abin Shahab > Attachments: YARN-3852.patch > > > For security reasons, we need to ensure that access to the docker daemon and > the ability to run docker containers is restricted to privileged users (i.e. > users running applications should not have direct access to docker). In order > to ensure the node manager can run docker commands, we need to add docker > support to the container-executor binary.
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619656#comment-14619656 ] Sidharta Seethana commented on YARN-2194: - Submitted to jenkins. [~vinodkv], please take a quick look and commit? > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd.
[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-2194: Attachment: YARN-2194-7.patch Attaching a patch with a fix for the unit test issue. > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd.
[jira] [Assigned] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana reassigned YARN-2194: --- Assignee: Sidharta Seethana (was: Wei Yan) > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Sidharta Seethana >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd.
[jira] [Reopened] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana reopened YARN-2194: - [~ywskycn] , [~vvasudev] So, it looks like the final version of the patch that was eventually committed didn't actually go through jenkins (it wasn't submitted to jenkins, or something else went wrong during submission). There seems to be a failing test that needs to be fixed (see below): {code} testSquashCGroupOperationsWithValidOperations(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor) Time elapsed: 0.178 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<...n/container_01/tasks[,net_cls/hadoop_yarn/container_01/tasks,]blkio/hadoop_yarn/co...> but was:<...n/container_01/tasks[%net_cls/hadoop_yarn/container_01/tasks%]blkio/hadoop_yarn/co...> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor.testSquashCGroupOperationsWithValidOperations(TestPrivilegedOperationExecutor.java:225) {code} thanks, -Sidharta > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. 
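For context on the failure quoted above: the test diff shows the separator used when squashing cgroup task-file paths changing from ',' to '%', since on RHEL7 controller names themselves can contain commas (e.g. "cpu,cpuacct"), so a comma can no longer safely delimit path lists. A minimal, hypothetical sketch of handling comma-joined controller names — not the actual CgroupsLCEResourcesHandler code:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the RHEL7 issue described above: a single cgroup mount point can
 * serve several co-mounted controllers, named with commas (e.g. "cpu,cpuacct").
 * Splitting on the comma recovers the individual controller names, so a lookup
 * for "cpu" still matches the "cpu,cpuacct" hierarchy. Hypothetical helper,
 * not the actual Hadoop implementation.
 */
public class CgroupControllers {
  /** Split a co-mounted controller name into its individual controllers. */
  public static List<String> split(String mountedName) {
    return Arrays.asList(mountedName.split(","));
  }

  /** True if the mount with this name hosts the requested controller. */
  public static boolean hosts(String mountedName, String controller) {
    return split(mountedName).contains(controller);
  }
}
```

Because resolved cgroup paths may now embed a comma, any code that batches several such paths into one string needs a different delimiter, which is what the ','-to-'%' change in the test reflects.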
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619637#comment-14619637 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 26s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 25s | The applied patch generated 1 new checkstyle issues (total was 9, now 10). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 17s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 6s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 107m 48s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744340/YARN-3116.v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8461/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8461/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8461/console | This message was automatically generated. 
> [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine whether the container is an AM container > from the context in the NM (we can do it on the RM). This information is missing, > so we worked around it by considering the container with ID "_01" to be > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine whether a container is an AM > container on the NM. We can add a flag to the container object or create an API to > make the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877.
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619628#comment-14619628 ] Abin Shahab commented on YARN-3852: --- [~vvasudev] I'm looking at the test failures. In the meantime, can you review the patch? > Add docker container support to container-executor > --- > > Key: YARN-3852 > URL: https://issues.apache.org/jira/browse/YARN-3852 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Abin Shahab > Attachments: YARN-3852.patch > > > For security reasons, we need to ensure that access to the docker daemon and > the ability to run docker containers is restricted to privileged users (i.e. > users running applications should not have direct access to docker). In order > to ensure the node manager can run docker commands, we need to add docker > support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619607#comment-14619607 ] Sangjin Lee commented on YARN-3047: --- Sounds good. I'll commit the patch shortly. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3836: Attachment: YARN-3836-YARN-2928.001.patch In this patch I added equals and hashCode methods to the timeline entity and related classes, and added some javadoc describing issues raised by [~jrottinghuis]. There are two things that I think are worth discussing here: # Possible definitions of equivalence: I had some offline discussion with [~zjshen], and we thought it would be fine to say two timeline entities are equal if their type and id are equal. As raised in this JIRA, oftentimes we'd like to put timeline entities in a hashmap (e.g. for aggregations). Our current design is sufficient to support use cases like: {{aggregatedEntity = map.get(incomingEntity); aggregatedEntity.aggregate(incomingEntity); }}. Of course, users can always implement a deep comparison afterwards. # Checking the validity of objects: due to the requirements of the RESTful interface, we have to expose default constructors. However, this will cause several member variables of a timeline data object to be null, which is quite error prone. I'm adding the isValid method to help users check whether an object is valid (with all required fields set). > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
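The (type, id) equality contract discussed above can be sketched as follows. This is an illustrative Java sketch under stated assumptions, not the actual YARN-3836 patch: the class and method names (EntityIdentifier, isValid) are stand-ins for the real data model classes.

```java
import java.util.Objects;

// Hypothetical stand-in for the timeline entity identifier discussed above.
// Two identifiers are equal iff both their type and their id are equal.
class EntityIdentifier {
    private final String type;
    private final String id;

    EntityIdentifier(String type, String id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof EntityIdentifier)) {
            return false;
        }
        EntityIdentifier other = (EntityIdentifier) o;
        return Objects.equals(type, other.type) && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(): derived from the same fields.
        return Objects.hash(type, id);
    }

    // Default constructors exposed for the REST layer can leave fields null;
    // this check mirrors the isValid idea mentioned in the comment above.
    boolean isValid() {
        return type != null && id != null;
    }
}
```

With this in place, identifiers (or entities delegating to them) can safely serve as keys in a HashSet or HashMap, supporting the aggregation lookup pattern described in the comment.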
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619591#comment-14619591 ] Hadoop QA commented on YARN-3852: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | whitespace | 0m 4s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 19s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 5m 59s | Tests failed in hadoop-yarn-server-nodemanager. 
| | | | 21m 31s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744358/YARN-3852.patch | | Optional Tests | javac unit | | git revision | trunk / 2e3d83f | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8463/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8463/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8463/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8463/console | This message was automatically generated. > Add docker container support to container-executor > --- > > Key: YARN-3852 > URL: https://issues.apache.org/jira/browse/YARN-3852 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Abin Shahab > Attachments: YARN-3852.patch > > > For security reasons, we need to ensure that access to the docker daemon and > the ability to run docker containers is restricted to privileged users ( i.e > users running applications should not have direct access to docker). In order > to ensure the node manager can run docker commands, we need to add docker > support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619590#comment-14619590 ] Jian He commented on YARN-3866: --- [~mding], thanks for the work! Some comments on the patch: - Mark all getters/setters unstable for now. - DecreasedContainer.java/IncreasedContainer.java - how about reusing the Container.java object? - increaseRequests/decreaseRequests - we could pass a single list of changeResourceRequests instead of differentiating between increase and decrease, as the underlying implementations are the same. IMO, this also saves application writers from having to differentiate them programmatically. {code} List increaseRequests, List decreaseRequests) {code} > AM-RM protocol changes to support container resizing > > > Key: YARN-3866 > URL: https://issues.apache.org/jira/browse/YARN-3866 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-3866.1.patch, YARN-3866.2.patch > > > YARN-1447 and YARN-1448 are outdated. > This ticket deals with AM-RM protocol changes to support container resizing > according to the latest design in YARN-1197. > 1) Add increase/decrease requests in AllocateRequest > 2) Get approved increase/decrease requests from RM in AllocateResponse > 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
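The single change-request list suggested above might look roughly like this. This is a hypothetical sketch: ContainerResourceChangeRequest and AllocateRequestSketch are illustrative names (not the final YARN API), and a plain int stands in for YARN's Resource object.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative request for a container's new target allocation. The RM can
// compare the target against the current allocation to decide whether this
// is an increase or a decrease -- the caller does not have to say which.
class ContainerResourceChangeRequest {
    final String containerId;
    final int targetMemoryMb; // simplified stand-in for a Resource object

    ContainerResourceChangeRequest(String containerId, int targetMemoryMb) {
        this.containerId = containerId;
        this.targetMemoryMb = targetMemoryMb;
    }
}

// Illustrative AllocateRequest carrying one unified change-request list,
// per the suggestion above, instead of separate increase/decrease lists.
class AllocateRequestSketch {
    private final List<ContainerResourceChangeRequest> changeRequests =
        new ArrayList<>();

    void addChangeRequest(ContainerResourceChangeRequest request) {
        changeRequests.add(request);
    }

    List<ContainerResourceChangeRequest> getChangeRequests() {
        return changeRequests;
    }
}
```

The design point is that classification (increase vs. decrease) is derivable state, so pushing it into the protocol only duplicates information the scheduler must validate anyway.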
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619586#comment-14619586 ] Giovanni Matteo Fumarola commented on YARN-3116: [~xgong] thanks for the comment, it's an accurate observation. [~zjshen], I think it is a good idea. I can start removing the flag and introducing a new enum as you suggested. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619564#comment-14619564 ] Zhijie Shen commented on YARN-3116: --- Xuan, thanks for your comment. I think this is a good point. To be forward compatible, it's better to use an enum here instead of the boolean flag. That way, we can add more enum values, such as SystemContainer, in the future without adding new flags or breaking compatibility. [~giovanni.fumarola], [~subru], what do you think? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619551#comment-14619551 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 33s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. 
| | | | 39m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744343/YARN-3878.08.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8462/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8462/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8462/console | This message was automatically generated. > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RM
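The hang described above can be modeled with a minimal sketch. This is a simplified illustration of the failure mode, not the real AsyncDispatcher code: if the handler thread is interrupted before the queue empties, the drain-on-stop loop waits forever.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of a dispatcher configured to drain events on stop.
class DrainingDispatcher {
    private final LinkedBlockingQueue<Runnable> queue =
        new LinkedBlockingQueue<>();

    void dispatch(Runnable event) {
        try {
            // During shutdown, put() can throw InterruptedException (the
            // stack trace above), leaving events stranded in the queue.
            queue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int pending() {
        return queue.size();
    }

    void stopAndDrain() throws InterruptedException {
        // If the consuming thread has already been interrupted and stopped
        // handling events, the queue never empties and this loop spins
        // until the JVM exits -- the hang reported in YARN-3878.
        while (!queue.isEmpty()) {
            Thread.sleep(100);
        }
    }
}
```

The fix direction is to make the drain check also terminate when the dispatcher has been stopped or its handler thread interrupted, rather than waiting unconditionally on queue emptiness.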
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619552#comment-14619552 ] Giovanni Matteo Fumarola commented on YARN-3116: TestAppRunnability::testNotUserAsDefaultQueue is related to this patch. The fix for the issue in TestFairScheduler::testQueueMaxAMShare causes the former test to fail. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619550#comment-14619550 ] Carlo Curino commented on YARN-3800: The patch generally looks good; I spoke with [~subru], who explained why you are making these changes, and overall it makes sense. A couple of nits, and then I am OK to commit this: 1) I think it is nicer to have explicit types in the HashMap<> and TreeMap<> initializations. 2) You have made this change elsewhere already, but in TestRLESparseResourceAllocation there is a generateAllocation that still produces ReservationRequests which are then immediately converted to Resources. It is probably easier to change generateAllocation itself. Thanks for the work on this patch. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as that's the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3852: -- Attachment: YARN-3852.patch Changes to container executor for running docker from LCE > Add docker container support to container-executor > --- > > Key: YARN-3852 > URL: https://issues.apache.org/jira/browse/YARN-3852 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Abin Shahab > Attachments: YARN-3852.patch > > > For security reasons, we need to ensure that access to the docker daemon and > the ability to run docker containers is restricted to privileged users ( i.e > users running applications should not have direct access to docker). In order > to ensure the node manager can run docker commands, we need to add docker > support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619526#comment-14619526 ] Xuan Gong commented on YARN-3116: - The patch looks fine overall. Only one comment: instead of just specifying a boolean flag to indicate the AM container, how about adding a containerType enum for future extensibility? For example, per https://issues.apache.org/jira/browse/YARN-2261, we will have a post-application cleanup container. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
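The enum-over-boolean idea suggested above can be sketched as follows. This is an illustrative sketch, not the actual YARN-3116 API; the names ContainerType and ContainerInfo are hypothetical.

```java
// Hypothetical container-type enum: new kinds of containers (e.g. a
// post-application cleanup container per YARN-2261) can be added later
// without breaking existing callers, unlike a boolean isAmContainer flag.
enum ContainerType {
    APPLICATION_MASTER,
    TASK
}

// Hypothetical container wrapper exposing the type to the NM context.
class ContainerInfo {
    private final ContainerType containerType;

    ContainerInfo(ContainerType containerType) {
        this.containerType = containerType;
    }

    ContainerType getContainerType() {
        return containerType;
    }

    boolean isApplicationMaster() {
        return containerType == ContainerType.APPLICATION_MASTER;
    }
}
```

A boolean answers exactly one question forever; the enum keeps the wire format and API stable while the set of container kinds grows.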
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619518#comment-14619518 ] Zhijie Shen commented on YARN-3116: --- Is the TestAppRunnability failure related to this patch? The normal practice is to check if the test failure is related to the code change in this jira. If not, you can file a separate jira to tackle it. Thanks for fixing TestPrivilegedOperationExecutor. It seems straightforward, so let's keep it here. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619500#comment-14619500 ] Hadoop QA commented on YARN-3047: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 7s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:blue}0{color} | shellcheck | 1m 23s | Shellcheck was not available. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 1m 23s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 46m 31s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744326/YARN-3047-YARN-2928.13.patch | | Optional Tests | shellcheck javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 499ce52 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8460/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8460/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8460/console | This message was automatically generated. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619496#comment-14619496 ] Hadoop QA commented on YARN-3900: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 10s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 50m 56s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 100m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.reservation.TestFairReservationSystem | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744322/YARN-3900.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8458/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8458/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8458/console | This message was automatically generated. > Protobuf layout of yarn_security_token causes errors in other protos that > include it > - > > Key: YARN-3900 > URL: https://issues.apache.org/jira/browse/YARN-3900 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3900.001.patch > > > Because of the subdirectory server used in > {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}} > there are errors in other protos that include them. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619478#comment-14619478 ] Karthik Kambatla commented on YARN-3878: +1, pending Jenkins. Will go ahead and commit this once Jenkins is okay. > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as follows: > # RM is stopped while putting an RMStateStore event to RMStateStore's > AsyncDispatcher. This leads to an InterruptedException being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we check whether all events have been drained and wait for the > event queue to drain (as the RM State Store dispatcher is configured to drain its > queue on stop). > # This condition never becomes true, and the AsyncDispatcher waits > incessantly for the dispatcher event queue to drain until the JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) > "main" prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() > [0x7fb989851000] >
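The hang reported in this issue — a stop routine waiting for a queue whose consumer thread has already died on an InterruptedException — can be sketched minimally as follows. This is a hypothetical model, not the actual AsyncDispatcher code; class and method names are made up for illustration:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of the drain-on-stop hazard: if the consumer thread exits
// on interrupt while events are still queued, an unbounded "wait until the
// queue is empty" loop can never finish. A deadline-bounded drain avoids
// hanging forever at the cost of abandoning undrained events.
class DrainOnStopSketch {
  private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  void post(Runnable event) {
    queue.offer(event);
  }

  // Consumer loop, mirroring the shape of a dispatcher thread.
  void dispatchLoop() {
    while (!stopped) {
      try {
        queue.take().run();
      } catch (InterruptedException e) {
        return; // consumer exits here; any queued events are stranded
      }
    }
  }

  // Bounded drain: returns true if the queue actually emptied, false if we
  // gave up at the deadline instead of waiting incessantly.
  boolean stopDraining(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(10);
    }
    stopped = true;
    return queue.isEmpty();
  }
}
```

With no consumer thread running (as happens once the dispatcher thread has been interrupted), the bounded variant returns false after the deadline rather than blocking the stopping thread indefinitely.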
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619445#comment-14619445 ] Hadoop QA commented on YARN-1012: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 39s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 15s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 7m 1s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. 
| | | | 55m 45s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.linux.privileged.TestPrivilegedOperationExecutor | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744329/YARN-1012-11.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 2e3d83f | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8459/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8459/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8459/console | This message was automatically generated. 
> Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: YARN-3878.08.patch > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as follows: > # RM is stopped while putting an RMStateStore event to RMStateStore's > AsyncDispatcher. This leads to an InterruptedException being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we check whether all events have been drained and wait for the > event queue to drain (as the RM State Store dispatcher is configured to drain its > queue on stop). > # This condition never becomes true, and the AsyncDispatcher waits > incessantly for the dispatcher event queue to drain until the JVM exits. 
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619440#comment-14619440 ] Zhijie Shen commented on YARN-3047: --- Thanks for kicking another jenkins build. IAC, the patch looks good to me. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619434#comment-14619434 ] Giovanni Matteo Fumarola commented on YARN-3116: Thanks [~zjshen] for fixing the test failure. For the TestPrivilegedOperationExecutor I just applied a new patch with the fix. I got the same problem with TestAppRunnability when I was working on TestFairScheduler::testQueueMaxAMShare. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a reliable way to determine whether a container is an AM > container on the NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877.
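The fragile workaround the issue description calls out can be sketched as follows. This is a hypothetical helper, not a YARN API; it simply makes the heuristic concrete. As the description notes, matching on the container's sequence number is neither a necessary nor a sufficient condition (for example, an AM launched for a later application attempt need not hold sequence 1):

```java
// Hypothetical sketch of the "_01" heuristic: treat the container whose
// trailing sequence number is 1 as the AM container. Container id strings
// have the shape container_<clusterTs>_<appSeq>_<attempt>_<containerSeq>.
class AmContainerHeuristic {
  static boolean looksLikeAmContainer(String containerId) {
    String[] parts = containerId.split("_");
    // Parse the trailing sequence number; "000001" parses to 1.
    return Long.parseLong(parts[parts.length - 1]) == 1;
  }
}
```

A flag carried on the container object (or a dedicated API, as the description proposes) would make this decision explicit instead of inferring it from the id.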
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-3116: --- Attachment: YARN-3116.v7.patch > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a reliable way to determine whether a container is an AM > container on the NM. We can add a flag to the container object or create an API to > make the judgment. Perhaps the distributed AM information may also be useful > to YARN-2877.
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619428#comment-14619428 ] Sangjin Lee commented on YARN-3047: --- The build seems to be horked. Kicked off another jenkins run to see if it clears up. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3901) Populate flow run data in the flow_run table
Vrushali C created YARN-3901: Summary: Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C Assignee: Vrushali C As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points being considered:
- Stores per-flow-run information aggregated across applications, per flow version
- The RM's collector writes to it on app creation and app completion
- The per-app collector writes to it for metric updates, at a slower frequency than the metric updates to the application table
- Primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application levels keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time, the RM writer will simply write a value with a tag for the applicationId. A coprocessor will return the min of all written values.
- Upon flushes and compactions, the min value among all the cells of this column will be written to a cell without any tag (empty tag), and all the other cells will be discarded.
- Ditto for max_end_time, except the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics), only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed. 
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don't want to re-aggregate them upon replay
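The tagged-cell collapse proposed for min_start_time can be sketched in plain Java as follows. This is a hypothetical model of the behavior, not the HBase coprocessor API; each application writes its own tagged cell, and on flush/compaction only the minimum survives as a single cell with an empty tag:

```java
import java.util.Collections;
import java.util.Map;

// Model of the min_start_time collapse: taggedCells maps an applicationId
// tag to the start-time value that application wrote. The collapse keeps
// only the minimum, which would be rewritten as one empty-tag cell while
// the per-app tagged cells are discarded.
class FlowRunMinCollapse {
  static long collapseMin(Map<String, Long> taggedCells) {
    return Collections.min(taggedCells.values());
  }

  // "Ditto for max_end_time, except the max will be kept."
  static long collapseMax(Map<String, Long> taggedCells) {
    return Collections.max(taggedCells.values());
  }
}
```

A read before any compaction would apply the same reduction over the live tagged cells, so callers see a consistent minimum either way.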
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619408#comment-14619408 ] Hadoop QA commented on YARN-3047: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 38s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 23s | The applied patch generated 1 new checkstyle issues (total was 214, now 214). | | {color:blue}0{color} | shellcheck | 1m 44s | Shellcheck was not available. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:red}-1{color} | eclipse:eclipse | 0m 14s | The patch failed to build with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 54s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. | | {color:red}-1{color} | findbugs | 2m 12s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice compilation is broken. | | {color:green}+1{color} | findbugs | 2m 12s | The patch does not introduce any new Findbugs (version ) warnings. | | {color:red}-1{color} | yarn tests | 0m 18s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-common. 
| | {color:red}-1{color} | yarn tests | 0m 12s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 42m 50s | | \\ \\ || Reason || Tests || | Failed build | hadoop-yarn-api | | | hadoop-yarn-common | | | hadoop-yarn-server-timelineservice | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744326/YARN-3047-YARN-2928.13.patch | | Optional Tests | shellcheck javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 499ce52 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8457/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8457/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8457/console | This message was automatically generated. 
> [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619399#comment-14619399 ] Karthik Kambatla commented on YARN-2962: Oh, sorry. For changing the split index after upgrading to 3.x.y, it would be nice to make the change seamless. If that is not possible, requiring a format should be okay as long as we document it clearly. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small.
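One way to bound the number of children per znode — a hypothetical layout for illustration, not necessarily ZKRMStateStore's actual scheme — is to split the application id's sequence number: the last splitIndex digits name the leaf znode and the remaining digits name an intermediate parent, so at most 10^splitIndex apps share one parent:

```java
// Hypothetical znode path splitter: application_<ts>_<seq> becomes
// .../application_<ts>_<seqPrefix>/<seqSuffix>, where the suffix is the
// last `splitIndex` digits of the sequence number.
class ZnodeSplitSketch {
  static String splitPath(String appId, int splitIndex) {
    int us = appId.lastIndexOf('_');
    String seq = appId.substring(us + 1); // e.g. "0000023"
    int cut = seq.length() - splitIndex;
    return appId.substring(0, us + 1) + seq.substring(0, cut) + "/" + seq.substring(cut);
  }
}
```

This also shows why changing the split index on an existing store is not seamless: every stored path changes shape, which is why the comment above accepts requiring a format if a seamless migration proves impractical.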
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619400#comment-14619400 ] Naganarasimha G R commented on YARN-3045: - +1, This seems to be a good idea for having priority in events... > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-1012: -- Attachment: YARN-1012-11.patch Fixed checkstyle issues (I hope the package-info is done properly). I could not find a reason for the FindBugs warning; in another patch it just disappeared, so let's hope that is the case here. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > >
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619347#comment-14619347 ] Zhijie Shen commented on YARN-3049: --- Updated the title accordingly to describe the scope of this jira more accurately. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Summary: [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend (was: [Storage Implementation] Implement the storage reader interface to fetch raw data) > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement the storage reader interface to fetch raw data
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Summary: [Storage Implementation] Implement the storage reader interface to fetch raw data (was: [Compatiblity] Implement existing ATS queries in the new ATS design) > [Storage Implementation] Implement the storage reader interface to fetch raw > data > - > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047-YARN-2928.13.patch > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3900: Attachment: YARN-3900.001.patch > Protobuf layout of yarn_security_token causes errors in other protos that > include it > - > > Key: YARN-3900 > URL: https://issues.apache.org/jira/browse/YARN-3900 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3900.001.patch > > > Because of the {{server}} subdirectory used in > {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}}, > there are errors in other protos that include it. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619317#comment-14619317 ] Wei Yan commented on YARN-2194: --- [~vinodkv], Thanks for pointing it out. IMO, I don't think we need additional documentation, as the patch doesn't bring new configuration or a new implementation mechanism. We will need new documentation when we bring in systemd. > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619306#comment-14619306 ] Vinod Kumar Vavilapalli commented on YARN-2194: --- [~ywskycn] / [~vvasudev], do we need any additional documentation for this? Say at http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html ? > Cgroups cease to work in RHEL7 > -- > > Key: YARN-2194 > URL: https://issues.apache.org/jira/browse/YARN-2194 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, > YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch > > > In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the > controller name leads to container launch failure. > RHEL7 deprecates libcgroup and recommends the use of systemd. However, > systemd has certain shortcomings as identified in this JIRA (see comments). > This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619307#comment-14619307 ] Anubhav Dhoot commented on YARN-3900: - Simply printing EpochProto.toString causes the following error which on debugging shows the culprit. The exception is thrown in Descriptors {noformat} for (int i = 0; i < proto.getDependencyCount(); i++) { if (!dependencies[i].getName().equals(proto.getDependency(i))) { throw new DescriptorValidationException(result, "Dependencies passed to FileDescriptor.buildFrom() don't match " + "those listed in the FileDescriptorProto."); {noformat} And looking at the variables the mismatch is for {noformat} dependencies[i].getName() = {java.lang.String@856} "server/yarn_security_token.proto" proto.getDependency(i) = {java.lang.String@857} "yarn_security_token.proto" {noformat} Here is the error {noformat} java.lang.ExceptionInInitializerError at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$EpochProto.internalGetFieldAccessorTable(YarnServerResourceManagerRecoveryProtos.java:3522) at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:105) at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:153) at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:272) at com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248) at com.google.protobuf.TextFormat.print(TextFormat.java:71) at com.google.protobuf.TextFormat.printToString(TextFormat.java:118) at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:106) at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestProtos.testResourceProto(TestProtos.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) Caused by: java.lang.IllegalArgumentException: Invalid embedded descriptor for "yarn_server_resourcemanager_recovery.proto". at com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:301) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos.(YarnServerResourceManagerRecoveryProtos.java:5370) ... 
35 more Caused by: com.google.protobuf.Descriptors$DescriptorValidationException: yarn_server_resourcemanager_recovery.proto: Dependencies passed to FileDescriptor.buildFrom() don't match those listed in the FileDescriptorProto. at com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:246) at com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:299) ... 36 more {noformat} > Protobuf layout of yarn_security_token causes errors in other protos that > include it > - > > Key: YARN-3900 > URL: https://issues.apache.org/jira/browse/YARN-3900 > Project:
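As the variable dump above shows, the failure comes down to a plain file-name comparison: Descriptors.FileDescriptor.buildFrom() requires each built dependency's file name to equal, character for character, the dependency name recorded in the importing FileDescriptorProto. A minimal standalone sketch of that check (hypothetical class {{ProtoDependencyCheck}}, not the protobuf source itself):

```java
// Standalone sketch of the validation in protobuf's
// Descriptors.FileDescriptor.buildFrom() that throws the
// DescriptorValidationException quoted above: every built dependency's file
// name must exactly equal the name recorded in the importing
// FileDescriptorProto.
class ProtoDependencyCheck {
    static boolean dependenciesMatch(String[] builtNames, String[] declaredNames) {
        if (builtNames.length != declaredNames.length) {
            return false;
        }
        for (int i = 0; i < builtNames.length; i++) {
            // "server/yarn_security_token.proto" vs "yarn_security_token.proto"
            // fails this exact comparison
            if (!builtNames[i].equals(declaredNames[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With the {{server/}} prefix present on one side and absent on the other, the comparison fails exactly as in the stack trace above.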
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619291#comment-14619291 ] Vinod Kumar Vavilapalli commented on YARN-3813: --- A few years ago when this came up, I recommended doing this on top of YARN. But I've seen this enough in the wild to yield now. It's a useful feature to come out of the box in YARN. Small enough, so I think we should go ahead with the implementation - not a lot of design dimensions. > Support Application timeout feature in YARN. > - > > Key: YARN-3813 > URL: https://issues.apache.org/jira/browse/YARN-3813 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: nijel > Attachments: YARN Application Timeout .pdf > > > It will be useful to support Application Timeout in YARN. Some use cases do > not care about the output of an application if it does not complete within a > specific time. > *Background:* > The requirement is to show the CDR statistics of the last few minutes, say for > every 5 minutes. The same job will run continuously with different datasets. > So one job will be started every 5 minutes. The estimated time for this > task is 2 minutes or less. > If the application does not complete in the given time, the output is not > useful. > *Proposal* > So the idea is to support an application timeout, where a timeout parameter is > given while submitting the job. > Here, the user expects the application to finish (complete or be killed) in the > given time. > One option for us is to move this logic to the Application client (which submits the > job). > But it would be nicer as generic logic, which would also make it more robust. > Kindly provide your suggestions/opinion on this feature. If it sounds good, I > will update the design doc and prototype patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
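One possible shape for the RM-side bookkeeping the proposal implies, sketched under the assumption that a deadline is recorded at submission and a periodic scan flags overrunning applications; the class and method names ({{AppTimeoutMonitor}}, {{registerTimeout}}, {{findExpired}}) are illustrative only and not part of any YARN API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed timeout bookkeeping: record a deadline
// when the job is submitted, and periodically collect applications that have
// overrun it so the RM could kill them. None of these names exist in YARN.
class AppTimeoutMonitor {
    private final Map<String, Long> deadlines = new ConcurrentHashMap<>();

    // Called at submission time with the user-supplied timeout parameter.
    void registerTimeout(String appId, long submitTimeMs, long timeoutMs) {
        deadlines.put(appId, submitTimeMs + timeoutMs);
    }

    // Called when the application completes normally.
    void markFinished(String appId) {
        deadlines.remove(appId);
    }

    // Periodic scan: applications past their deadline at 'nowMs'.
    List<String> findExpired(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            if (nowMs >= e.getValue()) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

Keeping this in the RM (rather than the submitting client, the other option mentioned in the description) means the timeout survives client failures, which is part of the robustness argument above.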
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619270#comment-14619270 ] Karthik Kambatla commented on YARN-1012: [~elgoiri] - could you look into the checkstyle and findbugs warnings please? > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, YARN-1012-5.patch, > YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3643) Provide a way to store only running applications in the state store
[ https://issues.apache.org/jira/browse/YARN-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619264#comment-14619264 ] Karthik Kambatla commented on YARN-3643: That looks sufficient. Thanks for checking, Varun. > Provide a way to store only running applications in the state store > --- > > Key: YARN-3643 > URL: https://issues.apache.org/jira/browse/YARN-3643 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena > > Today, we have a config that determines the number of applications that can > be stored in the state-store. Since there is no easy way to figure out the > maximum number of running applications at any point in time, users are forced > to use a conservative estimate. Our default ends up being even more > conservative. > It would be nice to allow storing all running applications with a > conservative upper bound for it. This should allow for shorter recovery times > in most deployments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619262#comment-14619262 ] Varun Saxena commented on YARN-2962: [~kasha], I mean, say somebody configures the split index as 3 initially but later wants to change it to 2. In such a case, do we assume the state store will have to be formatted? > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619254#comment-14619254 ] Karthik Kambatla commented on YARN-2962: Since we don't support rolling upgrades across major versions, it should be okay to require a state-store format. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
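For context, the splitting idea under discussion can be sketched as follows; this is a simplified illustration (hypothetical class {{ZnodeSplit}}), not the actual ZKRMStateStore code:

```java
// Simplified illustration of the znode-splitting idea: with split index N,
// the last N characters of the application id become a child znode under a
// parent formed by the remaining prefix, so no single parent znode
// accumulates every application node.
class ZnodeSplit {
    static String parentNode(String appId, int splitIndex) {
        if (splitIndex <= 0 || splitIndex >= appId.length()) {
            return appId; // no split: the app id is its own znode
        }
        return appId.substring(0, appId.length() - splitIndex);
    }

    static String childNode(String appId, int splitIndex) {
        if (splitIndex <= 0 || splitIndex >= appId.length()) {
            return "";
        }
        return appId.substring(appId.length() - splitIndex);
    }
}
```

Because the parent path depends on the configured split index, nodes written with index 3 would not be found when reading with index 2, which is why changing the index implies a state-store format (or a migration).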
[jira] [Updated] (YARN-3886) Add cumulative wait times of apps at Queue level
[ https://issues.apache.org/jira/browse/YARN-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3886: --- Component/s: (was: yarn) scheduler resourcemanager > Add cumulative wait times of apps at Queue level > > > Key: YARN-3886 > URL: https://issues.apache.org/jira/browse/YARN-3886 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager, scheduler >Reporter: Raju Bairishetti >Assignee: Raju Bairishetti > > Right now, we are having number of apps submitted/failed/killed/running at > queue level. We don't have any way to find on which queue apps are waiting > more time. > I hope adding wait times of apps at queue level will be helpful in viewing > the overall queue status. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619245#comment-14619245 ] Li Lu commented on YARN-3836: - Thanks [~vrushalic]! > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
Anubhav Dhoot created YARN-3900: --- Summary: Protobuf layout of yarn_security_token causes errors in other protos that include it Key: YARN-3900 URL: https://issues.apache.org/jira/browse/YARN-3900 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Because of the {{server}} subdirectory used in {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}}, there are errors in other protos that include it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619154#comment-14619154 ] Hadoop QA commented on YARN-313: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 9s | The applied patch generated 4 new checkstyle issues (total was 229, now 232). | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 51s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 51m 24s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 109m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.cli.TestRMAdminCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744257/YARN-313-v6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4119ad3 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8456/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8456/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8456/console | This message was automatically generated. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch > > > We should provide some admin interface, e.g. 
"yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3836: - Assignee: Li Lu (was: Vrushali C) > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619145#comment-14619145 ] Vrushali C commented on YARN-3836: -- Hi [~gtCarrera] Sounds good, will reassign to you. thanks Vrushali > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Vrushali C > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3899) Add equals and hashCode to TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-3899. - Resolution: Duplicate Duplicate to YARN-3836. > Add equals and hashCode to TimelineEntity > - > > Key: YARN-3899 > URL: https://issues.apache.org/jira/browse/YARN-3899 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to add equals and hashCode methods for timeline entity so that we can > easily tell if two timeline entities are equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619140#comment-14619140 ] Li Lu commented on YARN-3836: - Hi [~vrushalic], I'd like to check the progress of this JIRA. Currently I'm blocked by this when building time-based aggregations. If you have any bandwidth problems maybe I can take this over? Thanks! > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Vrushali C > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3899) Add equals and hashCode to TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619125#comment-14619125 ] Varun Saxena commented on YARN-3899: [~gtCarrera9], YARN-3836 is meant for the same thing > Add equals and hashCode to TimelineEntity > - > > Key: YARN-3899 > URL: https://issues.apache.org/jira/browse/YARN-3899 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to add equals and hashCode methods for timeline entity so that we can > easily tell if two timeline entities are equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3899) Add equals and hashCode to TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3899: Issue Type: Sub-task (was: Improvement) Parent: YARN-2928 > Add equals and hashCode to TimelineEntity > - > > Key: YARN-3899 > URL: https://issues.apache.org/jira/browse/YARN-3899 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to add equals and hashCode methods for timeline entity so that we can > easily tell if two timeline entities are equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3899) Add equals and hashCode to TimelineEntity
Li Lu created YARN-3899: --- Summary: Add equals and hashCode to TimelineEntity Key: YARN-3899 URL: https://issues.apache.org/jira/browse/YARN-3899 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu We need to add equals and hashCode methods for timeline entity so that we can easily tell if two timeline entities are equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
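The approach that emerged on the duplicate issue YARN-3836 is to override equals()/hashCode() on the Identifier, so it can serve as a collection key in its own right, and have TimelineEntity mostly delegate to it. A simplified sketch of that shape (illustrative class names, not the actual patch):

```java
import java.util.Objects;

// Simplified sketch of the YARN-3836 approach: Identifier overrides
// equals()/hashCode(), and the entity delegates to it. The instanceof check
// is sufficient because only the entity type and id participate in equality.
class TimelineEntitySketch {
    static class Identifier {
        private final String type;
        private final String id;

        Identifier(String type, String id) {
            this.type = type;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Identifier)) return false;
            Identifier other = (Identifier) o;
            return Objects.equals(type, other.type) && Objects.equals(id, other.id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, id);
        }
    }

    private final Identifier identifier;

    TimelineEntitySketch(String type, String id) {
        this.identifier = new Identifier(type, id);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TimelineEntitySketch)) return false;
        // delegate entirely to the Identifier
        return identifier.equals(((TimelineEntitySketch) o).identifier);
    }

    @Override
    public int hashCode() {
        return identifier.hashCode();
    }
}
```

With both methods overridden consistently, two entities with the same type and id hash to the same bucket and compare equal, so membership checks in a HashSet behave as expected.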
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619072#comment-14619072 ] Karthik Kambatla commented on YARN-3838: I don't know my way around in this neck of the woods. [~vinodkv], [~xgong] know a thing or two. > Rest API failing when ip configured in RM address in secure https mode > -- > > Key: YARN-3838 > URL: https://issues.apache.org/jira/browse/YARN-3838 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, > 0001-YARN-3838.patch, 0002-YARN-3810.patch, 0002-YARN-3838.patch > > > Steps to reproduce > === > 1.Configure hadoop.http.authentication.kerberos.principal as below > {code:xml} > > hadoop.http.authentication.kerberos.principal > HTTP/_h...@hadoop.com > > {code} > 2. In RM web address also configure IP > 3. Startup RM > Call Rest API for RM {{ curl -i -k --insecure --negotiate -u : https IP > /ws/v1/cluster/info"}} > *Actual* > Rest API failing > {code} > 2015-06-16 19:03:49,845 DEBUG > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos credentails) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos credentails) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) > at > 
org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-313: - Attachment: YARN-313-v6.patch Trying to fix the broken unit test for refreshNodes (I still don't understand how it breaks). No success so far. Any ideas? > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3658) Federation "Capacity Allocation" across sub-cluster
[ https://issues.apache.org/jira/browse/YARN-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618902#comment-14618902 ] Carlo Curino commented on YARN-3658: See the presentation attached to the umbrella jira YARN-2915. > Federation "Capacity Allocation" across sub-cluster > --- > > Key: YARN-3658 > URL: https://issues.apache.org/jira/browse/YARN-3658 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > > This JIRA will track mechanisms to map federation level capacity allocations > to sub-cluster level ones. (Possibly via reservation mechanisms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's
[ https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618894#comment-14618894 ] Carlo Curino commented on YARN-2915: Lei, first let me make sure we are on the same page regarding the router. The router is "soft-state" and a rather lightweight component, so we envision multiple routers running in each data-center, and definitely agreed that we will have at least one router per DC if/when we run a federation cross-DC. Lei, regarding the (good) question you asked about AMRMProxy. The comment is derived from some early experimentation we did with the AMRMProxy from YARN-2884. The idea is that you could use the mux/demux mechanics that the AMRMProxy provides to hide multiple standalone YARN clusters (not part of a federation) behind a single AMRMProxy. The scenario goes as follows: you have a (possibly small) cluster that I will call the "launchpad" running one or more AMRMProxy(s), and say 2 standalone YARN clusters (C1, C2) that are not federation enabled. Jobs can be submitted to C1, C2 directly as always, and jobs that want to span both could be submitted to the "launchpad" cluster. By customizing the policy in the AMRMProxy that determines how we forward requests to clusters, you can have an AM running on the launchpad cluster forward its requests to both C1 and C2. For C1 and C2 this will look as if you submitted an unmanaged AM in each cluster. The job, on the other hand, thinks it is talking to a single RM that happens to run somewhere in the "launchpad" cluster (typically on the same node), but this is just the AMRMProxy impersonating an RM. To make this even clearer: we don't strictly need an AMRMProxy on each node for the story to work.
However, given our current thinking/experimentation we see advantages in running the AMRMProxy on each node, such as: we avoid 2 network hops, we get a better AM-to-AMRMProxy ratio so we are more resilient to DDOS on the AMRMProtocol, there are fewer partitioning scenarios to consider, etc., so this is what we are advocating for in federation. In federation, we go a step further and ask C1 and C2 to commit to sharing resources in the federation (by heartbeating to the StateStore), and we provide a lot more mechanics around it (e.g., UIs that show the overall use of resources across clusters, rebalancing mechanisms, fault-tolerance mechanics, etc.), which makes for a tighter overall experience. Overall, I think running the entire federation code will be better, but I was pointing out that some of the pieces we are building could be leveraged in isolation for more lightweight / ad-hoc forms of cross-cluster interaction. The rule-based global router that [~subru] mentioned above falls in the same category. > Enable YARN RM scale out via federation using multiple RM's > --- > > Key: YARN-2915 > URL: https://issues.apache.org/jira/browse/YARN-2915 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao >Assignee: Subru Krishnan > Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, > Yarn_federation_design_v1.pdf, federation-prototype.patch > > > This is an umbrella JIRA that proposes to scale out YARN to support large > clusters comprising of tens of thousands of nodes. That is, rather than > limiting a YARN managed cluster to about 4k in size, the proposal is to > enable the YARN managed cluster to be elastically scalable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
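The forwarding-policy customization described above can be sketched in a few lines. This is a hypothetical illustration, not the YARN-2884 AMRMProxy API: the {{RequestPolicy}} interface, the even split, and the cluster names C1/C2 are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the idea above -- NOT the YARN-2884 AMRMProxy API.
// The RequestPolicy interface, the even split, and the cluster names are
// assumptions made for illustration only.
public class BroadcastPolicySketch {

    interface RequestPolicy {
        // Decide how many of the AM's requested containers each cluster serves.
        Map<String, Integer> split(int requestedContainers, List<String> clusters);
    }

    // Even split: each standalone cluster receives roughly half of the ask,
    // as if an unmanaged AM had been submitted to it.
    static class EvenSplit implements RequestPolicy {
        public Map<String, Integer> split(int requested, List<String> clusters) {
            Map<String, Integer> shares = new LinkedHashMap<>();
            int base = requested / clusters.size();
            int remainder = requested % clusters.size();
            for (String c : clusters) {
                // Hand out any remainder one container at a time.
                shares.put(c, base + (remainder-- > 0 ? 1 : 0));
            }
            return shares;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> shares =
            new EvenSplit().split(5, List.of("C1", "C2"));
        System.out.println(shares); // prints {C1=3, C2=2}
    }
}
```

Swapping in a different {{RequestPolicy}} (e.g., one that sends everything to the least-loaded cluster) is exactly the customization point the comment describes.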
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618861#comment-14618861 ] Bibin A Chundatt commented on YARN-3888: Please review the patch attached. > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3894: --- Description: Currently in the capacity scheduler, when the capacity configuration is wrong the RM will shut down, but not in the case of a NodeLabels capacity mismatch. In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info("Initialized root queue " + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initialization {{parseQueue}} call the labels will be empty. *Steps to reproduce* # Configure RM with the capacity scheduler # Add one or two node labels via rmadmin # Configure the capacity xml with the nodelabel, but with a wrong capacity configuration for the already added label # Restart both RMs # Check that on service init of the capacity scheduler the node label list is populated *Expected* RM should not start *Current exception on the reinitialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE DESCRIPTION=Exception refresh queues. 
PERMISSIONS= 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS= 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321) at org.apache.hadoop.yarn.server.resourcemanager.
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Description: Cases that can cause this: # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Both RMs will continuously try to become active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UIs show active # Status is shown as active for both RMs was: Cases that can cause the failure: # Capacity scheduler xml is wrongly configured in switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Logs for the capacity failure condition are given below. Both RMs will continuously try to become active {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE DESCRIPTION=Exception refresh queues. 
PERMISSIONS= 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS= 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedExcepti
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618831#comment-14618831 ] Sangjin Lee commented on YARN-3047: --- I need to double check the current state of the code, but in principle the writer will not depend on the configuration for the end point. The writer (collector) is created dynamically on a per-app basis, and its end point is registered on the RM. That's how timeline clients discover the end points. So I doubt that the configuration is used any longer to define the writer end point (correct me if I'm wrong [~zjshen]). > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, > YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, > YARN-3047.02.patch, YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618812#comment-14618812 ] Hadoop QA commented on YARN-3896: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 13s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 59s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 87m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744222/YARN-3896.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bd4e109 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8455/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8455/console | This message was automatically generated. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the > RM set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat came before the RM succeeded in setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
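The race described in YARN-3896 can be modeled in miniature. This is an illustrative sketch, not the real ResourceTrackerService code; the class and method names are invented for the example. It shows why resetting the stored response id synchronously during registration lets the node's first heartbeat (id 0) be accepted instead of triggering the "too far behind" REBOOTED transition.

```java
import java.util.concurrent.ConcurrentHashMap;

// Miniature model of the race -- NOT the real ResourceTrackerService code.
// Class and method names are invented for illustration. The point: if the
// reset to 0 happens synchronously inside registration, the node's first
// heartbeat (id 0) matches and is accepted; if the reset lags behind, the
// stale id (e.g. 2506413) rejects the heartbeat as "too far behind".
public class ResponseIdResetSketch {

    private final ConcurrentHashMap<String, Integer> lastResponseId =
        new ConcurrentHashMap<>();

    // Synchronous reset: the stored id is back to 0 before registration returns.
    void registerNode(String nodeId) {
        lastResponseId.put(nodeId, 0);
    }

    // Accept a heartbeat only if the node echoes the id the RM last recorded.
    boolean heartbeat(String nodeId, int nmResponseId) {
        int rmId = lastResponseId.getOrDefault(nodeId, 0);
        if (nmResponseId != rmId) {
            return false; // "Too far behind": node would be deactivated as REBOOTED
        }
        lastResponseId.put(nodeId, rmId + 1);
        return true;
    }

    public static void main(String[] args) {
        ResponseIdResetSketch rm = new ResponseIdResetSketch();
        rm.lastResponseId.put("10.208.132.153:8041", 2506413); // state before reconnect
        rm.registerNode("10.208.132.153:8041");                // reset happens in-line
        System.out.println(rm.heartbeat("10.208.132.153:8041", 0)); // prints true
    }
}
```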
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618776#comment-14618776 ] Sunil G commented on YARN-3894: --- Thanks [~bibinchundatt] for reporting and providing the analysis. During the {{initScheduler}} call from *CapacityScheduler#serviceInit*, we will initialize the queues. In the same call flow, we also validate the capacity of each nodelabel against the queue capacity in {{ParentQueue#setChildQueues}}. {code} // check label capacities for (String nodeLabel : labelManager.getClusterNodeLabelNames()) { float capacityByLabel = queueCapacities.getCapacity(nodeLabel); // check children's labels float sum = 0; for (CSQueue queue : childQueues) { sum += queue.getQueueCapacities().getCapacity(nodeLabel); } if ((capacityByLabel > 0 && Math.abs(1.0f - sum) > PRECISION) || (capacityByLabel == 0) && (sum > 0)) { throw new IllegalArgumentException("Illegal" + " capacity of " + sum + " for children of queue " + queueName + " for label=" + nodeLabel); } } {code} As per this code, if there is a mismatch in capacity for a nodelabel against the queue capacity, it should throw an *IllegalArgumentException*. But this will not happen in the case where we configure a wrong capacity for a label in the cs xml and restart the RM. *Issue:* During {{CommonNodeLabelsManager#serviceStart}}, labels will be re-populated from the old mirror file. But {{initScheduler}} and the above call flow happen from *serviceInit* instead of *serviceStart*. This causes the {{labelManager.getClusterNodeLabelNames()}} call in the above code to return an empty list, and the desired exception won't be thrown. IMO we can move the node label init and recovery to serviceInit rather than serviceStart. [~leftnoteasy], could you please share your thoughts. 
> RM startup should fail for wrong CS xml NodeLabel capacity configuration > - > > Key: YARN-3894 > URL: https://issues.apache.org/jira/browse/YARN-3894 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: capacity-scheduler.xml > > > Currently in capacity Scheduler when capacity configuration is wrong > RM shutdown is the current behaviour, but not incase of NodeLabels capacity > mismatch > In {{CapacityScheduler#initializeQueues}} > {code} > private void initializeQueues(CapacitySchedulerConfiguration conf) > throws IOException { > root = > parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, > queues, queues, noop); > labelManager.reinitializeQueueLabels(getQueueToLabels()); > root = > parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, > queues, queues, noop); > LOG.info("Initialized root queue " + root); > initializeQueueMappings(); > setQueueAcls(authorizer, queues); > } > {code} > {{labelManager}} is initialized from queues and calculation for Label level > capacity mismatch happens in {{parseQueue}} . So during initialization > {{parseQueue}} the labels will be empty . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
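The label-capacity validation discussed in this thread can be reproduced standalone. This is a simplified sketch of the check quoted from {{ParentQueue#setChildQueues}}, with illustrative data structures in place of the real queue objects; it shows how an empty label list (the recovery-ordering bug) silently skips the validation that would otherwise reject the bad configuration.

```java
import java.util.List;
import java.util.Map;

// Simplified, self-contained sketch of the label-capacity check from
// ParentQueue#setChildQueues. The maps standing in for queue objects are
// illustrative. For every known node label, children's capacities must sum
// to 1.0 when the parent has capacity for that label, and to 0 otherwise.
public class LabelCapacityCheckSketch {

    static final float PRECISION = 0.001f;

    static void checkLabelCapacities(String queueName,
                                     Map<String, Float> parentCapacityByLabel,
                                     List<Map<String, Float>> childrenCapacities) {
        for (Map.Entry<String, Float> e : parentCapacityByLabel.entrySet()) {
            String label = e.getKey();
            float capacityByLabel = e.getValue();
            float sum = 0;
            for (Map<String, Float> child : childrenCapacities) {
                sum += child.getOrDefault(label, 0f);
            }
            if ((capacityByLabel > 0 && Math.abs(1.0f - sum) > PRECISION)
                || (capacityByLabel == 0 && sum > 0)) {
                throw new IllegalArgumentException("Illegal capacity of " + sum
                    + " for children of queue " + queueName + " for label=" + label);
            }
        }
    }

    public static void main(String[] args) {
        // With the label set empty (the bug: labels are recovered only in
        // serviceStart), the loop body never runs and no exception is thrown.
        checkLabelCapacities("root", Map.of(), List.of(Map.of("node2", 0.5f)));

        // Once "node2" is known, the same 0.5 child sum is rejected.
        try {
            checkLabelCapacities("root", Map.of("node2", 1.0f),
                List.of(Map.of("node2", 0.5f)));
        } catch (IllegalArgumentException ex) {
            // prints: Illegal capacity of 0.5 for children of queue root for label=node2
            System.out.println(ex.getMessage());
        }
    }
}
```

The second call reproduces the exception seen in the attached log once the labels are actually populated at validation time.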
[jira] [Created] (YARN-3898) YARN web console only proxies GET to application master but doesn't provide any feedback for other HTTP methods
Kam Kasravi created YARN-3898: - Summary: YARN web console only proxies GET to application master but doesn't provide any feedback for other HTTP methods Key: YARN-3898 URL: https://issues.apache.org/jira/browse/YARN-3898 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Kam Kasravi Priority: Minor YARN web console should provide some feedback when filtering (and preventing) DELETE, POST, PUT, etc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3896: Target Version/s: 2.8.0 > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618750#comment-14618750 ] Devaraj K commented on YARN-3896: - Thanks [~hex108] for delivering the patch quickly. Can you also add a test to simulate the scenario as part of the patch? > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618743#comment-14618743 ] Bibin A Chundatt commented on YARN-3894: Detailed analysis and root cause: # Capacity scheduler queue initialization happens in {{CapacityScheduler#serviceInit}} # The {{RMNodeLabelsManager#addToCluserNodeLabels}} store is replayed in service start on recovery > RM startup should fail for wrong CS xml NodeLabel capacity configuration > - > > Key: YARN-3894 > URL: https://issues.apache.org/jira/browse/YARN-3894 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: capacity-scheduler.xml > > > Currently in capacity Scheduler when capacity configuration is wrong > RM shutdown is the current behaviour, but not incase of NodeLabels capacity > mismatch > In {{CapacityScheduler#initializeQueues}} > {code} > private void initializeQueues(CapacitySchedulerConfiguration conf) > throws IOException { > root = > parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, > queues, queues, noop); > labelManager.reinitializeQueueLabels(getQueueToLabels()); > root = > parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, > queues, queues, noop); > LOG.info("Initialized root queue " + root); > initializeQueueMappings(); > setQueueAcls(authorizer, queues); > } > {code} > {{labelManager}} is initialized from queues and calculation for Label level > capacity mismatch happens in {{parseQueue}} . So during initialization > {{parseQueue}} the labels will be empty . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618738#comment-14618738 ] Varun Saxena commented on YARN-3878: Yeah, but we still need to check the thread state. Let me add something to DrainDispatcher (a subclass of AsyncDispatcher) to return the thread state and wait on it. > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an InterruptedException being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > the event queue to drain (as the RM State Store dispatcher is configured to drain the queue on stop). > # This condition never becomes true and the AsyncDispatcher keeps waiting > incessantly for the dispatcher event queue to drain till the JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) > "main" prio=10 tid=0x7fb980
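The drain-on-stop hang above can be modeled with a minimal dispatcher. This is an illustrative sketch, not the real AsyncDispatcher or the actual YARN-3878 fix; the names and the bounded-wait approach are assumptions. It shows one way to cap the drain wait so shutdown terminates even when the queue never empties.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Minimal model of the hang described above -- NOT the real AsyncDispatcher or
// the actual YARN-3878 fix. Names and the bounded-wait approach are
// illustrative: stop() waits for the queue to drain, but never past a deadline.
public class DrainOnStopSketch {

    private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;
    private final Thread handler = new Thread(() -> {
        while (!stopped || !queue.isEmpty()) {
            try {
                Runnable ev = queue.poll(100, TimeUnit.MILLISECONDS);
                if (ev != null) ev.run();
            } catch (InterruptedException ie) {
                return; // interrupted during stop: exit instead of waiting forever
            }
        }
    });

    void start() { handler.start(); }

    void post(Runnable ev) throws InterruptedException { queue.put(ev); }

    // Bounded drain: an unconditional "wait until drained" loop is what hangs;
    // a deadline guarantees shutdown terminates even if the queue stays nonempty.
    void stop(long maxDrainMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxDrainMillis;
        while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        stopped = true;
        handler.interrupt();
        handler.join();
    }

    boolean isShutDown() { return stopped && !handler.isAlive(); }

    public static void main(String[] args) throws InterruptedException {
        DrainOnStopSketch d = new DrainOnStopSketch();
        d.start();
        d.post(() -> System.out.println("event handled"));
        d.stop(1000);
        System.out.println(d.isShutDown()); // prints true
    }
}
```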
[jira] [Commented] (YARN-3892) NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails
[ https://issues.apache.org/jira/browse/YARN-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618697#comment-14618697 ]

Hudson commented on YARN-3892:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #238 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/238/])
YARN-3892. Fixed NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails. Contributed by Bibin A Chundatt (jianhe: rev c9dd2cada055c0beffd04bad0ded8324f66ad1b7)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt

> NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails
> -
>
> Key: YARN-3892
> URL: https://issues.apache.org/jira/browse/YARN-3892
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3892.patch
>
>
> NPE on RMStateStore#serviceStop when CapacityScheduler#serviceInit fails
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeInternal(ZKRMStateStore.java:315)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:516)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:598)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:254)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1184)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
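The stack trace shows why the guard matters: {{AbstractService.init()}} calls {{stopQuietly()}} when {{serviceInit}} throws, so {{serviceStop()}} can run against fields that were never initialized. A minimal sketch of the defensive null-check pattern (all names here are hypothetical, not the actual ZKRMStateStore code):

```java
// Model of the lifecycle bug: stop() runs even after a failed init(),
// so serviceStop() must tolerate uninitialized state.
public class StateStoreSketch {

    private AutoCloseable zkClient; // stays null if serviceInit fails early

    void serviceInit(boolean failLikeCapacityScheduler) {
        if (failLikeCapacityScheduler) {
            // e.g. scheduler misconfiguration aborts init before the
            // ZK client is ever created
            throw new IllegalStateException("scheduler init failed");
        }
        zkClient = () -> { }; // normally: connect to ZooKeeper here
    }

    // Before the fix, closeInternal() dereferenced the client
    // unconditionally -> NullPointerException. The fix guards the close.
    void serviceStop() throws Exception {
        if (zkClient != null) {
            zkClient.close();
        }
    }

    // Returns true if stop survives a failed init without an NPE.
    public static boolean stopAfterFailedInit() {
        StateStoreSketch store = new StateStoreSketch();
        try {
            store.serviceInit(true);
        } catch (IllegalStateException expected) {
            // the service framework still calls stop after this
        }
        try {
            store.serviceStop();
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(stopAfterFailedInit()); // prints "true"
    }
}
```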