[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820419#comment-15820419 ] Yufei Gu commented on YARN-4212: Thanks to [~rchiang] and [~kasha] for the review. I uploaded patch 006 addressing all your comments; a few notes: - Change {{Set}} to {{Set}} instead of {{Set}} to avoid unnecessary new objects while adding items to the set. - {{checkIfParentPolicyAllowed}} doesn't need to be recursive because of the simplicity of what policies are allowed; basically we can consider {drf, fair, fifo} as a totally ordered set. I modified it to a non-recursive version, and we can make it recursive whenever necessary. - Add preorder reinitialization of existing queues while reloading the alloc file. - Add test cases for reloading the alloc file and for policy violations at different levels. > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, > YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
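The "total order" idea from the comment above can be sketched as follows. This is a hypothetical illustration, not the actual patch: the class and method names below are invented, assuming policies are ordered fifo < fair < drf and that a parent's policy must be at least as strong as each child's.

```java
import java.util.Arrays;
import java.util.List;

public class PolicyOrder {
    // Assumed total order, weakest to strongest: fifo < fair < drf.
    private static final List<String> ORDER = Arrays.asList("fifo", "fair", "drf");

    /**
     * Non-recursive check: a parent's policy is allowed iff it is at least
     * as strong as the child's policy in the assumed total order.
     */
    static boolean isParentPolicyAllowed(String parentPolicy, String childPolicy) {
        return ORDER.indexOf(parentPolicy) >= ORDER.indexOf(childPolicy);
    }

    public static void main(String[] args) {
        // A 'fair' parent with 'drf' children is rejected; the reverse is fine.
        System.out.println(isParentPolicyAllowed("fair", "drf")); // false
        System.out.println(isParentPolicyAllowed("drf", "fair")); // true
    }
}
```

Because the set of policies is totally ordered, a single comparison per parent/child pair suffices and no recursion down the queue tree is needed, which matches the comment's reasoning.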
[jira] [Updated] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4212: --- Attachment: YARN-4212.006.patch > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, > YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's
[ https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820339#comment-15820339 ] Varun Saxena commented on YARN-6064: Maybe javadoc can be {code} Defines the flow run id. If specified, retrieve the next set of flow runs from the given id. The set of flow runs retrieved is inclusive of specified fromId. {code} > Support fromId for flowRuns and flow/flowRun apps REST API's > > > Key: YARN-6064 > URL: https://issues.apache.org/jira/browse/YARN-6064 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6064-YARN-5355.0001.patch, > YARN-6064-YARN-5355.0002.patch, YARN-6064-YARN-5355.0003.patch > > > Splitting out JIRA YARN-6027 for pagination support for flowRuns, flow apps > and flow run apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
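The inclusive {{fromId}} semantics described in the proposed javadoc can be illustrated with a small sketch. The helper below is hypothetical (not the timeline reader's actual code): given run ids in the order the service would return them, the page starts at {{fromId}} itself and contains at most {{limit}} entries.

```java
import java.util.ArrayList;
import java.util.List;

public class FromIdPaging {
    /**
     * Returns up to 'limit' run ids starting at 'fromId' inclusive,
     * given the ids in the order the service would return them.
     */
    static List<Long> page(long[] runIds, long fromId, int limit) {
        List<Long> out = new ArrayList<>();
        boolean started = false;
        for (long id : runIds) {
            if (!started && id == fromId) {
                started = true; // fromId itself is part of the page
            }
            if (started && out.size() < limit) {
                out.add(id);
            }
        }
        return out;
    }
}
```

A client pages through results by passing the last id of the previous page plus a limit of one extra entry, or by using the next id, depending on how overlap is handled; the key point the javadoc pins down is that the returned set includes {{fromId}}.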
[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's
[ https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820316#comment-15820316 ] Rohith Sharma K S commented on YARN-6064: - Given there are no more concerns on the javadoc, I will attach a patch with the exception log message change. > Support fromId for flowRuns and flow/flowRun apps REST API's > > > Key: YARN-6064 > URL: https://issues.apache.org/jira/browse/YARN-6064 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6064-YARN-5355.0001.patch, > YARN-6064-YARN-5355.0002.patch, YARN-6064-YARN-5355.0003.patch > > > Splitting out JIRA YARN-6027 for pagination support for flowRuns, flow apps > and flow run apps.
[jira] [Comment Edited] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820173#comment-15820173 ] Naganarasimha G R edited comment on YARN-6072 at 1/12/17 4:50 AM: -- Thanks for the contributions [~ajithshetty] and [~bibinchundatt] for testing and raising the issue in detail. Thanks for additional reviews from [~djp], [~jianhe] & [~kasha]. Committed the patch to branch-2.8, branch-2 and trunk ! was (Author: naganarasimha): Thanks for the contributions [~ajithshetty] and [~bibinchundatt] for testing and raising the issue in detail. Thanks for additional reviews from [~djp], [~jianhe] & [~kasha]. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at >
[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt
[ https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820160#comment-15820160 ] Ajith S commented on YARN-6008: --- I agree to this, will upload initial patch for the same shortly > Fetch container list for failed application attempt > --- > > Key: YARN-6008 > URL: https://issues.apache.org/jira/browse/YARN-6008 > Project: Hadoop YARN > Issue Type: Bug > Environment: hadoop version 2.6 >Reporter: Priyanka Gugale >Assignee: Ajith S > > When we run command "yarn container -list" on using failed application > attempt we should either get containers from that attempt or get a back list > as containers are no longer in running state. > Steps to reproduce: > 1. Launch a yarn application. > 2. Kill app master, it tries to restart application with new attempt id. > 3. Now run yarn command, > yarn container -list > Where Application Attempt ID is of failed attempt, > it lists the container from next attempt which is in "RUNNING" state right > now. > Expected behavior: > It should return list of killed containers from attempt 1 or empty list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-5864: - Attachment: YARN-5864.006.patch Uploaded ver.6 patch, now made move reserved container to be a configurable option. > YARN Capacity Scheduler - Queue Priorities > -- > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.001.patch, YARN-5864.002.patch, > YARN-5864.003.patch, YARN-5864.004.patch, YARN-5864.005.patch, > YARN-5864.006.patch, YARN-5864.poc-0.patch, > YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf > > > Currently, Capacity Scheduler at every parent-queue level uses relative > used-capacities of the chil-queues to decide which queue can get next > available resource first. > For example, > - Q1 & Q2 are child queues under queueA > - Q1 has 20% of configured capacity, 5% of used-capacity and > - Q2 has 80% of configured capacity, 8% of used-capacity. > In the situation, the relative used-capacities are calculated as below > - Relative used-capacity of Q1 is 5/20 = 0.25 > - Relative used-capacity of Q2 is 8/80 = 0.10 > In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is > selected by the scheduler first to receive next available resource. > Simply ordering queues according to relative used-capacities sometimes causes > a few troubles because scarce resources could be assigned to less-important > apps first. > # Latency sensitivity: This can be a problem with latency sensitive > applications where waiting till the ‘other’ queue gets full is not going to > cut it. The delay in scheduling directly reflects in the response times of > these applications. > # Resource fragmentation for large-container apps: Today’s algorithm also > causes issues with applications that need very large containers. 
It is > possible that existing queues are all within their resource guarantees but > their current allocation distribution on each node may be such that an > application which needs large container simply cannot fit on those nodes. > Services: > # The above problem (2) gets worse with long running applications. With short > running apps, previous containers may eventually finish and make enough space > for the apps with large containers. But with long running services in the > cluster, the large containers’ application may never get resources on any > nodes even if its demands are not yet met. > # Long running services are sometimes more picky w.r.t placement than normal > batch apps. For example, for a long running service in a separate queue (say > queue=service), during peak hours it may want to launch instances on 50% of > the cluster nodes. On each node, it may want to launch a large container, say > 200G memory per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
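The relative used-capacity arithmetic in the description above is easy to check numerically. A minimal sketch (hypothetical names, simplified to two queues; not the CapacityScheduler's actual code):

```java
public class RelativeUsedCapacity {
    /** Relative used-capacity = used-capacity / configured-capacity. */
    static double relativeUsed(double usedPct, double configuredPct) {
        return usedPct / configuredPct;
    }

    public static void main(String[] args) {
        double q1 = relativeUsed(5, 20); // 0.25
        double q2 = relativeUsed(8, 80); // 0.10
        // Today's algorithm offers the next available resource to the queue
        // with the lower relative used-capacity, here Q2.
        System.out.println(q2 < q1 ? "Q2" : "Q1"); // prints "Q2"
    }
}
```

This is exactly the ordering the JIRA argues against: Q2 wins purely on the ratio, with no notion of queue priority, latency sensitivity, or container size.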
[jira] [Commented] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820042#comment-15820042 ] Sunil G commented on YARN-5825: --- I do not see this as an incompatible change. [~jianhe], could you please confirm? > ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of > synchronized block > -- > > Key: YARN-5825 > URL: https://issues.apache.org/jira/browse/YARN-5825 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5825.0001.patch, YARN-5825.0002.patch > > > Currently in PCPP, {{synchronized (curQueue)}} is used in various places. > Such instances could be replaced with a read lock. Thank you [~jianhe] for > pointing out the same as comment > [here|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15626578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15626578]
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820030#comment-15820030 ] Sunil G commented on YARN-6081: --- Thanks [~leftnoteasy] for the updated patch, and thanks [~eepayne] for the review. +1 from my end as well on the latest patch. I will commit it later today. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch, YARN-6081.002.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue.
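The arithmetic of the fix — deduct reserved from pending when computing a leaf queue's total pending resource — can be sketched as follows. The helper name is hypothetical and the example is memory-only for brevity; the real method operates on multi-dimensional {{Resource}} objects.

```java
public class PendingForPreemption {
    /**
     * Deduct already-reserved resource from pending, flooring at zero, so a
     * queue whose demand is already covered by its own reservations does not
     * trigger preemption of those reserved containers.
     */
    static long pendingMinusReserved(long pending, long reserved) {
        return Math.max(0, pending - reserved);
    }
}
```

With this deduction, the single-active-queue scenario from the description reports zero effective pending demand once reserved >= pending, so the preemption policy has no reason to kill the reserved container.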
[jira] [Commented] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820029#comment-15820029 ] Rohith Sharma K S commented on YARN-6058: - +1 for flow type, which is very much necessary. There are 2 pieces # When a workflow is submitted by Oozie which has many actions (MR, Tez, Spark), then the flow type should be Oozie. It is always better to consider the submitter as the flow type. It cannot be a union of all application types because each run can have a different execution engine. # */apps* is also required because every execution framework has its own UI (JHS for MR, Tez). These frameworks render entities of the respective framework. Say, Oozie submits (MR, Tez) actions; then the Tez UI renders the DAG of Oozie Tez, and similarly, JHS renders job details of Oozie MR. In such cases, */apps* helps to get those applications directly rather than going through flows. > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is many execution engines runs on top of YARN > example, Tez, MR. These engines will have their own UI's which list specific > type of entities which are published by them Ex: DAG entities. > But, these UI's do not aware of either userName or flowName or applicationId > which are submitted by these engines. > Currently, given that user do not aware of user, flownName, and > applicationId, then he can not retrieve any entities. > By supporting /apps with filters, user can list of application with given > ApplicationType. These applications can be used for retrieving engine > specific entities like DAG.
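The /apps-with-filters idea above can be sketched as a simple applicationType filter. This is a hypothetical stand-in (not the timeline reader's actual API): a framework UI that knows only its own ApplicationType, and neither user, flow name, nor application id, can still enumerate its applications.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AppsByType {
    /**
     * Given app id -> application type, return the ids whose type matches,
     * without needing to know user, flow name, or a specific application id.
     */
    static List<String> appsByType(Map<String, String> apps, String type) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : apps.entrySet()) {
            if (e.getValue().equals(type)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> apps = new LinkedHashMap<>();
        apps.put("application_1", "TEZ");
        apps.put("application_2", "MAPREDUCE");
        apps.put("application_3", "TEZ");
        System.out.println(appsByType(apps, "TEZ")); // [application_1, application_3]
    }
}
```

The returned application ids can then be used to fetch framework-specific entities (for example DAG entities) for each application.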
[jira] [Commented] (YARN-6071) Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only)
[ https://issues.apache.org/jira/browse/YARN-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820014#comment-15820014 ] Hadoop QA commented on YARN-6071: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 4s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 86m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6071 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847132/YARN-6071.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux d7b45e5a9601 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a6b06f7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14644/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14644/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
[jira] [Commented] (YARN-5899) Debug log in AbstractCSQueue#canAssignToThisQueue needs improvement
[ https://issues.apache.org/jira/browse/YARN-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819998#comment-15819998 ] Ying Zhang commented on YARN-5899: -- Thanks [~sunilg] for the review and commit. > Debug log in AbstractCSQueue#canAssignToThisQueue needs improvement > --- > > Key: YARN-5899 > URL: https://issues.apache.org/jira/browse/YARN-5899 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0-alpha1 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5899.001.patch, YARN-5899.002.patch > > > A small fix inside function canAssignToThisQueue() for printing DEBUG info. > Please see patch attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819959#comment-15819959 ] Ying Zhang edited comment on YARN-6031 at 1/12/17 2:41 AM: --- Failed test case (TestRMRestart.testFinishedAppRemovalAfterRMRestart) is known and tracked by YARN-5548. was (Author: ying zhang): Failed test case () is known and tracked by YARN-5548. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, > YARN-6031.006.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819959#comment-15819959 ] Ying Zhang commented on YARN-6031: -- Failed test case () is known and tracked by YARN-5548. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, > YARN-6031.006.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled.
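The validation that produces the exception above can be sketched with a simplified stand-in (hypothetical class, not SchedulerUtils itself): with node labels disabled, any recovered request that still carries a non-empty label expression is rejected, which is what breaks application recovery.

```java
public class LabelRequestValidation {
    /**
     * Simplified stand-in for the scheduler's request validation: when node
     * labels are disabled, any non-empty label expression is rejected.
     */
    static void validate(boolean nodeLabelsEnabled, String labelExpression) {
        if (!nodeLabelsEnabled && labelExpression != null
            && !labelExpression.isEmpty()) {
            throw new IllegalArgumentException(
                "Invalid resource request, node label not enabled but "
                    + "request contains label expression");
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(boolean nodeLabelsEnabled, String labelExpression) {
        try {
            validate(nodeLabelsEnabled, labelExpression);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

During recovery the second argument comes from the stored application state, so an app submitted while labels were enabled replays a non-empty expression into a cluster where they are now disabled and fails this check.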
[jira] [Commented] (YARN-6016) Bugs in AMRMProxy handling (local)AMRMToken
[ https://issues.apache.org/jira/browse/YARN-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819946#comment-15819946 ] Subru Krishnan commented on YARN-6016: -- Thanks [~botong] for the patch. Overall it looks good, I just had one request - can you add/update {{TestAMRMProxy}} as that was supposed to cover this scenario. > Bugs in AMRMProxy handling (local)AMRMToken > --- > > Key: YARN-6016 > URL: https://issues.apache.org/jira/browse/YARN-6016 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6016.v1.patch, YARN-6016.v2.patch > > > Two AMRMProxy bugs: > First, the AMRMToken from RM should not be propagated to AM, since AMRMProxy > will create a local AMRMToken for it. > Second, the AMRMProxy Context is now parse the localAMRMTokenKeyId from > amrmToken, but should be from localAmrmToken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5304) Ship single node HBase config option with single startup command
[ https://issues.apache.org/jira/browse/YARN-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819904#comment-15819904 ] Sangjin Lee commented on YARN-5304: --- Thanks for the summary [~vrushalic]. It is a good summary of the discussion. Just to add a couple more fine points, - we would package this timeline-service-specific hbase configuration file in hadoop - this file would now be required to be present; that would also entail making {{TIMELINE_SERVICE_HBASE_CONFIGURATION_FILE}} a required config and the file it points to required, or {{HBaseTimelineStorageUtils.getTimelineServiceHBaseConf()}} should fail - bringing up hbase would require using this config file via the {{--config}} option (i.e. {{hbase --config ...}}) > Ship single node HBase config option with single startup command > > > Key: YARN-5304 > URL: https://issues.apache.org/jira/browse/YARN-5304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Joep Rottinghuis >Assignee: Vrushali C > Labels: YARN-5355, yarn-5355-merge-blocker > > For small to medium Hadoop deployments we should make it dead-simple to use > the timeline service v2. We should have a single command to launch and stop > the timelineservice back-end for the default HBase implementation. > A default config with all the values should be packaged that launches all the > needed daemons (on the RM node) with a single command with all the > recommended settings. > Having a timeline admin command, perhaps an init command might be needed, or > perhaps the timeline service can even auto-detect that and create tables, > deploy needed coprocessors etc. > The overall purpose is to ensure nobody needs to be an HBase expert to get > this going. For those cluster operators with HBase experience, they can > choose their own more sophisticated deployment.
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819869#comment-15819869 ] Hudson commented on YARN-6072: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #2 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2/]) YARN-6072. RM unable to start in secure mode. Contributed by Ajith S. (naganarasimha_gr: rev a6b06f71797ad1ed9edbcef279bcf7d9e569f955) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at >
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819827#comment-15819827 ] Miklos Szegedi commented on YARN-5849: -- Thank you [~bibinchundatt] for the review and [~templedf] for the review and commit. > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. > linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
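The gist of the change above — create the configured YARN cgroup hierarchy under a pre-mounted controller instead of failing with {{FileNotFoundException}} at startup — can be sketched as follows. This is an illustrative stand-in, not the actual {{CGroupsHandlerImpl}} code; the class, method, and paths here are hypothetical.

```java
import java.io.File;

public class CGroupHierarchyDemo {
    // Ensure <mountPoint>/<hierarchy> (e.g. /cgroups/cpu/hadoop-yarn)
    // exists, creating it when missing, rather than failing at startup.
    static File ensureHierarchy(File mountPoint, String hierarchy) {
        File dir = new File(mountPoint, hierarchy);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            // Mirrors the failure mode in the description: without write
            // permission on the controller mount, creation cannot succeed.
            throw new IllegalStateException("Cannot create cgroup " + dir
                + "; check permissions of " + mountPoint);
        }
        return dir;
    }

    public static void main(String[] args) {
        // Use a temp directory as a stand-in for a pre-mounted cpu controller.
        File mount = new File(System.getProperty("java.io.tmpdir"), "cpu");
        File yarnRoot = ensureHierarchy(mount, "hadoop-yarn");
        System.out.println(yarnRoot.isDirectory()); // prints true
    }
}
```

In the real NodeManager the mount point comes from parsing /etc/mtab and the hierarchy name from linux-container-executor.cgroups.hierarchy; the sketch only shows the create-if-missing step.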
[jira] [Updated] (YARN-6071) Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only)
[ https://issues.apache.org/jira/browse/YARN-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6071: - Attachment: YARN-6071.001.patch Attached ver.1 patch for review. > Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only) > --- > > Key: YARN-6071 > URL: https://issues.apache.org/jira/browse/YARN-6071 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-6071.001.patch > > > In YARN-3866, we have addendum patch to fix incompatible API change on > branch-2 and branch-2.8. For trunk, we need a similar fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update
[ https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819778#comment-15819778 ] Hadoop QA commented on YARN-5966: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 22s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 48s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 101 unchanged - 3 fixed = 108 total (was 104) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 31s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 31s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 28s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 33s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 33s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 32s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not
[jira] [Updated] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-5864: - Attachment: YARN-5864.005.patch Uploaded ver.5 patch, which includes code to print performance information. > YARN Capacity Scheduler - Queue Priorities > -- > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.001.patch, YARN-5864.002.patch, > YARN-5864.003.patch, YARN-5864.004.patch, YARN-5864.005.patch, > YARN-5864.poc-0.patch, YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf > > > Currently, Capacity Scheduler at every parent-queue level uses relative > used-capacities of the child queues to decide which queue can get the next > available resource first. > For example, > - Q1 & Q2 are child queues under queueA > - Q1 has 20% of configured capacity, 5% of used-capacity and > - Q2 has 80% of configured capacity, 8% of used-capacity. > In this situation, the relative used-capacities are calculated as below > - Relative used-capacity of Q1 is 5/20 = 0.25 > - Relative used-capacity of Q2 is 8/80 = 0.10 > In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is > selected by the scheduler first to receive the next available resource. > Simply ordering queues according to relative used-capacities sometimes causes > trouble because scarce resources could be assigned to less-important > apps first. > # Latency sensitivity: This can be a problem with latency-sensitive > applications where waiting till the ‘other’ queue gets full is not going to > cut it. The delay in scheduling directly reflects in the response times of > these applications. > # Resource fragmentation for large-container apps: Today’s algorithm also > causes issues with applications that need very large containers. 
It is > possible that existing queues are all within their resource guarantees but > their current allocation distribution on each node may be such that an > application which needs large container simply cannot fit on those nodes. > Services: > # The above problem (2) gets worse with long running applications. With short > running apps, previous containers may eventually finish and make enough space > for the apps with large containers. But with long running services in the > cluster, the large containers’ application may never get resources on any > nodes even if its demands are not yet met. > # Long running services are sometimes more picky w.r.t placement than normal > batch apps. For example, for a long running service in a separate queue (say > queue=service), during peak hours it may want to launch instances on 50% of > the cluster nodes. On each node, it may want to launch a large container, say > 200G memory per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
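The queue-ordering rule described above (Q1: 5/20 = 0.25, Q2: 8/80 = 0.10, so Q2 is offered resources first) can be sketched as follows. The class and field names are illustrative stand-ins, not the actual CapacityScheduler types.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RelativeUsedCapacityDemo {
    static final class Queue {
        final String name;
        final double configured; // fraction of parent capacity guaranteed
        final double used;       // fraction of parent capacity in use

        Queue(String name, double configured, double used) {
            this.name = name;
            this.configured = configured;
            this.used = used;
        }

        // Relative used-capacity = used / configured.
        double relativeUsedCapacity() {
            return used / configured;
        }
    }

    public static void main(String[] args) {
        List<Queue> queues = Arrays.asList(
            new Queue("Q1", 0.20, 0.05),   // 0.05 / 0.20 = 0.25
            new Queue("Q2", 0.80, 0.08));  // 0.08 / 0.80 = 0.10
        // The least relatively-used queue is offered the next resource.
        queues.sort(Comparator.comparingDouble(Queue::relativeUsedCapacity));
        System.out.println(queues.get(0).name); // prints Q2
    }
}
```

The JIRA's point is precisely that this ordering ignores queue priority: a latency-sensitive app in Q1 must wait until Q1's relative used-capacity drops below Q2's.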
[jira] [Commented] (YARN-6071) Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only)
[ https://issues.apache.org/jira/browse/YARN-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819734#comment-15819734 ] Wangda Tan commented on YARN-6071: -- [~templedf], yeah, I kept forgetting to upload a patch for this one. This will be a simple change to the pb file; I will upload a patch today, no later than tomorrow. > Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only) > --- > > Key: YARN-6071 > URL: https://issues.apache.org/jira/browse/YARN-6071 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Wangda Tan >Priority: Blocker > > In YARN-3866, we have addendum patch to fix incompatible API change on > branch-2 and branch-2.8. For trunk, we need a similar fix.
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819731#comment-15819731 ] Naganarasimha G R commented on YARN-6072: - Thanks [~djp],[~jianhe] and [~kasha] for confirming. Committing the patch now ! > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager
[jira] [Commented] (YARN-6071) Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only)
[ https://issues.apache.org/jira/browse/YARN-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819729#comment-15819729 ] Daniel Templeton commented on YARN-6071: [~leftnoteasy], what's the story on this one? Are you planning to post a patch soon? > Fix incompatible API change on AM-RM protocol due to YARN-3866 (trunk only) > --- > > Key: YARN-6071 > URL: https://issues.apache.org/jira/browse/YARN-6071 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Wangda Tan >Priority: Blocker > > In YARN-3866, we have addendum patch to fix incompatible API change on > branch-2 and branch-2.8. For trunk, we need a similar fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5991) Yarn Distributed Shell does not print throwable t to App Master When failed to start container
[ https://issues.apache.org/jira/browse/YARN-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated YARN-5991: -- Hadoop Flags: Reviewed (was: Incompatible change,Reviewed) I'm unsetting the "incompatible" flag since this looks like it just changes a log print, which is not covered by compatibility. > Yarn Distributed Shell does not print throwable t to App Master When failed > to start container > -- > > Key: YARN-5991 > URL: https://issues.apache.org/jira/browse/YARN-5991 > Project: Hadoop YARN > Issue Type: Improvement > Environment: apache hadoop 2.7.1, centos 6.5 >Reporter: dashwang >Assignee: Jim Frankola >Priority: Minor > Labels: newbie > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5991.001.patch > > > 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1481517162158_0027_01_03 > 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1481517162158_0027_01_04 > 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1481517162158_0027_01_02 > 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > slave02:22710 > 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > slave01:34140 > 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > master:52037 > 16/12/12 16:27:20 ERROR launcher.ApplicationMaster: Failed to start Container > container_1481517162158_0027_01_02 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819695#comment-15819695 ] Andrew Wang commented on YARN-5825: --- Doing a release notes sweep for 3.0.0-alpha2, noticed this JIRA. If it's incompatible, should it have been committed to branch-2? Also, if this is incompatible, could someone also add a release note detailing the exposure? Thanks! > ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of > synchronized block > -- > > Key: YARN-5825 > URL: https://issues.apache.org/jira/browse/YARN-5825 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5825.0001.patch, YARN-5825.0002.patch > > > Currently in PCPP, {{synchronized (curQueue)}} is used in various places. > Such instances could be replaced with a read lock. Thank you [~jianhe] for > pointing out the same as comment > [here|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15626578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15626578] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
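The change this JIRA describes — replacing {{synchronized (curQueue)}} with a read lock so concurrent readers such as the preemption policy do not serialize against each other — can be sketched as below. The {{LeafQueue}} stand-in is hypothetical and only illustrates the locking pattern, not the real scheduler class.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueueReadLockDemo {
    static final class LeafQueue {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int usedCapacity;

        // Mutators (e.g. container allocation) take the exclusive write lock.
        void setUsedCapacity(int value) {
            lock.writeLock().lock();
            try {
                usedCapacity = value;
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Read-only accessors (e.g. the preemption policy snapshotting queue
        // state) take the shared read lock; previously this was effectively
        // synchronized (this), blocking other readers too.
        int getUsedCapacity() {
            lock.readLock().lock();
            try {
                return usedCapacity;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        LeafQueue q = new LeafQueue();
        q.setUsedCapacity(42);
        System.out.println(q.getUsedCapacity()); // prints 42
    }
}
```

Multiple threads can hold the read lock simultaneously, so the preemption policy's periodic scans no longer contend with each other, while writers still get exclusive access.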
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819685#comment-15819685 ] Hudson commented on YARN-5849: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #1 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/1/]) YARN-5849. Automatically create YARN control group for pre-mounted (templedf: rev e6f13fe5d1df8918ffc680d18f9d5576f38893a6) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCgroups.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsBlkioResourceHandlerImpl.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsBlkioResourceHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsCpuResourceHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficControlBandwidthHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsMemoryResourceHandlerImpl.java > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. 
> linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819587#comment-15819587 ] Jian He commented on YARN-6072: --- looks good to me, +1 > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes
[jira] [Updated] (YARN-5966) AMRMClient changes to support ExecutionType update
[ https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5966: -- Attachment: YARN-5966.002.patch Updating patch. Fixing failed testcase > AMRMClient changes to support ExecutionType update > -- > > Key: YARN-5966 > URL: https://issues.apache.org/jira/browse/YARN-5966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5966.001.patch, YARN-5966.002.patch, > YARN-5966.wip.001.patch > > > {{AMRMClient}} changes to support change of container ExecutionType -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6083) Add doc for reservation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819564#comment-15819564 ] Yufei Gu commented on YARN-6083: Thanks for pointing that out, [~subru]. > Add doc for reservation in Fair Scheduler > - > > Key: YARN-6083 > URL: https://issues.apache.org/jira/browse/YARN-6083 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > > We can enable reservation on a leaf queue by setting the tag for > the queue, but there is no doc for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819541#comment-15819541 ] Hadoop QA commented on YARN-3637: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 16s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-3637 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847096/YARN-3637-trunk.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2bec2146a33c 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7979939 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14642/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14642/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Handle localization sym-linking correctly at the YARN level > --- > > Key: YARN-3637 > URL: https://issues.apache.org/jira/browse/YARN-3637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-3637-trunk.001.patch > > > The shared cache needs to handle resource sym-linking at the YARN layer. > Currently, we let the application layer (i.e. mapreduce) handle this, but it > is probably better
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819482#comment-15819482 ] Eric Payne commented on YARN-5889: -- [~sunilg], - Should {{resetUserAddedOrRemoved}} just be {{setUserAddedOrRemoved}}? - {{LeafQueue}}: I think {{totalUserConsumedRatio}} should be removed, since it's not used. - {{LeafQueue#recalculateULCount}} / {{UsersManager#User#cachedULCount}}: I know I came up with the name originally, but I think a better name would be {{recalculateUL}} - {{getComputedActiveUserLimit}} / {{getComputedUserLimit}}: User's {{cachedULCount}} needs to be updated when the UL is recomputed, or else it will always be out of sync and will always be recomputed: {code} if (userLimitPerSchedulingMode == null || user.getCachedULCount() != lQueue.getRecalculateULCount()) { userLimitPerSchedulingMode = reComputeUserLimits(rc, userName, nodePartition, clusterResource, false); user.setCachedULCount(lQueue.getRecalculateULCount()); } {code} [~leftnoteasy], bq. User#setCachedCount, should we invalidateUL for the user who allocates/releases containers, or we should invalidate all user limit? I think the latter one is more safe to me. Yes, unfortunately, I think that once the queue goes above its guarantee, the ratio will change when containers are allocated or released. We may be able to do an optimization to only reset the specific user's count when the queue is under its guarantee and all users when it is over, but that may not be worth the added complexity. 
> Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
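The caching scheme discussed in the YARN-5889 review above — recompute a user's limit only when the queue-level recalculation counter has moved past the user's cached counter, and write the counter back on each recomputation so the cache stays in sync — can be sketched as follows. This is a simplified, hypothetical illustration: the class, field, and method names are stand-ins, not the actual {{LeafQueue}}/{{UsersManager}} code, and the limit value is a fixed placeholder.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of version-counter based user-limit caching. All names here are
// illustrative, not Hadoop's actual classes.
public class UserLimitCacheSketch {
  // Bumped whenever cached user limits must be invalidated
  // (e.g. a container allocation/release while the queue is over guarantee).
  private final AtomicLong recalculateULCount = new AtomicLong();
  private final Map<String, Long> cachedULCount = new HashMap<>();
  private final Map<String, Integer> cachedLimit = new HashMap<>();
  private int recomputations = 0;

  public void invalidateAll() {
    recalculateULCount.incrementAndGet();
  }

  public int getUserLimit(String user) {
    long current = recalculateULCount.get();
    Long cached = cachedULCount.get(user);
    if (cached == null || cached != current) {
      recomputations++;                 // the expensive path
      cachedLimit.put(user, 42);        // stand-in for the real computation
      cachedULCount.put(user, current); // keep the cache in sync, per the review
    }
    return cachedLimit.get(user);
  }

  public int getRecomputations() {
    return recomputations;
  }

  public static void main(String[] args) {
    UserLimitCacheSketch q = new UserLimitCacheSketch();
    q.getUserLimit("alice"); // cold: recompute
    q.getUserLimit("alice"); // cache hit
    q.invalidateAll();       // e.g. containers changed while over guarantee
    q.getUserLimit("alice"); // stale: recompute
    System.out.println(q.getRecomputations()); // prints 2
  }
}
```

Without the write-back in the stale branch, every lookup after the first invalidation would recompute — exactly the bug the review points out.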
[jira] [Updated] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2
[ https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-6076: --- Attachment: yarn-6076-branch-2.1.patch > Backport YARN-4752 (FS preemption changes) to branch-2 > -- > > Key: YARN-6076 > URL: https://issues.apache.org/jira/browse/YARN-6076 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-6076-branch-2.1.patch, yarn-6076-branch-2.1.patch > > > YARN-4752 was merged to trunk a while ago, and has been stable. Creating this > JIRA to merge it branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2
[ https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819469#comment-15819469 ] Karthik Kambatla commented on YARN-6076: I am able to build this locally with both java8 and java7. Let me submit this again and see what happens. > Backport YARN-4752 (FS preemption changes) to branch-2 > -- > > Key: YARN-6076 > URL: https://issues.apache.org/jira/browse/YARN-6076 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-6076-branch-2.1.patch > > > YARN-4752 was merged to trunk a while ago, and has been stable. Creating this > JIRA to merge it branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819452#comment-15819452 ] Eric Payne commented on YARN-6081: -- +1 LGTM. The failed test ({{TestRMRestart}}) is not related to this patch. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch, YARN-6081.002.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue
[ https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819442#comment-15819442 ] Hudson commented on YARN-5554: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11108 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11108/]) YARN-5554. MoveApplicationAcrossQueues does not check user permission on (templedf: rev 7979939428ad5df213846e11bc1489bdf94ed9f8) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/QueueACLsManager.java > MoveApplicationAcrossQueues does not check user permission on the target queue > -- > > Key: YARN-5554 > URL: https://issues.apache.org/jira/browse/YARN-5554 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg > Labels: oct16-medium > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5554.10.patch, YARN-5554.11.patch, > YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, > YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, > YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch > > > moveApplicationAcrossQueues operation currently does not check user > permission on the target queue. 
This incorrectly allows one user to move > his/her own applications to a queue that the user has no access to -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
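The shape of the YARN-5554 fix — verifying the caller's submit ACL on the *target* queue before completing a move, rather than only checking application ownership — can be sketched roughly as below. Every name here ({{QueueAclCheckSketch}}, {{moveApplication}}, the ACL map) is a hypothetical stand-in, not the actual {{ClientRMService}}/{{QueueACLsManager}} code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: a move is allowed only if the caller holds the
// submit ACL on the target queue, not just ownership of the app.
public class QueueAclCheckSketch {
  private final Map<String, Set<String>> submitAcls = new HashMap<>();

  public void allowSubmit(String queue, String user) {
    submitAcls.computeIfAbsent(queue, q -> new HashSet<>()).add(user);
  }

  public boolean checkAccess(String user, String queue) {
    return submitAcls.getOrDefault(queue, Set.of()).contains(user);
  }

  // The buggy behavior checked only app ownership; the fixed behavior
  // also verifies access to the destination queue.
  public void moveApplication(String caller, String appOwner, String targetQueue) {
    if (!caller.equals(appOwner)) {
      throw new SecurityException("not the application owner");
    }
    if (!checkAccess(caller, targetQueue)) {
      throw new SecurityException(
          "user " + caller + " cannot submit to queue " + targetQueue);
    }
    // ... perform the actual move ...
  }

  public static void main(String[] args) {
    QueueAclCheckSketch rm = new QueueAclCheckSketch();
    rm.allowSubmit("queueA", "alice");
    rm.moveApplication("alice", "alice", "queueA"); // permitted
    try {
      rm.moveApplication("alice", "alice", "queueB"); // no ACL on target
    } catch (SecurityException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```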
[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-3637: --- Attachment: YARN-3637-trunk.001.patch Attached is a v01 patch for handling symlink names and fragments as part of the shared cache YARN API. The major part of the patch adds a new parameter to the {{use}} API call. This allows a user to specify a preferred name for a resource even if the name of the resource in the shared cache is different. With this additional parameter, the user can avoid naming conflicts that happen when using resources from the shared cache. Note that this patch does not solve the existing problem in YARN where resource symlinks get clobbered if two resources are specified with the same name. Furthermore, this approach assumes the path returned is going to be used to create a LocalResource and is leveraging the way YARN localization uses the fragment portion of a URI. I think this makes it slightly easier for developers to implement shared cache support in their YARN application by abstracting away symlink/fragment management. Thoughts [~sjlee0] or anyone else? > Handle localization sym-linking correctly at the YARN level > --- > > Key: YARN-3637 > URL: https://issues.apache.org/jira/browse/YARN-3637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-3637-trunk.001.patch > > > The shared cache needs to handle resource sym-linking at the YARN layer. > Currently, we let the application layer (i.e. mapreduce) handle this, but it > is probably better for all applications if it is handled transparently. > Here is the scenario: > Imagine two separate jars (with unique checksums) that have the same name > job.jar. 
> They are stored in the shared cache as two separate resources: > checksum1/job.jar > checksum2/job.jar > A new application tries to use both of these resources, but internally refers > to them as different names: > foo.jar maps to checksum1 > bar.jar maps to checksum2 > When the shared cache returns the path to the resources, both resources are > named the same (i.e. job.jar). Because of this, when the resources are > localized one of them clobbers the other. This is because both symlinks in > the container_id directory are the same name (i.e. job.jar) even though they > point to two separate resource directories. > Originally we tackled this in the MapReduce client by using the fragment > portion of the resource url. This, however, seems like something that should > be solved at the YARN layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
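The fragment-based disambiguation described in the YARN-3637 scenario above can be illustrated with a small sketch: the URI fragment, when present, overrides the base file name as the symlink name, so two cached resources that both end in {{job.jar}} no longer clobber each other. The {{resolveLinkName}} helper below is hypothetical, not YARN's actual localization API; it only mirrors how a fragment might be used.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of using a URI fragment to pick the localized symlink name.
// resolveLinkName is an illustrative helper, not a Hadoop API.
public class FragmentLinkSketch {
  // Returns the symlink name a resource URI would localize under:
  // the fragment if present, otherwise the last path component.
  static String resolveLinkName(URI resource) {
    String fragment = resource.getFragment();
    if (fragment != null && !fragment.isEmpty()) {
      return fragment;
    }
    String path = resource.getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  public static void main(String[] args) {
    // Two distinct cached resources that share the base name job.jar.
    URI r1 = URI.create("hdfs:///sharedcache/checksum1/job.jar#foo.jar");
    URI r2 = URI.create("hdfs:///sharedcache/checksum2/job.jar#bar.jar");
    Map<String, URI> links = new LinkedHashMap<>();
    links.put(resolveLinkName(r1), r1);
    links.put(resolveLinkName(r2), r2);
    // Without fragments both would resolve to "job.jar" and clobber.
    System.out.println(links.keySet()); // [foo.jar, bar.jar]
  }
}
```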
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819395#comment-15819395 ] Daniel Templeton commented on YARN-5849: Excellent. Thanks, [~bibinchundatt]! +1 Committing soon. > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. > linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819374#comment-15819374 ] Hadoop QA commented on YARN-6081: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the 
patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 17 new + 931 unchanged - 3 fixed = 948 total (was 934) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 52s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6081 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847081/YARN-6081.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f62055017949 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e648b6e | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14641/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14641/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14641/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Resolved] (YARN-6083) Add doc for reservation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan resolved YARN-6083. -- Resolution: Duplicate [~yufeigu], IIUC this is a duplicate of YARN-4827. I already have a draft version of the doc, but I have not uploaded it as I am not able to run {{ReservationSystem}} e2e with {{FairScheduler}}; I am blocked by YARN-4859 > Add doc for reservation in Fair Scheduler > - > > Key: YARN-6083 > URL: https://issues.apache.org/jira/browse/YARN-6083 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > > We can enable reservation on a leaf queue by setting the tag for > the queue, but there is no doc for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819338#comment-15819338 ] Eric Payne commented on YARN-6081: -- Thanks [~leftnoteasy] for fixing this. I am reviewing today. I will update later today or early tomorrow. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch, YARN-6081.002.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819248#comment-15819248 ] Jason Lowe commented on YARN-574: - Thanks for picking this up [~ajithshetty]. I took a quick look at the patch. It looks OK at a high level, but there is a race condition in how we're dealing with the thread pool. The code makes the assumption that work submitted to the queue will be picked up instantly by an idle thread in the thread pool. If it's not picked up fast enough then we can end up doing one or more super-quick heartbeats and accidentally queue up more work for the thread pool than we have active threads. That could actually make the localization _slower_ when there are multiple containers for the same job on the same node, since one of the other container localizers that has idle threads cannot work on a resource already handed to another localizer. IMHO we can trivially track the outstanding count ourselves. We simply need to increment an AtomicInteger when we submit the work to the executor, then wrap FSDownload in another Callable that decrements the AtomicInteger when FSDownload returns/throws. Then we can track how many resources are either pending or actively being downloaded without getting bitten by race conditions in the executor implementation. Alternatively the createStatus method already walks the Future objects returned from the executor and we could calculate how many resources are in-progress (i.e.: either pending or actively being downloaded) there. Once there are as many in-progress resources as the configured parallelism then we should avoid making quick heartbeats. 
> PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Omkar Vinit Joshi >Assignee: Ajith S > Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, > YARN-574.2.patch > > > At present private resources will be downloaded in parallel only if multiple > containers request the same resource. However otherwise it will be serial. > The protocol between PrivateLocalizer and ContainerLocalizer supports > multiple downloads however it is not used and only one resource is sent for > downloading at a time. > I think we can increase / assure parallelism (even for single container > requesting resource) for private/application resources by making multiple > downloads per ContainerLocalizer. > Total Parallelism before > = number of threads allotted for PublicLocalizer [public resource] + number > of containers[private and application resource] > Total Parallelism after > = number of threads allotted for PublicLocalizer [public resource] + number > of containers * max downloads per container [private and application resource]
[jira] [Updated] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-5864: - Attachment: YARN-5864.004.patch [~sunilg], thanks for reviewing; responses to your comments: For 1), yes, an underutilized queue always goes before overutilized queues. For 2), I have thought about this. I intentionally made it two policies because: - All configurations will be grouped, for example preemption-related configuration. - Priority can be interpreted in different ways; for example, priority could be used as "weights" in different policy implementations. - It avoids too many options to enable/disable features inside one option. - The internal implementation is not tied to how admins use the feature. For 3), added a comment to make sure ParentQueue uses the readlock correctly. (It is fine now.) For 4), it should be fine; it is already part of the Maven dependencies. For 5), as noted in the comment, I agree that we can optimize this. The time complexity of this algorithm is O(N^2 * max_queue_depth), where N is the number of leaf queues. Since we have a limited number of leaf queues and max_queue_depth is a small constant, we're fine for now. For 6), similar to the above, we're fine for now, and 5)/6) can be done separately. For 7), updated. For 8), updated, and added a new test. For 9), updated according to the changes for 8). For 10), I think we should make sure queue properties like used/pending/reserved will not be updated, while ideal-assigned/preemptable could be changed by different selectors. Please comment if you find any changes from IntraQueueSelector. For 11), updated. For 12), I considered this, but cannot think of a relatively easy approach. The time complexity would be O(#containers * #reserved-nodes), and since we have a "touchedNode" set to avoid double-checking nodes, it should not be a big problem even on a large cluster. I will do some SLS performance tests to make sure it works well. Attached ver.4 patch.
This patch is on top of YARN-6081; I will move it to Patch Available once YARN-6081 gets committed. > YARN Capacity Scheduler - Queue Priorities > -- > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.001.patch, YARN-5864.002.patch, > YARN-5864.003.patch, YARN-5864.004.patch, YARN-5864.poc-0.patch, > YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf > > > Currently, Capacity Scheduler at every parent-queue level uses relative > used-capacities of the child-queues to decide which queue can get next > available resource first. > For example, > - Q1 & Q2 are child queues under queueA > - Q1 has 20% of configured capacity, 5% of used-capacity and > - Q2 has 80% of configured capacity, 8% of used-capacity. > In the situation, the relative used-capacities are calculated as below > - Relative used-capacity of Q1 is 5/20 = 0.25 > - Relative used-capacity of Q2 is 8/80 = 0.10 > In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is > selected by the scheduler first to receive next available resource. > Simply ordering queues according to relative used-capacities sometimes causes > a few troubles because scarce resources could be assigned to less-important > apps first. > # Latency sensitivity: This can be a problem with latency sensitive > applications where waiting till the ‘other’ queue gets full is not going to > cut it. The delay in scheduling directly reflects in the response times of > these applications. > # Resource fragmentation for large-container apps: Today’s algorithm also > causes issues with applications that need very large containers. It is > possible that existing queues are all within their resource guarantees but > their current allocation distribution on each node may be such that an > application which needs large container simply cannot fit on those nodes.
> Services: > # The above problem (2) gets worse with long running applications. With short > running apps, previous containers may eventually finish and make enough space > for the apps with large containers. But with long running services in the > cluster, the large containers’ application may never get resources on any > nodes even if its demands are not yet met. > # Long running services are sometimes more picky w.r.t placement than normal > batch apps. For example, for a long running service in a separate queue (say > queue=service), during peak hours it may want to launch instances on 50% of > the cluster nodes. On each node, it may want to launch a large container, say > 200G memory per container.
[jira] [Commented] (YARN-5556) Support for deleting queues without requiring a RM restart
[ https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819211#comment-15819211 ] Hadoop QA commented on YARN-5556: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the 
patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 280 unchanged - 2 fixed = 286 total (was 282) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 58s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5556 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847065/YARN-5556.v2.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 0e50c55d491e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e648b6e | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14640/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14640/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14640/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Updated] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6081: - Attachment: YARN-6081.002.patch Thanks [~sunilg] for reviewing the patch. For 2), it uses Resources.subtract, so it will not touch the original value. For 3), updated to use componentwiseMax. For 1/4/5, addressed. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch, YARN-6081.002.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue.
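For readers unfamiliar with the non-mutating semantics discussed above, here is a minimal stand-in (not the real org.apache.hadoop.yarn.util.resource.Resources class; `Res`, `PendingDemo`, and the resource values are illustrative) showing why subtract and componentwiseMax leave their inputs untouched:

```java
// Simplified stand-in for YARN's Resource/Resources, for illustration only.
final class Res {
  final long memory;
  final int vcores;

  Res(long memory, int vcores) {
    this.memory = memory;
    this.vcores = vcores;
  }

  // Returns a new object; neither argument is modified.
  static Res subtract(Res a, Res b) {
    return new Res(a.memory - b.memory, a.vcores - b.vcores);
  }

  // Per-component maximum, again without mutating the inputs.
  static Res componentwiseMax(Res a, Res b) {
    return new Res(Math.max(a.memory, b.memory), Math.max(a.vcores, b.vcores));
  }

  @Override
  public String toString() {
    return "<memory:" + memory + ", vCores:" + vcores + ">";
  }
}

public class PendingDemo {
  public static void main(String[] args) {
    Res pending = new Res(8192, 8);
    Res reserved = new Res(2048, 2);
    // Deduct reserved from pending, then clamp negatives to zero.
    Res deducted = Res.componentwiseMax(Res.subtract(pending, reserved), new Res(0, 0));
    System.out.println("pending = " + pending);   // original value untouched
    System.out.println("deducted = " + deducted);
  }
}
```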
[jira] [Created] (YARN-6083) Add doc for reservation in Fair Scheduler
Yufei Gu created YARN-6083: -- Summary: Add doc for reservation in Fair Scheduler Key: YARN-6083 URL: https://issues.apache.org/jira/browse/YARN-6083 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Yufei Gu Assignee: Yufei Gu We can enable reservation on a leaf queue by setting the tag for the queue, but there is no doc for this.
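Until that doc lands, the configuration in question looks roughly like the following fair-scheduler.xml fragment. The `<reservation/>` element name reflects my understanding of the allocation-file syntax (the tag was stripped by JIRA rendering in the description above) and should be verified against the shipped documentation:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <queue name="reservable">
      <!-- Marks this leaf queue's resources as available to the
           ReservationSystem; assumed element name, verify before use. -->
      <reservation/>
    </queue>
  </queue>
</allocations>
```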
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819132#comment-15819132 ] Devaraj K commented on YARN-5764: - bq. Do you have any benchmarks results that would illustrate the kind of performance gains that could potentially be realised with this patch? Thanks [~raviprak] for going through this. I will share the performance results here. Thanks [~sunilg] for the comments. bq. if NM is taking the decision based on cores (NUMA cpus), it ll be more container specific. Could we apply it more of application specific where few apps containers only will be NUMA aware. bq. Also I think such NUMA aware nodes could be controlled within a specific nodelabel, I think it may yield better use cases for NUMA. So during NM init, such awareness info could be passed to RM and it can be made as node attribute. Such nodes could then be labelled together as well. If we want to run an application only on NUMA-aware nodes, we can group the NUMA-aware nodes into a node-label and specify this node-label for the application. I am wondering why some applications would not want to run NUMA-aware if the NM supports it and gets some perf gain; I don't see the benefit of making this application specific. We can also include this as an attribute once the constraint node labels (YARN-3409) feature gets in. > NUMA awareness support for launching containers > --- > > Key: YARN-5764 > URL: https://issues.apache.org/jira/browse/YARN-5764 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Reporter: Olasoji >Assignee: Devaraj K > Attachments: NUMA Awareness for YARN Containers.pdf, > YARN-5764-v0.patch, YARN-5764-v1.patch > > > The purpose of this feature is to improve Hadoop performance by minimizing > costly remote memory accesses on non SMP systems. Yarn containers, on launch, > will be pinned to a specific NUMA node and all subsequent memory allocations > will be served by the same node, reducing remote memory accesses.
> The current default behavior is to spread memory across all NUMA nodes.
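As a rough illustration of the pinning mechanism under discussion (a sketch under assumptions, not the attached patch): a container launch command could be prefixed with numactl. The `--cpunodebind`/`--membind` flags are real numactl options; `NumaCommand`, `pinToNode`, and the wrapping point in the NodeManager are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: wrap a container command with numactl so CPU scheduling and
// memory allocation are pinned to one NUMA node.
public class NumaCommand {
  static List<String> pinToNode(int numaNode, List<String> containerCmd) {
    List<String> cmd = new ArrayList<>(Arrays.asList(
        "numactl",
        "--cpunodebind=" + numaNode,   // run only on this node's CPUs
        "--membind=" + numaNode));     // allocate memory only from this node
    cmd.addAll(containerCmd);
    return cmd;
  }

  public static void main(String[] args) {
    List<String> launch = pinToNode(0, Arrays.asList("bash", "launch_container.sh"));
    System.out.println(String.join(" ", launch));
  }
}
```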
[jira] [Issue Comment Deleted] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-6058: --- Comment: was deleted (was: Whatever ) > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is many execution engines runs on top of YARN > example, Tez, MR. These engines will have their own UI's which list specific > type of entities which are published by them Ex: DAG entities. > But, these UI's do not aware of either userName or flowName or applicationId > which are submitted by these engines. > Currently, given that user do not aware of user, flownName, and > applicationId, then he can not retrieve any entities. > By supporting /apps with filters, user can list of application with given > ApplicationType. These applications can be used for retrieving engine > specific entities like DAG.
[jira] [Commented] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819113#comment-15819113 ] Varun Saxena commented on YARN-6058: Well, whatever you are asking for here would lead to a full table scan for Application table. And records won't be in order as well due to the structure of the row key. This came up during the discussion on YARN-5585 as well I think and at that time I had suggested that if all you want is just a list of Application IDs', we can probably use App to flow table to show it. Would only App IDs' be enough or you need more metadata i.e. some other application attributes? Frankly, the intention of ATSv2 at the time of design was to model workflows i.e. let users drill down from flows to apps to generic entities. Whereas, what you want is application ID directly. Would it not be possible for Tez UI to follow the same order of flows->flowruns->apps (depending on the outcome of how we display flows in YARN-6027)? As Tez executes the DAG within the scope of an application, its case is somewhat unique though. We should, however, store app type as well, as others said. > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is many execution engines runs on top of YARN > example, Tez, MR. These engines will have their own UI's which list specific > type of entities which are published by them Ex: DAG entities. > But, these UI's do not aware of either userName or flowName or applicationId > which are submitted by these engines. > Currently, given that user do not aware of user, flownName, and > applicationId, then he can not retrieve any entities. 
> By supporting /apps with filters, user can list of application with given > ApplicationType. These applications can be used for retrieving engine > specific entities like DAG.
[jira] [Commented] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819074#comment-15819074 ] Varun Saxena commented on YARN-6058: Whatever > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is many execution engines runs on top of YARN > example, Tez, MR. These engines will have their own UI's which list specific > type of entities which are published by them Ex: DAG entities. > But, these UI's do not aware of either userName or flowName or applicationId > which are submitted by these engines. > Currently, given that user do not aware of user, flownName, and > applicationId, then he can not retrieve any entities. > By supporting /apps with filters, user can list of application with given > ApplicationType. These applications can be used for retrieving engine > specific entities like DAG.
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819036#comment-15819036 ] Junping Du commented on YARN-6072: -- I believe the latest patch already incorporates Jian's comments above. [~Naganarasimha], would you like to go ahead and do the honors? :) > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > #
[jira] [Updated] (YARN-5556) Support for deleting queues without requiring a RM restart
[ https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-5556: Attachment: YARN-5556.v2.006.patch Attaching a patch addressing [~wangda]'s comments. > Support for deleting queues without requiring a RM restart > -- > > Key: YARN-5556 > URL: https://issues.apache.org/jira/browse/YARN-5556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Xuan Gong >Assignee: Naganarasimha G R > Attachments: YARN-5556.v1.001.patch, YARN-5556.v1.002.patch, > YARN-5556.v1.003.patch, YARN-5556.v1.004.patch, YARN-5556.v2.005.patch, > YARN-5556.v2.006.patch > > > Today, we could add or modify queues without restarting the RM, via a CS > refresh. But for deleting queue, we have to restart the ResourceManager. We > could support for deleting queues without requiring a RM restart
[jira] [Commented] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818964#comment-15818964 ] Joep Rottinghuis commented on YARN-6058: Agreed we should hit the flow-activity table. Other tables don't have a strong time-range in the key and will result in very large scans. +1 for storing the framework types for a flow. From the HBase perspective we could make the value the count of the applications of that type in a flow, but that has two problems: increments aren't idempotent (in light of spooling and replay), and our plumbing would have to be adjusted. So probably we should just store 1 as the value and then use a SingleColumnValueExcludeFilter to return only those flows with the particular type having an activity in a day. This does mean that we have to have the framework type present each time we insert a record into the flow activity table. > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is many execution engines runs on top of YARN > example, Tez, MR. These engines will have their own UI's which list specific > type of entities which are published by them Ex: DAG entities. > But, these UI's do not aware of either userName or flowName or applicationId > which are submitted by these engines. > Currently, given that user do not aware of user, flownName, and > applicationId, then he can not retrieve any entities. > By supporting /apps with filters, user can list of application with given > ApplicationType. These applications can be used for retrieving engine > specific entities like DAG. 
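To make the proposal concrete, here is a toy in-memory model (plain Java, no HBase; `FlowActivityModel` and its row-key layout are illustrative only) of storing a constant 1 per framework-type column in a flow-activity row and then filtering flows by type, the same shape a filtered scan over the real table would have:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of flow-activity rows: rowKey -> (columnQualifier -> value).
// Storing the constant 1 (instead of an incremented count) keeps writes
// idempotent under spooling and replay, as argued above.
public class FlowActivityModel {
  static final Map<String, Map<String, Integer>> table = new LinkedHashMap<>();

  static void recordActivity(String day, String flow, String frameworkType) {
    table.computeIfAbsent(day + "!" + flow, k -> new LinkedHashMap<>())
         .put("type:" + frameworkType, 1);   // idempotent: always 1
  }

  // Analogue of scanning with a filter on the framework-type column.
  static List<String> flowsWithType(String frameworkType) {
    List<String> rows = new ArrayList<>();
    for (Map.Entry<String, Map<String, Integer>> e : table.entrySet()) {
      if (e.getValue().containsKey("type:" + frameworkType)) {
        rows.add(e.getKey());
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    recordActivity("20170111", "etl_flow", "TEZ");
    recordActivity("20170111", "etl_flow", "TEZ");   // replay: no change
    recordActivity("20170111", "report_flow", "MAPREDUCE");
    System.out.println(flowsWithType("TEZ"));
  }
}
```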
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818961#comment-15818961 ] Bibin A Chundatt commented on YARN-5849: Latest patch looks good to me too. > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. > linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration.
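The /etc/mtab parsing mentioned in the description amounts to something like the following sketch (simplified; `MtabParser` is illustrative and the real CgroupsLCEResourcesHandler logic differs in details):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: find pre-mounted cgroup controller paths by parsing mtab-format
// lines (device mountpoint fstype options dump pass).
public class MtabParser {
  static Map<String, String> cgroupMounts(String mtabContents) {
    Map<String, String> controllerToPath = new HashMap<>();
    for (String line : mtabContents.split("\n")) {
      String[] f = line.trim().split("\\s+");
      if (f.length >= 4 && f[2].equals("cgroup")) {
        // The mount options carry the enabled controllers, e.g. "rw,cpu,cpuacct".
        for (String opt : f[3].split(",")) {
          if (opt.equals("cpu") || opt.equals("memory") || opt.equals("blkio")) {
            controllerToPath.put(opt, f[1]);
          }
        }
      }
    }
    return controllerToPath;
  }

  public static void main(String[] args) {
    String sample =
        "sysfs /sys sysfs rw 0 0\n" +
        "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,cpu,cpuacct 0 0\n" +
        "cgroup /sys/fs/cgroup/memory cgroup rw,memory 0 0\n";
    System.out.println(cgroupMounts(sample).get("cpu"));
    System.out.println(cgroupMounts(sample).get("memory"));
  }
}
```

A YARN hierarchy (e.g. hadoop-yarn) would then be created under each discovered controller path, which is what this JIRA automates.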
[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818918#comment-15818918 ] Hudson commented on YARN-5416: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11107 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11107/]) YARN-5416. TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed (jlowe: rev 357eab95668dbc419239857ac5ce763d76fd40e7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped > > > Key: YARN-5416 > URL: https://issues.apache.org/jira/browse/YARN-5416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test, yarn >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5416-v2.patch, YARN-5416.patch > > > The test failure stack is: > Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 43.134 sec <<< FAILURE! 
> java.lang.AssertionError: AppAttempt state is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530) > This is due to the same issue that partially fixed in YARN-4968
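The fix depends on a wait-for-state pattern rather than asserting a state immediately after an asynchronous transition. A minimal sketch of that pattern follows; the class name `StateWaiter` is hypothetical and this is not the MockRM/MockAM code itself.

```java
import java.util.function.Supplier;

// Hedged sketch of the test-stability pattern: poll a state supplier until it
// reaches the expected value or a timeout expires.
public class StateWaiter {
    public static boolean waitForState(Supplier<String> current, String expected,
                                       long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!expected.equals(current.get())) {
            if (System.currentTimeMillis() >= deadline) {
                return false;  // caller can then fail with a clear message
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The intermittent failure above is the classic symptom this pattern avoids: the attempt was still in a transitional state (not yet stopped) when the assertion ran.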
[jira] [Commented] (YARN-5378) Accomodate app-id->cluster mapping
[ https://issues.apache.org/jira/browse/YARN-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818889#comment-15818889 ] Rohith Sharma K S commented on YARN-5378: - [~sjlee0] this is one of the requirements I hear from cloud companies whenever I talk about ATSv2. Their primary use case is a large number of ephemeral clusters being created and destroyed, where users are not aware of the clusterId at all. IIRC, I was asked about the appToFlow table key long back; the reason given was that the same applicationId can be generated across clusterIds (very low probability, but it can't be ignored either). So I suggested folks keep track of the clusterId and supply it whenever they need to retrieve data from ATSv2. It would be great if this JIRA got some consensus and moved forward!! > Accomodate app-id->cluster mapping > -- > > Key: YARN-5378 > URL: https://issues.apache.org/jira/browse/YARN-5378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Joep Rottinghuis >Assignee: Sangjin Lee > Labels: yarn-5355-merge-blocker > > In discussion with [~sjlee0], [~vrushalic], [~subru], and [~curino] a > use-case came up to be able to map from application-id to cluster-id in > context of federation for Yarn. > What happens is that a "random" cluster in the federation is asked to > generate an app-id and then potentially a different cluster can be the "home" > cluster for the AM. Furthermore, tasks can then run in yet other clusters. > In order to be able to pull up the logical home cluster on which the > application ran, there needs to be a mapping from application-id to > cluster-id. This mapping is available in the federated Yarn case only during > the active life of the application. > A similar situation is common in our larger production environment. Somebody > will complain about a slow job, some failure or whatever. If we're lucky we > have an application-id.
When we ask the user which cluster they ran on, > they'll typically answer with the machine from where they launched the job > (many users are unaware of the underlying physical clusters). This leaves us > to spelunk through various RM ui's to find a matching epoch in the > application ID.
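The mapping the JIRA asks for boils down to a persistent application-id to cluster-id index that outlives the application. A minimal in-memory sketch (the real store would be an ATSv2/HBase table; the class name `AppToClusterIndex` is hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: record which cluster "owns" each application id so the
// logical home cluster can be looked up after the application finishes.
public class AppToClusterIndex {
    private final Map<String, String> appToCluster = new ConcurrentHashMap<>();

    public void record(String appId, String clusterId) {
        // putIfAbsent keeps the first writer in the (rare) case the same
        // application id is generated on two clusters
        appToCluster.putIfAbsent(appId, clusterId);
    }

    public String clusterFor(String appId) {
        return appToCluster.get(appId);  // null when unknown
    }
}
```

The duplicate-id concern raised in the comment above is exactly why the real table key design (and whether clusterId belongs in the key) is the contested point.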
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818887#comment-15818887 ] Sunil G commented on YARN-5764: --- Thanks [~devaraj.k] for the proposal. Looks very interesting. As mentioned above, if the NM makes the decision based on cores (NUMA CPUs), it will be more container-specific. Could we make it more application-specific, so that only the containers of selected apps are NUMA-aware? Also, such NUMA-aware nodes could be controlled within a specific node label; I think that may yield better use cases for NUMA. During NM init, such awareness info could be passed to the RM and made a node attribute, and such nodes could then be labelled together as well. > NUMA awareness support for launching containers > --- > > Key: YARN-5764 > URL: https://issues.apache.org/jira/browse/YARN-5764 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Reporter: Olasoji >Assignee: Devaraj K > Attachments: NUMA Awareness for YARN Containers.pdf, > YARN-5764-v0.patch, YARN-5764-v1.patch > > > The purpose of this feature is to improve Hadoop performance by minimizing > costly remote memory accesses on non SMP systems. Yarn containers, on launch, > will be pinned to a specific NUMA node and all subsequent memory allocations > will be served by the same node, reducing remote memory accesses. The current > default behavior is to spread memory across all NUMA nodes.
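One common way to pin a process to a NUMA node is to wrap its launch command with `numactl`; the `--cpunodebind` and `--membind` flags are real numactl options, but whether the patch uses numactl wrapping (versus cgroup cpusets) is an assumption here, and the class name `NumaCommand` is hypothetical.

```java
// Hedged sketch: wrap a container launch command with numactl so CPU and
// memory allocations stay on one NUMA node chosen by the NodeManager.
public class NumaCommand {
    public static String[] pinToNode(int numaNode, String... launchCmd) {
        String[] wrapped = new String[launchCmd.length + 3];
        wrapped[0] = "numactl";
        wrapped[1] = "--cpunodebind=" + numaNode;  // run only on this node's CPUs
        wrapped[2] = "--membind=" + numaNode;      // allocate only on this node
        System.arraycopy(launchCmd, 0, wrapped, 3, launchCmd.length);
        return wrapped;
    }
}
```

Binding both CPU and memory to the same node is what avoids the costly remote accesses the description mentions.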
[jira] [Commented] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818835#comment-15818835 ] Sunil G commented on YARN-5864: --- Thanks [~leftnoteasy] for the detailed proposal and the patch. I think this will really help to cut out many corner cases that are present in the scheduler today. Overall approach looks fine. *Few doubts in document as well as code:* +PriorityUtilizationQueueOrderingPolicy+ 1. bq.service queue has 66.7% configured resource (200G), each container needs 90G memory; Batch queue has 33.3% configured resource (100G), each container needs 20G memory. One doubt here: if the *service* queue has used+reserved above 66.7%, I think we will not be considering the higher-priority queue here, right? 2. For the normal *utilization* policy also, we use {{PriorityUtilizationQueueOrderingPolicy}} with {{respectPriority=false}} mode. Maybe we can pick a better name, since we handle both priority and utilization ordering in the same policy impl. Or we could pull out an {{AbstractUtilizationQueueOrderingPolicy}} which supports normal resource utilization, with an extended priority policy doing the priority handling. 3. Does {{PriorityUtilizationQueueOrderingPolicy#getAssignmentIterator}} need a readLock for *queues*? +QueuePriorityContainerCandidateSelector+ 4. Can we use Guava libs in Hadoop (ref: HashBasedTable)? 5. {{intializePriorityDigraph}}: since queue priority is set only at initialize or reinitialize time, I think we are recalculating and creating the {{PriorityDigraph}} every time. It is not specifically a preemption entity but a scheduler entity as well. Could we create and cache it in CS so that such recomputation can be avoided? 6. {{intializePriorityDigraph}}: in {{preemptionContext.getLeafQueueNames()}} we get the queue names in random order. For better performance, I think we need these names in a BFS order which starts from one side and goes to the other. Will that help? 7.
An exit condition can be added at the beginning of {{selectCandidates}} for the cases where queue priorities are not configured or the digraph does not contain any queues with reserved containers. 8. bq.Collections.sort(reservedContainers, CONTAINER_CREATION_TIME_COMPARATOR); Why are we sorting by container creation time? Don't we first need the reserved container from the highest-priority queue? 9. In {{selectCandidates}} {noformat} if (currentTime - reservedContainer.getCreationTime() < minTimeout) { break; } {noformat} I think we need to {{continue}} here, right? 10. In {{selectCandidates}}, all TempQueuePerPartition objects are still taken from the context. I think the IntraQueue preemption selector makes some changes in TempQueue; I will confirm soon. If so, we might need a relook there. 11. In {{selectCandidates}}, while looping over {{newlySelectedToBePreemptContainers}}, it is possible that a container is already present in {{selectedCandidates}}. Currently we still deduct from {{totalPreemptedResourceAllowed}} in such cases as well, which does not look correct. 12. {{tryToMakeBetterReservationPlacement}} looks like a very big loop over all {{allSchedulerNodes}}, which does not seem optimal. I'll give it one more pass once some of these are clarified. > YARN Capacity Scheduler - Queue Priorities > -- > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.001.patch, YARN-5864.002.patch, > YARN-5864.003.patch, YARN-5864.poc-0.patch, > YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf > > > Currently, Capacity Scheduler at every parent-queue level uses relative > used-capacities of the child-queues to decide which queue can get next > available resource first. > For example, > - Q1 & Q2 are child queues under queueA > - Q1 has 20% of configured capacity, 5% of used-capacity and > - Q2 has 80% of configured capacity, 8% of used-capacity.
> In the situation, the relative used-capacities are calculated as below > - Relative used-capacity of Q1 is 5/20 = 0.25 > - Relative used-capacity of Q2 is 8/80 = 0.10 > In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is > selected by the scheduler first to receive next available resource. > Simply ordering queues according to relative used-capacities sometimes causes > a few troubles because scarce resources could be assigned to less-important > apps first. > # Latency sensitivity: This can be a problem with latency sensitive > applications where waiting till the ‘other’ queue gets full is not going to > cut it. The delay in scheduling directly reflects in the response times of > these applications. > # Resource fragmentation for large-container apps: Today’s algorithm also > causes issues with applications that need very
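The current ordering rule the description criticizes — sort queues by relative used-capacity (used divided by configured), lowest first — can be sketched directly from the Q1/Q2 example. The class and field names below are illustrative, not the Capacity Scheduler's actual types.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of today's Capacity Scheduler ordering: queues sorted by
// relative used-capacity (used / configured), ascending.
public class QueueOrdering {
    public static class Queue {
        final String name; final double configured; final double used;
        public Queue(String name, double configured, double used) {
            this.name = name; this.configured = configured; this.used = used;
        }
        double relativeUsed() { return used / configured; }
    }

    public static List<String> order(List<Queue> queues) {
        return queues.stream()
            .sorted(Comparator.comparingDouble(Queue::relativeUsed))
            .map(q -> q.name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Queue> qs = Arrays.asList(
            new Queue("Q1", 20, 5),   // relative used-capacity 0.25
            new Queue("Q2", 80, 8));  // relative used-capacity 0.10
        System.out.println(order(qs)); // [Q2, Q1] -> Q2 receives resources first
    }
}
```

This reproduces the description's result: Q2 (0.10) is ordered ahead of Q1 (0.25), regardless of any priority the operator would like Q1 to have.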
[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818824#comment-15818824 ] Jason Lowe commented on YARN-5416: -- +1 lgtm. I'll fix the unused import checkstyle nits during the commit. > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped > > > Key: YARN-5416 > URL: https://issues.apache.org/jira/browse/YARN-5416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test, yarn >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Attachments: YARN-5416-v2.patch, YARN-5416.patch
[jira] [Assigned] (YARN-5378) Accomodate app-id->cluster mapping
[ https://issues.apache.org/jira/browse/YARN-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee reassigned YARN-5378: - Assignee: Sangjin Lee (was: Joep Rottinghuis) > Accomodate app-id->cluster mapping > -- > > Key: YARN-5378 > URL: https://issues.apache.org/jira/browse/YARN-5378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Joep Rottinghuis >Assignee: Sangjin Lee > Labels: yarn-5355-merge-blocker
[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818577#comment-15818577 ] Eric Badger commented on YARN-5416: --- [~djp], patch looks good to me. Should probably clean up the checkstyle errors though (at least the unused imports, which are easy). > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped > > > Key: YARN-5416 > URL: https://issues.apache.org/jira/browse/YARN-5416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test, yarn >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Attachments: YARN-5416-v2.patch, YARN-5416.patch
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818493#comment-15818493 ] Naganarasimha G R commented on YARN-6072: - [~jianh] any more comments or shall i go ahead ? > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start()
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818286#comment-15818286 ] Bibin A Chundatt commented on YARN-6062: As per the GitHub report, the fix for this issue is not available in JDK 7u45. > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps growing and has reached 10 GB. > nodemanager yarn-env.sh configuration (2G): > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818018#comment-15818018 ] Sunil G commented on YARN-6081: --- Thanks [~leftnoteasy]. Good catch! We decrement pending resources only when a container is allocated (not reserved). So ideally we have to deduct any reserved memory from the pending resource. Makes sense to me. Few comments: 1. {{getTotalPendingResourcesConsideringUserLimit}}: not part of this patch, but could we add a javadoc comment there as well? It would make the javadoc better too. 2. {code} Resource pending = app.getAppAttemptResourceUsage().getPending( partition); if (deductReservedFromPending) { pending = Resources.subtract(pending, app.getAppAttemptResourceUsage().getReserved(partition)); } {code} I have one doubt here: {{pending}} holds a reference to the pending resource of the appAttemptResource usage. Inside the {{if (deductReservedFromPending)}} block, that reference is getting updated. Is that intentional? 3. {code} pending = Resources.max(resourceCalculator, lastClusterResource, pending, Resources.none()); {code} A quick doubt: why are we using lastClusterResource here? 4. {{testPreemptionNotHappenForSingleReservedQueue}}: the comment near the verify block is confusing. 5. In {{testPendingResourcesConsideringUserLimit}}, could we also try to assert the app's pending and reserved too? > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue.
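The arithmetic of the fix — deduct already-reserved resources from an app's pending demand and clamp at zero so reserved containers are not double-counted as preemptable demand — can be sketched as below. The class name and the flat (memory, vcores) modeling are illustrative; the real code works on Hadoop `Resource` objects via `Resources.subtract`/`Resources.max`.

```java
// Hedged sketch of the YARN-6081 deduction: pending' = max(pending - reserved, 0),
// computed component-wise over (memoryMB, vcores).
public class PendingDeduction {
    public static long[] pendingMinusReserved(long pendingMem, int pendingVcores,
                                              long reservedMem, int reservedVcores) {
        long mem = Math.max(0, pendingMem - reservedMem);      // clamp at zero:
        long vcores = Math.max(0, pendingVcores - reservedVcores); // reserved may exceed pending
        return new long[] { mem, vcores };
    }
}
```

The clamp matters for exactly the reported case (reserved > pending): without it the "pending" demand would go negative or, in the unfixed code, the reservation itself would look like preemptable demand.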
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817878#comment-15817878 ] Hadoop QA commented on YARN-6031: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 19s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6031 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846782/YARN-6031.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 808f0406b80d 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / be529da | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14639/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14639/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14639/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 >
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817846#comment-15817846 ] gehaijiang commented on YARN-6062: -- Thanks! We are beta-testing JDK 7u80 and JDK 8. > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817767#comment-15817767 ] Sunil G commented on YARN-6031: --- Patch generally looks fine to me. Will wait for Jenkins to kick off. I will also wait a day in case others have comments. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, > YARN-6031.006.patch > > > Here are the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at >
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
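[Editor's note] The exception quoted above boils down to a single guard. Below is a hypothetical, self-contained reconstruction of the shape of that check — the class and method names are invented for illustration, not the actual SchedulerUtils code:

```java
// Hypothetical sketch (NOT the real Hadoop source): a request that still
// carries a node label expression is rejected once node labels are disabled,
// which is exactly what makes recovery of previously-labeled apps fail.
public class LabelValidation {
    static void validateLabel(boolean nodeLabelsEnabled, String labelExpression) {
        if (!nodeLabelsEnabled && labelExpression != null
                && !labelExpression.trim().isEmpty()) {
            // Mirrors the message in the quoted stack trace.
            throw new IllegalArgumentException(
                "Invalid resource request, node label not enabled "
                    + "but request contains label expression");
        }
    }
}
```

The patches under discussion effectively relax this path during recovery so that a stale label expression on a recovered application does not abort RM startup.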
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817764#comment-15817764 ] Rohith Sharma K S commented on YARN-6027: - Yes, it is doable. We have done a POC for the same at one level. The issue we face is that aggregated flows should be capped at a constant value regardless of the user-supplied limit; otherwise, if a user provides a very high limit, the possibility of an OOM is very high. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585, fromId is supported for retrieving entities. We need a similar > filter for flow run apps, flow runs, and flows as well. > Along with supporting fromId, this JIRA should also discuss the following points > * Should we throw an exception for entities/entity retrieval if duplicates are > found? > * TimelineEntity: > ** Should the equals method also check for idPrefix? > ** Is idPrefix part of identifiers?
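[Editor's note] The server-side cap Rohith describes — a constant limit that wins over any user-supplied limit — can be sketched in a few lines. The class name and the cap value are hypothetical, not taken from the timeline-reader code:

```java
// Illustrative sketch: clamp the user-supplied limit for aggregated-flow
// queries to a server-side constant so an oversized request cannot drive
// the timeline reader toward an OOM.
public class FlowQueryLimits {
    static final int MAX_AGGREGATED_FLOWS = 100; // hypothetical server-side cap

    static int effectiveLimit(int userLimit) {
        // Non-positive (i.e. unspecified) input falls back to the cap;
        // any explicit value is clamped to it.
        return (userLimit <= 0) ? MAX_AGGREGATED_FLOWS
                                : Math.min(userLimit, MAX_AGGREGATED_FLOWS);
    }
}
```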
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817747#comment-15817747 ] Ying Zhang commented on YARN-6031: -- Thanks [~sunilg]. Done. I was thinking that LOG.debug could do this check on its own, but we can always do it beforehand and follow the current code style in the RM :-)
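[Editor's note] The guard being discussed is the standard "check before you build the message" pattern. A minimal self-contained sketch is below, using java.util.logging so it runs standalone; the RM code uses a different logging API, but the guard reads the same (LOG.isDebugEnabled() wrapping LOG.debug):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedDebugLog {
    private static final Logger LOG =
        Logger.getLogger(GuardedDebugLog.class.getName());

    static void logRecovery(String appId, String labelExpression) {
        // Guard first so the concatenated message String is never built
        // when debug-level logging is off. Level.FINE stands in for
        // LOG.isDebugEnabled() in the RM's logging API.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Recovered app " + appId
                + " with node label expression '" + labelExpression + "'");
        }
    }
}
```

As Ying notes, some logging APIs perform this check internally, but the explicit guard avoids the argument-construction cost and matches the existing RM code style.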
[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-6031: - Attachment: YARN-6031.006.patch
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817725#comment-15817725 ] Varun Saxena commented on YARN-6027: [~rohithsharma], so IIUC what you basically want is, for a single flow ID, a single flow entity with all its flow runs (for a given date range); that is, aggregate data across dates. Right? Should be doable.
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817714#comment-15817714 ] Hadoop QA commented on YARN-6031: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 41m 14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6031 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846768/YARN-6031.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ee8bf4a7ffe6 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 467f5f1 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14638/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14638/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817675#comment-15817675 ] Rohith Sharma K S commented on YARN-6027: - bq. Is that making things slow? Yes, it is slow. bq. Should we consider aggregating at the server side before returning data? That way, the amount of data returned is less. Yup, this is what we are really looking for. Given a date range, flow collapse would be better, and pagination will also be easier for collapsed flows. Importantly, the retrieval limit for collapsed flows should be a fixed constant (we can define a new limit); it cannot be the same as the user-supplied limit.
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817651#comment-15817651 ] Naganarasimha G R commented on YARN-6072: - Test case failures seem to be unrelated to the patch modifications. I think all of us agree with the modifications, hence I will commit it shortly! > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following
[jira] [Comment Edited] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817623#comment-15817623 ] Ravi Prakash edited comment on YARN-5764 at 1/11/17 8:21 AM: - Hi Devaraj! Thanks for all your work. Do you have any benchmark results that would illustrate the kind of performance gains that could potentially be realised with this patch? It'd be good if others had an opportunity to test it on their hardware and setup. was (Author: raviprak): Hi Devaraj! Thanks for all your work. Do you have any benchmark results that would illustrate the kind of performance gains that could potentially be realised with this patch? > NUMA awareness support for launching containers > --- > > Key: YARN-5764 > URL: https://issues.apache.org/jira/browse/YARN-5764 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Reporter: Olasoji >Assignee: Devaraj K > Attachments: NUMA Awareness for YARN Containers.pdf, > YARN-5764-v0.patch, YARN-5764-v1.patch > > > The purpose of this feature is to improve Hadoop performance by minimizing > costly remote memory accesses on non-SMP systems. YARN containers, on launch, > will be pinned to a specific NUMA node and all subsequent memory allocations > will be served by the same node, reducing remote memory accesses. The current > default behavior is to spread memory across all NUMA nodes.
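[Editor's note] The pinning described in the issue amounts to prefixing the container launch command with numactl. A hypothetical sketch of how a NodeManager-side helper could build such a command follows — the class and method names are invented, not from the YARN-5764 patches; the numactl flags themselves are standard:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: wrap a container's launch command with numactl so that
// both its CPUs and all its memory allocations come from one NUMA node.
public class NumaPinning {
    static List<String> pinToNode(int numaNode, List<String> launchCmd) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--cpunodebind=" + numaNode); // run only on this node's CPUs
        cmd.add("--membind=" + numaNode);     // allocate only from this node
        cmd.addAll(launchCmd);
        return cmd; // e.g. hand to ProcessBuilder instead of launchCmd
    }
}
```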
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817623#comment-15817623 ] Ravi Prakash commented on YARN-5764: Hi Devaraj! Thanks for all your work. Do you have any benchmark results that would illustrate the kind of performance gains that could potentially be realised with this patch?
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817617#comment-15817617 ] Hadoop QA commented on YARN-6072: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 20s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}166m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | JDK v1.7.0_121 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5af2af1 | | JIRA Issue | YARN-6072 | | JIRA Patch URL |
[jira] [Comment Edited] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817589#comment-15817589 ] Sunil G edited comment on YARN-6031 at 1/11/17 8:03 AM: Quick correction: Could u also pls add {{LOG.isDebugEnabled()}} before logging. was (Author: sunilg): Quick correction: Could u also pls added {{LOG.isDebugEnabled()}} before logging.
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817589#comment-15817589 ] Sunil G commented on YARN-6031: --- Quick correction: Could u also pls added {{LOG.isDebugEnabled()}} before logging.