[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573911#comment-14573911 ] Jian He commented on YARN-2716: --- Thanks Karthik for working on this! This simplifies things a lot. Mostly looks good; a few comments and questions: - these two booleans are not used, maybe remove them. {{private boolean create = false, delete = false; }} - is this going to be done in this jira? {code} // TODO: Check deleting appIdRemovePath works recursively safeDelete(appIdRemovePath);{code} - will safeDelete throw a noNodeExist exception if deleting a non-existent znode? - {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}}, I think the second parameter should be zkRetryInterval. Also, I have a question about why, in the HA case, zkRetryInterval is calculated as below {code} if (HAUtil.isHAEnabled(conf)) { zkRetryInterval = zkSessionTimeout / numRetries; {code} - I found this [thread|http://mail-archives.apache.org/mod_mbox/curator-user/201410.mbox/%3cd076bc8e.9ef1%25sreichl...@chegg.com%3E] saying that blockUntilConnected does not need to be called; supposing it is needed, I think the zkSessionTimeout value passed there is too small, it should be numRetries*zkRetryInterval, otherwise the RM will exit soon after retrying for 10s by default. {code} if (!curatorFramework.blockUntilConnected( zkSessionTimeout, TimeUnit.MILLISECONDS)) { LOG.fatal("Couldn't establish connection to ZK server"); throw new YarnRuntimeException("Couldn't connect to ZK server"); } {code} - remove this? {code} // @Override // public ZooKeeper getNewZooKeeper() throws IOException { //return client; // } {code} - I think testZKSessionTimeout may be removed too? It looks like a test for Curator. > Refactor ZKRMStateStore retry code with Apache Curator > -- > > Key: YARN-2716 > URL: https://issues.apache.org/jira/browse/YARN-2716 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Karthik Kambatla > Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, > yarn-2716-prelim.patch, yarn-2716-super-prelim.patch > > > Per suggestion by [~kasha] in YARN-2131, it's nice to use curator to > simplify the retry logic in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
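To make the suggested fix concrete, here is a minimal, self-contained sketch (not the ZKRMStateStore patch itself; the host, timeout and retry values are made up) showing zkRetryInterval passed as the RetryNTimes sleep interval and the blockUntilConnected wait sized as numRetries * zkRetryInterval:
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class CuratorRetrySketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical values; in ZKRMStateStore these would come from configuration.
    String zkHostPort = "localhost:2181";
    int zkSessionTimeout = 10000; // ms
    int numRetries = 10;
    int zkRetryInterval = 2000;   // ms to sleep between retries

    // Pass zkRetryInterval (not zkSessionTimeout / numRetries) as the sleep interval.
    CuratorFramework curator = CuratorFrameworkFactory.builder()
        .connectString(zkHostPort)
        .sessionTimeoutMs(zkSessionTimeout)
        .retryPolicy(new RetryNTimes(numRetries, zkRetryInterval))
        .build();
    curator.start();

    // Wait long enough to cover all retries, not just a single session timeout.
    if (!curator.blockUntilConnected(numRetries * zkRetryInterval,
        TimeUnit.MILLISECONDS)) {
      throw new RuntimeException("Couldn't connect to ZK server");
    }
    curator.close();
  }
}
{code}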
[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher
[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573878#comment-14573878 ] Wangda Tan commented on YARN-3508: -- Trying to better understand this problem: I'm not sure where the bottleneck is. If the CapacityScheduler is the bottleneck, moving preemption events out of the main RM dispatcher doesn't help; this approach only helps when the main dispatcher is the bottleneck. A parallel thing we can do is to reduce the number of preemption events. Currently, if a container sits in the to-preempt list, one event is sent to the scheduler every few seconds until it actually gets preempted; we could reduce the frequency of these events. > Preemption processing occuring on the main RM dispatcher > > > Key: YARN-3508 > URL: https://issues.apache.org/jira/browse/YARN-3508 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-3508.002.patch, YARN-3508.01.patch > > > We recently saw the RM for a large cluster lag far behind on the > AsyncDispatcher event queue. The AsyncDispatcher thread was consistently > blocked on the highly-contended CapacityScheduler lock trying to dispatch > preemption-related events for RMContainerPreemptEventDispatcher. Preemption > processing should occur on the scheduler event dispatcher thread or a > separate thread to avoid delaying the processing of other events in the > primary dispatcher queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830 ] Zhijie Shen commented on YARN-3051: --- [~varun_saxena], thanks for working on the new patch. It seems to be a complete reader-side prototype, which is nice. I still need some time to take a thorough look, but I'd like to share my thoughts about the reader APIs. IMHO, we may want to have or start with two sets of APIs: 1) the APIs to query the raw data and 2) the APIs to query the aggregated data. 1) APIs to query the raw data: We would like to have the APIs to let users zoom into the details about their jobs, and give users the freedom to fetch the raw data and do the customized processing that ATS will not do. For example, Hive/Pig on Tez need this set of APIs to get the framework-specific data, process it and render it on their own web UI. We basically need 2 such APIs. a. Get a single entity given an ID that uniquely locates the entity in the backend (we assume the uniqueness is assured somehow). * This API can be extended or split into multiple sub-APIs to get a single element of the entity, such as events, metrics and configuration. b. Search for a set of entities that match the given predicates. * We can start from the predicates that we used in ATS v1 (also for compatibility purposes), but some of them may no longer apply. * We may want to add more predicates to check the newly added elements in v2. * With more predefined semantics, we can even query entities that belong to some container/attempt/application and so on. 2) APIs to query the aggregated data: These are completely new in v2 and are an advantage. With the aggregation, we can answer some statistical questions about the job, the user, the queue, the flow and the cluster. These APIs are not directing users to the individual entities put by the application, but returning statistical data (carried by Application|User|Queue|Flow|ClusterEntity). a. Get aggregated data at a certain level given the ID of the concept on that level, i.e., the job, the user, the queue, the flow and the cluster. b. Search for the jobs, the users, the queues, the flows and the clusters given predicates. * For the predicates, we could learn from the examples in hRaven. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
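To make the two proposed API groups concrete, a rough Java sketch of what such a reader interface could look like follows; every name in it (the interface, its methods, and the plain Map placeholders used instead of real entity types) is hypothetical and only meant to illustrate the split between raw-data and aggregation queries, not the interface that will actually be committed:
{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a backing-storage read interface for ATS v2. */
public interface TimelineReaderSketch {

  /** 1a. Fetch a single entity given an ID that uniquely locates it in the backend. */
  Map<String, Object> getEntity(String clusterId, String appId,
      String entityType, String entityId) throws IOException;

  /** 1b. Search for a set of entities that match the given predicates. */
  Set<Map<String, Object>> getEntities(String clusterId, String appId,
      String entityType, Map<String, Object> predicates, long limit)
      throws IOException;

  /** 2a. Fetch aggregated data for one level: user, queue, flow or cluster. */
  Map<String, Object> getAggregatedEntity(String level, String levelId)
      throws IOException;

  /** 2b. Search aggregated entities (e.g. all flows of a user) by predicates. */
  Set<Map<String, Object>> getAggregatedEntities(String level,
      Map<String, Object> predicates, long limit) throws IOException;
}
{code}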
[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joep Rottinghuis updated YARN-3706: --- Attachment: YARN-3726-YARN-2928.005.patch Uploading YARN-3726-YARN-2928.005.patch. Added proper encoding and decoding of column names and values where a splitter is used. We now also encode spaces in the column names, and properly decode them on the way out. Fixed TestHBaseTimelineWriterImpl to confirm that configs now properly work as well. Still need to add reading of metrics, fix a unit test for join (with null as separator) of the older join method, and add an entity reader that creates an entire entity object from a scan result. > Generalize native HBase writer for additional tables > > > Key: YARN-3706 > URL: https://issues.apache.org/jira/browse/YARN-3706 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Joep Rottinghuis >Assignee: Joep Rottinghuis >Priority: Minor > Attachments: YARN-3706-YARN-2928.001.patch, > YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, > YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch > > > When reviewing YARN-3411 we noticed that we could change the class hierarchy > a little in order to accommodate additional tables easily. > In order to get ready for benchmark testing we left the original layout in > place, as performance would not be impacted by the code hierarchy. > Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
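As a rough illustration of the kind of encoding/decoding described above (escaping the separator character and spaces in column names so a later split does not break them), here is a small hypothetical sketch; the separator character and escape sequences are assumptions, and this is not the patch's actual separator/column-helper code:
{code:java}
/** Hypothetical sketch of separator-safe column name encoding/decoding. */
public final class ColumnNameCodec {
  private static final String SEPARATOR = "!";          // assumed split character
  private static final String SEPARATOR_ESCAPE = "%21"; // assumed escape sequences
  private static final String SPACE_ESCAPE = "%20";

  /** Escape separator characters and spaces before storing a column name. */
  public static String encode(String raw) {
    return raw.replace(SEPARATOR, SEPARATOR_ESCAPE).replace(" ", SPACE_ESCAPE);
  }

  /** Restore the original column name on the way out. */
  public static String decode(String stored) {
    return stored.replace(SPACE_ESCAPE, " ").replace(SEPARATOR_ESCAPE, SEPARATOR);
  }

  public static void main(String[] args) {
    String original = "config name!with separator";
    String stored = encode(original);
    // Splitting the stored form on SEPARATOR no longer breaks the column name.
    System.out.println(stored);                          // config%20name%21with%20separator
    System.out.println(decode(stored).equals(original)); // true
  }
}
{code}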
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573754#comment-14573754 ] Karthik Kambatla commented on YARN-3453: A few comments: # New imports in FairScheduler and FSLeafQueue are not required. # Looking at the remaining uses of DefaultResourceCalculator in FairScheduler, could we benefit from updating all of them to DominantResourceCalculator? [~ashwinshankar77] - do you concur? # In FairScheduler, changing the scope of RESOURCE_CALCULATOR and DOMINANT_RESOURCE_CALCULATOR is not required. # We should add unit tests to avoid regressions in the future. # Nit: In each of the policies, my preference would be to not make the calculator and comparator members static unless required. We have had cases where our tests would create multiple instances of the class, leading to issues. Not that I foresee multiple instantiations for these classes, but I would like to avoid it if we can. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren't even getting to the "starved" queue > since scheduling logic is based on DRF's Calculator. > Following are the two places: > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn't be marked as "starved" if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is: during a > preemption round, if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
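For reference, a hedged sketch of the direction discussed here: having the starvation check go through the scheduler's DominantResourceCalculator (via Resources.greaterThanOrEqual against the cluster resource) rather than comparing memory alone. The method shape and the example values are illustrative, not the committed FSLeafQueue code:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfStarvationSketch {

  /** A queue is not starved if its dominant resource usage already meets the share. */
  static boolean isStarved(ResourceCalculator rc, Resource cluster,
      Resource usage, Resource share) {
    return !Resources.greaterThanOrEqual(rc, cluster, usage, share);
  }

  public static void main(String[] args) {
    ResourceCalculator drc = new DominantResourceCalculator();
    Resource cluster = Resource.newInstance(100 * 1024, 100);
    Resource usage = Resource.newInstance(10 * 1024, 50);  // vcores-dominant usage
    Resource share = Resource.newInstance(20 * 1024, 20);
    // Memory alone would look starved (10G < 20G), but the dominant share
    // (50% of vcores vs a 20% share) is satisfied, so DRF says "not starved".
    System.out.println("starved = " + isStarved(drc, cluster, usage, share));
  }
}
{code}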
[jira] [Assigned] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reassigned YARN-3768: --- Assignee: zhihai xu > Index out of range exception with environment variables without values > -- > > Key: YARN-3768 > URL: https://issues.apache.org/jira/browse/YARN-3768 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.5.0 >Reporter: Joe Ferner >Assignee: zhihai xu > > Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range > exception occurs if an environment variable is encountered without a value. > I believe this occurs because java will not return empty strings from the > split method. Similar to this > http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573745#comment-14573745 ] zhihai xu commented on YARN-3745: - Sorry, there's one more thing I forgot to mention: can we rename {{initExceptionWithConstructor}} to instantiateExceptionImpl? > SerializedException should also try to instantiate internal exception with > the default constructor > -- > > Key: YARN-3745 > URL: https://issues.apache.org/jira/browse/YARN-3745 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3745.1.patch, YARN-3745.patch > > > While deserialising a SerializedException it tries to create the internal > exception in instantiateException() with cn = > cls.getConstructor(String.class). > If cls does not have a constructor with a String parameter it throws > NoSuchMethodException, > for example the ClosedChannelException class. > We should also try to instantiate the exception with the default constructor so that > the inner exception can be propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573724#comment-14573724 ] zhihai xu commented on YARN-3745: - [~lavkesh], thanks for working on this issue. This looks like a good catch. One question about the patch: why retry on SecurityException? Will retrying on NoSuchMethodException alone be enough? If we need to retry on SecurityException, can we add a test case for it? There is a typo in the comment {{This does not has constructor with String argument}}: it should be {{have}} instead of {{has}}. Also, could we make the comment {{Try with String constructor if it fails try with default.}} clearer, as in {{Try constructor with String argument, if it fails, try default.}}? Can we add a comment to explain why ClassNotFoundException is expected in the test? > SerializedException should also try to instantiate internal exception with > the default constructor > -- > > Key: YARN-3745 > URL: https://issues.apache.org/jira/browse/YARN-3745 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3745.1.patch, YARN-3745.patch > > > While deserialising a SerializedException it tries to create the internal > exception in instantiateException() with cn = > cls.getConstructor(String.class). > If cls does not have a constructor with a String parameter it throws > NoSuchMethodException, > for example the ClosedChannelException class. > We should also try to instantiate the exception with the default constructor so that > the inner exception can be propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
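A small, self-contained sketch of the fallback behaviour under review, i.e. trying the String-argument constructor first and falling back to the default constructor when it is missing; the method name and signature are illustrative and not the exact SerializedException code:
{code:java}
import java.lang.reflect.Constructor;
import java.nio.channels.ClosedChannelException;

public class ExceptionInstantiator {

  /**
   * Try the constructor with a String argument first; if the class doesn't
   * have one (e.g. ClosedChannelException), fall back to the default
   * constructor so the inner exception can still be propagated.
   */
  static Throwable instantiateException(Class<? extends Throwable> cls,
      String message, Throwable cause) throws Exception {
    Throwable t;
    try {
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      t = cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // No String constructor: use the default one instead.
      t = cls.getConstructor().newInstance();
    }
    if (cause != null) {
      t.initCause(cause);
    }
    return t;
  }

  public static void main(String[] args) throws Exception {
    // ClosedChannelException has no String constructor, so the fallback is used.
    Throwable t = instantiateException(ClosedChannelException.class, "msg", null);
    System.out.println(t.getClass().getName());
  }
}
{code}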
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573673#comment-14573673 ] Wangda Tan commented on YARN-3769: -- Thanks [~eepayne], I reassigned it to myself and will upload a design doc shortly for review. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Wangda Tan > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-3769: Assignee: Wangda Tan (was: Eric Payne) > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Wangda Tan > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573670#comment-14573670 ] Eric Payne commented on YARN-3769: -- [~leftnoteasy] bq. If you think it's fine, could I take a shot at it? It sounds like it would work. It's fine with me if you want to work on that. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573667#comment-14573667 ] Wangda Tan commented on YARN-3769: -- [~eepayne], Exactly. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573664#comment-14573664 ] Eric Payne commented on YARN-3769: -- [~leftnoteasy], {quote} One thing I've thought for a while is adding a "lazy preemption" mechanism, which is: when a container is marked preempted and wait for max_wait_before_time, it becomes a "can_be_killed" container. If there's another queue can allocate on a node with "can_be_killed" container, such container will be killed immediately to make room the new containers. {quote} IIUC, in your proposal, the preemption monitor would mark the containers as preemptable, and then after some configurable wait period, the capacity scheduler would be the one to do the killing if it finds that it needs the resources on that node. Is my understanding correct? > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573659#comment-14573659 ] Hudson commented on YARN-3766: -- FAILURE: Integrated in Hadoop-trunk-Commit #7971 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7971/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573638#comment-14573638 ] Wangda Tan commented on YARN-3769: -- [~eepayne], This is a very interesting problem; actually, user-limit is not the only cause. For example, fair ordering (YARN-3306), hard locality requirements (I want resources from rackA and nodeX only), the AM resource limit, and in the near future constraints (YARN-3409) can all lead to resources being preempted from one queue that the other queue cannot use because of specific resource requirements and limits. One thing I've thought about for a while is adding a "lazy preemption" mechanism, which is: when a container is marked preempted and has waited for max_wait_before_time, it becomes a "can_be_killed" container. If another queue can allocate on a node with a "can_be_killed" container, that container will be killed immediately to make room for the new containers. With this mechanism, the preemption policy doesn't need to consider complex resource requirements and limits inside a queue, and it also avoids killing containers unnecessarily. If you think it's fine, could I take a shot at it? Thoughts? [~vinodkv]. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
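Since lazy preemption is only a design idea at this point, the following is a very rough, hypothetical Java sketch of the flow described in that comment (mark, wait, promote to can_be_killed, kill only when another queue can use the node); all class, field and method names are made up for illustration:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed "lazy preemption" flow. */
public class LazyPreemptionSketch {
  enum State { MARKED_PREEMPTED, CAN_BE_KILLED }

  private final long maxWaitBeforeKillMs;
  private final Map<String, Long> markedAt = new ConcurrentHashMap<>();
  private final Map<String, State> states = new ConcurrentHashMap<>();

  LazyPreemptionSketch(long maxWaitBeforeKillMs) {
    this.maxWaitBeforeKillMs = maxWaitBeforeKillMs;
  }

  /** The preemption policy marks a container instead of killing it directly. */
  void markPreempted(String containerId) {
    markedAt.putIfAbsent(containerId, System.currentTimeMillis());
    states.put(containerId, State.MARKED_PREEMPTED);
  }

  /** Periodically promote marked containers that have waited long enough. */
  void tick() {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : markedAt.entrySet()) {
      if (now - e.getValue() >= maxWaitBeforeKillMs) {
        states.put(e.getKey(), State.CAN_BE_KILLED);
      }
    }
  }

  /** The scheduler kills a container only when another queue can actually use the node. */
  boolean shouldKill(String containerId, boolean otherQueueCanAllocateHere) {
    return otherQueueCanAllocateHere
        && states.get(containerId) == State.CAN_BE_KILLED;
  }
}
{code}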
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573633#comment-14573633 ] Zhijie Shen commented on YARN-3766: --- Patch looks good. Tried it locally and the web UI has been fixed. Will commit it. > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573619#comment-14573619 ] Eric Payne commented on YARN-3769: -- The following configuration will cause this:
|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 40 | 90 | N/A |
| A | 10 | 100 | 20 | 70 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |
One app is running in each queue. Both apps are asking for more resources, but they have each reached their user limit, so even though both are asking for more and there are resources available, no more resources are allocated to either app. The preemption monitor will see that {{B}} is asking for a lot more resources, and it will see that {{B}} is more underserved than {{A}}, so the preemption monitor will try to balance the queues by preempting resources (10, for example) from {{A}}. After preempting 10 from {{A}}, the state looks like this:
|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 50 | 80 | N/A |
| A | 10 | 100 | 30 | 60 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |
However, when the capacity scheduler tries to give that container to the app in {{B}}, the app will recognize that it has no headroom and refuse the container. So the capacity scheduler offers the container again to the app in {{A}}, which accepts it because it has headroom now, and the process starts over again. Note that this happens even when used cluster resources are below 100%, because the used + pending for the cluster would put it above 100%. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
Eric Payne created YARN-3769: Summary: Preemption occurring unnecessarily because preemption doesn't consider user limit Key: YARN-3769 URL: https://issues.apache.org/jira/browse/YARN-3769 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0, 2.6.0, 2.8.0 Reporter: Eric Payne Assignee: Eric Payne We are seeing the preemption monitor preempting containers from queue A and then seeing the capacity scheduler giving them immediately back to queue A. This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573593#comment-14573593 ] zhihai xu commented on YARN-3017: - Hi [~rohithsharma], thanks for the information. Sorry, I am not familiar with rolling upgrades; could you give a little more detail about how this could break a rolling upgrade? That said, I saw the ContainerId format was already changed by YARN-2562 in the 2.6.0 release eight months ago. Compared to the change in YARN-2562, this patch is minor, because it only changes the function {{ContainerId#toString}}; the current function {{ContainerId#fromString}} supports both the current container string format and the new container string format. CC [~ozawa] for the impact of the ContainerId format change. > ContainerID in ResourceManager Log Has Slightly Different Format From > AppAttemptID > -- > > Key: YARN-3017 > URL: https://issues.apache.org/jira/browse/YARN-3017 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: MUFEED USMAN >Priority: Minor > Labels: PatchAvailable > Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch > > > Not sure if this should be filed as a bug or not. > In the ResourceManager log in the events surrounding the creation of a new > application attempt, > ... > ... > 2014-11-14 17:45:37,258 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching > masterappattempt_1412150883650_0001_02 > ... > ... > The application attempt has the ID format "_1412150883650_0001_02". > Whereas the associated ContainerID goes by "_1412150883650_0001_02_". > ... > ... > 2014-11-14 17:45:37,260 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting > up > container Container: [ContainerId: container_1412150883650_0001_02_01, > NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: vCores:1, > disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: > 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 > ... > ... > Curious to know if this is kept like that for a reason. If not while using > filtering tools to, say, grep events surrounding a specific attempt by the > numeric ID part information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573539#comment-14573539 ] zhihai xu commented on YARN-3768: - Hi [~joeferner], that is a good find. I can see how the change in MAPREDUCE-5965 may trigger this bug. I can take up this issue if you don't mind. Thanks for reporting it. > Index out of range exception with environment variables without values > -- > > Key: YARN-3768 > URL: https://issues.apache.org/jira/browse/YARN-3768 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.5.0 >Reporter: Joe Ferner > > Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range > exception occurs if an environment variable is encountered without a value. > I believe this occurs because java will not return empty strings from the > split method. Similar to this > http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573515#comment-14573515 ] Zhijie Shen commented on YARN-3044: --- I'm not sure because as far as I can tell, NM's impl is different from RM's, but it's up to you to figure out the proper solution:-) > [Event producers] Implement RM writing app lifecycle events to ATS > -- > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3044-YARN-2928.004.patch, > YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, > YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, > YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, > YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, > YARN-3044.20150416-1.patch > > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573512#comment-14573512 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-trunk-Commit #7970 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7970/]) Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. Switch RM > Actual > = > For 12 Jobs AM gets allocated and all 12 starts running > No other Yarn child is initiated , *all 12 Jobs in Running state for ever* > Expected > === > Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3768) Index out of range exception with environment variables without values
Joe Ferner created YARN-3768: Summary: Index out of range exception with environment variables without values Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
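For context, the split behaviour described in this issue can be reproduced with a short standalone snippet like the one below; it is not the actual Apps.java code, and the negative-limit split shown at the end is just one possible way to avoid the problem:
{code:java}
public class SplitEnvSketch {
  public static void main(String[] args) {
    String envVar = "FOO=";                // an environment variable without a value

    // String.split drops trailing empty strings, so parts.length == 1 here
    // and accessing parts[1] would throw ArrayIndexOutOfBoundsException.
    String[] parts = envVar.split("=");
    System.out.println("default split length: " + parts.length);     // 1

    // A negative limit keeps the trailing empty string, so parts[1] is safe.
    String[] safeParts = envVar.split("=", -1);
    System.out.println("split with -1 length: " + safeParts.length); // 2
    System.out.println("value: '" + safeParts[1] + "'");             // ''
  }
}
{code}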
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573468#comment-14573468 ] kassiano josé matteussi commented on YARN-2139: --- Dears, I have studied resource management for Hadoop applications running wrapped in Linux containers, and I have had trouble restricting disk I/O with cgroups (bps_write, bps_read). Does anybody know if it is possible to do so? I have heard that I/O limiting with cgroups only applies to synchronous (SYNC) writes, and that is why it wouldn't work well with Hadoop + HDFS. Is this still true in more recent kernel implementations? Best Regards, Kassiano > [Umbrella] Support for Disk as a Resource in YARN > -- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wei Yan > Attachments: Disk_IO_Isolation_Scheduling_3.pdf, > Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, > YARN-2139-prototype-2.patch, YARN-2139-prototype.patch > > > YARN should consider disk as another resource for (1) scheduling tasks on > nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573325#comment-14573325 ] Hudson commented on YARN-2392: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7968 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7968/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3767) Yarn Scheduler Load Simulator does not work
[ https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3767: --- Assignee: (was: Varun Saxena) > Yarn Scheduler Load Simulator does not work > --- > > Key: YARN-3767 > URL: https://issues.apache.org/jira/browse/YARN-3767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: OS X 10.10. JDK 1.7 >Reporter: David Kjerrumgaard > > Running the SLS, as per the instructions on the web results in a > NullPointerException being thrown. > Steps followed to create error: > 1) Download Apache Hadoop 2.7.0 tarball from Apache site > 2) Untar 2.7.0 tarball into /opt directory > 3) Execute the following command: > /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh > --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json > --output-dir=/tmp > Results in the following error: > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2118.smile.com:2 clusterResource: > 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to > /default-rack > 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager > from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: > , assigned nodeId a2115.smile.com:3 > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2115.smile.com:3 clusterResource: > Exception in thread "main" java.lang.RuntimeException: > java.lang.NullPointerException > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) > at > org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398) > at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250) > at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145) > at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528) > Caused by: java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126) > ... 4 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573293#comment-14573293 ] Jian He commented on YARN-2392: --- looks good, committing > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps & correct node-manager redirection
[ https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated YARN-3698: - Summary: Make task attempt log files accessible from webapps & correct node-manager redirection (was: Make task attempt log files accessible from webapps) > Make task attempt log files accessible from webapps & correct node-manager > redirection > -- > > Key: YARN-3698 > URL: https://issues.apache.org/jira/browse/YARN-3698 > Project: Hadoop YARN > Issue Type: Task >Reporter: Sreenath Somarajapuram > > Currently we don't have direct access to an attempt's log file from web apps. > The only available option is through jobhistory, and that provides an HTML > view of the log. > Requirements: > # A link to access the raw log file. > # A variant of the link with the following headers set, this enables direct > download of the file across all browsers. > Content-Disposition: attachment; filename="attempt-id.log" > Content-Type of text/plain > # Node manager redirects an attempt syslog view to the container view. Hence > we are not able to view the logs of a specific attempt. > Before redirection: > http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0 > After redirection: > http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps
[ https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated YARN-3698: - Description: Currently we don't have direct access to an attempt's log file from web apps. The only available option is through jobhistory, and that provides an HTML view of the log. Requirements: # A link to access the raw log file. # A variant of the link with the following headers set, this enables direct download of the file across all browsers. Content-Disposition: attachment; filename="attempt-id.log" Content-Type of text/plain # Node manager redirects an attempt syslog view to the container view. Hence we are not able to view the logs of a specific attempt. Before redirection: http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0 After redirection: http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root was: Currently we don't have direct access to an attempt's log file from web apps. The only available option is through jobhistory, and that provides an HTML view of the log. Requirements: # A link to access the raw log file. # A variant of the link with the following headers set, this enables direct download of the file across all browsers. Content-Disposition: attachment; filename="attempt-id.log" Content-Type of text/plain > Make task attempt log files accessible from webapps > --- > > Key: YARN-3698 > URL: https://issues.apache.org/jira/browse/YARN-3698 > Project: Hadoop YARN > Issue Type: Task >Reporter: Sreenath Somarajapuram > > Currently we don't have direct access to an attempt's log file from web apps. > The only available option is through jobhistory, and that provides an HTML > view of the log. > Requirements: > # A link to access the raw log file. > # A variant of the link with the following headers set, this enables direct > download of the file across all browsers. > Content-Disposition: attachment; filename="attempt-id.log" > Content-Type of text/plain > # Node manager redirects an attempt syslog view to the container view. Hence > we are not able to view the logs of a specific attempt. > Before redirection: > http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0 > After redirection: > http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573288#comment-14573288 ] Zhijie Shen commented on YARN-2513: --- As it's valuable to some existing ATS use cases, let's try to get the patch in and target 2.8. [~jeagles], three comments about the patch: 1. Shall we add "yarn.timeline-service.ui-names" to yarn-default.xml too, like "yarn.nodemanager.aux-services"? 2. Can we add some text in TimelineServer.md to document the configs and describe how to install framework UIs? 3. Can we add a test case to validate and showcase that ATS can load a framework UI (e.g., a single helloworld.html)? > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
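For illustration, a hypothetical yarn-site.xml fragment showing how such UI hosting might be configured; apart from yarn.timeline-service.ui-names, which is quoted in the comment above, the property names and values below are assumptions rather than the patch's actual keys:
{code:xml}
<configuration>
  <!-- Comma-separated list of framework UIs to host (property name quoted above). -->
  <property>
    <name>yarn.timeline-service.ui-names</name>
    <value>helloworld</value>
  </property>
  <!-- Hypothetical per-UI properties: on-disk location and web path of the UI. -->
  <property>
    <name>yarn.timeline-service.ui-on-disk-path.helloworld</name>
    <value>/opt/helloworld-ui</value>
  </property>
  <property>
    <name>yarn.timeline-service.ui-web-path.helloworld</name>
    <value>/helloworld</value>
  </property>
</configuration>
{code}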
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573276#comment-14573276 ] Hudson commented on YARN-3764: -- FAILURE: Integrated in Hadoop-trunk-Commit #7966 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7966/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work
[ https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573273#comment-14573273 ] Varun Saxena commented on YARN-3767: Yes it will work if you copy {{sls-runner.xml}} to {{etc/hadoop}}. This is mentioned in documentation as well. Refer to : http://hadoop.apache.org/docs/r2.4.1/hadoop-sls/SchedulerLoadSimulator.html#Step_1:_Configure_Hadoop_and_the_simulator It mentions "Before we start, make sure Hadoop and the simulator are configured well. All configuration files for Hadoop and the simulator should be placed in directory $HADOOP_ROOT/etc/hadoop, where the ResourceManager and Yarn scheduler load their configurations. Directory $HADOOP_ROOT/share/hadoop/tools/sls/sample-conf/ provides several example configurations, that can be used to start a demo." > Yarn Scheduler Load Simulator does not work > --- > > Key: YARN-3767 > URL: https://issues.apache.org/jira/browse/YARN-3767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: OS X 10.10. JDK 1.7 >Reporter: David Kjerrumgaard >Assignee: Varun Saxena > > Running the SLS, as per the instructions on the web results in a > NullPointerException being thrown. > Steps followed to create error: > 1) Download Apache Hadoop 2.7.0 tarball from Apache site > 2) Untar 2.7.0 tarball into /opt directory > 3) Execute the following command: > /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh > --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json > --output-dir=/tmp > Results in the following error: > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2118.smile.com:2 clusterResource: > 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to > /default-rack > 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager > from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: > , assigned nodeId a2115.smile.com:3 > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2115.smile.com:3 clusterResource: > Exception in thread "main" java.lang.RuntimeException: > java.lang.NullPointerException > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) > at > org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398) > at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250) > at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145) > at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528) > Caused by: java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126) > ... 4 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work
[ https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573269#comment-14573269 ] Varun Saxena commented on YARN-3767: This belongs to YARN. > Yarn Scheduler Load Simulator does not work > --- > > Key: YARN-3767 > URL: https://issues.apache.org/jira/browse/YARN-3767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: OS X 10.10. JDK 1.7 >Reporter: David Kjerrumgaard >Assignee: Varun Saxena > > Running the SLS, as per the instructions on the web results in a > NullPointerException being thrown. > Steps followed to create error: > 1) Download Apache Hadoop 2.7.0 tarball from Apache site > 2) Untar 2.7.0 tarball into /opt directory > 3) Execute the following command: > /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh > --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json > --output-dir=/tmp > Results in the following error: > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2118.smile.com:2 clusterResource: > 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to > /default-rack > 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager > from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: > , assigned nodeId a2115.smile.com:3 > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2115.smile.com:3 clusterResource: > Exception in thread "main" java.lang.RuntimeException: > java.lang.NullPointerException > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) > at > org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398) > at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250) > at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145) > at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528) > Caused by: java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126) > ... 4 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3767) Yarn Scheduler Load Simulator does not work
[ https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena moved HADOOP-12062 to YARN-3767: - Component/s: (was: tools) Affects Version/s: (was: 2.7.0) 2.7.0 Key: YARN-3767 (was: HADOOP-12062) Project: Hadoop YARN (was: Hadoop Common) > Yarn Scheduler Load Simulator does not work > --- > > Key: YARN-3767 > URL: https://issues.apache.org/jira/browse/YARN-3767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: OS X 10.10. JDK 1.7 >Reporter: David Kjerrumgaard >Assignee: Varun Saxena > > Running the SLS, as per the instructions on the web results in a > NullPointerException being thrown. > Steps followed to create error: > 1) Download Apache Hadoop 2.7.0 tarball from Apache site > 2) Untar 2.7.0 tarball into /opt directory > 3) Execute the following command: > /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh > --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json > --output-dir=/tmp > Results in the following error: > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2118.smile.com:2 clusterResource: > 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to > /default-rack > 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager > from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: > , assigned nodeId a2115.smile.com:3 > 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned > from NEW to RUNNING > 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node > a2115.smile.com:3 clusterResource: > Exception in thread "main" java.lang.RuntimeException: > java.lang.NullPointerException > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) > at > org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398) > at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250) > at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145) > at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528) > Caused by: java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126) > ... 4 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2513: -- Target Version/s: 2.8.0 > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573202#comment-14573202 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-trunk-Commit #7965 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7965/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. Switch RM > Actual > = > For 12 Jobs AM gets allocated and all 12 starts running > No other Yarn child is initiated , *all 12 Jobs in Running state for ever* > Expected > === > Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
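Editor's note: to make the symptom concrete, if compare() normalizes each resource by the cluster total, an empty cluster resource makes every dominant share zero, so every pair of resources looks equal and checks that rely on the ordering (such as the AM-limit behavior described above) can misbehave. A sketch of a guarded comparison under that assumption — illustration only, not the committed patch:
{code}
// Sketch: fall back to comparing absolute values when the cluster resource is
// empty (e.g. right after RM failover, before any NM has re-registered).
public class DominantCompareSketch {
  static int compare(long clusterMem, long clusterVcores,
                     long lhsMem, long lhsVcores,
                     long rhsMem, long rhsVcores) {
    if (clusterMem == 0 || clusterVcores == 0) {
      int byMem = Long.compare(lhsMem, rhsMem);
      return byMem != 0 ? byMem : Long.compare(lhsVcores, rhsVcores);
    }
    double lhs = Math.max((double) lhsMem / clusterMem,
                          (double) lhsVcores / clusterVcores);
    double rhs = Math.max((double) rhsMem / clusterMem,
                          (double) rhsVcores / clusterVcores);
    return Double.compare(lhs, rhs);
  }

  public static void main(String[] args) {
    // A naive share-based compare would call these equal on an empty cluster;
    // the guarded version still orders 1024 MB above 512 MB.
    System.out.println(compare(0, 0, 1024, 1, 512, 1)); // positive
  }
}
{code}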
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573182#comment-14573182 ] Wangda Tan commented on YARN-3733: -- Great! Committing... > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. Switch RM > Actual > = > For 12 Jobs AM gets allocated and all 12 starts running > No other Yarn child is initiated , *all 12 Jobs in Running state for ever* > Expected > === > Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3733: - Summary: Fix DominantRC#compare() does not work as expected if cluster resource is empty (was: DominantRC#compare() does not work as expected if cluster resource is empty) > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. Switch RM > Actual > = > For 12 Jobs AM gets allocated and all 12 starts running > No other Yarn child is initiated , *all 12 Jobs in Running state for ever* > Expected > === > Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573034#comment-14573034 ] Hudson commented on YARN-3762: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
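Editor's note: the trace above is the textbook fail-fast iterator case — it suggests the child-queue list is iterated while another thread modifies it. A toy demo of the exception (single-threaded for determinism; in the RM the writer is another thread), with the usual remedies noted in comments:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Toy demo, not the FSParentQueue fix: structural modification while a
// fail-fast iterator is active throws ConcurrentModificationException.
// Remedies: synchronize readers and writers on one lock, or iterate a copy.
public class CmeDemo {
  public static void main(String[] args) {
    List<String> childQueues = new ArrayList<>(Arrays.asList("root.a", "root.b"));
    try {
      for (String q : childQueues) {
        if (q.equals("root.a")) {
          childQueues.add("root.c"); // modification during iteration
        }
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("caught: " + e);
    }
  }
}
{code}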
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573039#comment-14573039 ] Hudson commented on YARN-3749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
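Editor's note: the description above pins the root cause — both RM instances share one Configuration object, so values set during rm2's init leak into rm1. A small hypothetical harness showing the copy pattern (new Configuration(conf) is Hadoop's copy constructor; the harness itself is not MiniYARNCluster code):
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical harness: give each RM its own Configuration copy so one RM's
// init() cannot rewrite keys the other still reads.
public class ConfCopyDemo {
  public static void main(String[] args) {
    Configuration shared = new Configuration(false);
    shared.set("yarn.resourcemanager.ha.id", "rm1");

    Configuration rm1Conf = new Configuration(shared); // defensive copy
    Configuration rm2Conf = new Configuration(shared); // defensive copy
    rm2Conf.set("yarn.resourcemanager.ha.id", "rm2");  // does not touch rm1Conf

    System.out.println(rm1Conf.get("yarn.resourcemanager.ha.id")); // rm1
    System.out.println(rm2Conf.get("yarn.resourcemanager.ha.id")); // rm2
  }
}
{code}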
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573043#comment-14573043 ] Hudson commented on YARN-3751: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
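Editor's note: the fix described above amounts to treating the used-resources field as optional when the data comes from the timeline store. A simplified sketch of that null-guard pattern (stand-in types, not the actual AppInfo/ApplicationReport classes):
{code}
// Simplified stand-ins: treat a missing used-resources value as "unknown"
// instead of dereferencing it.
class UsageReport {
  Long usedMemoryMB;  // may be null when read from the timeline store
  Integer usedVCores; // may be null for the same reason
}

class AppInfoSketch {
  long allocatedMB = -1;   // -1 signals "not reported"
  int allocatedVCores = -1;

  AppInfoSketch(UsageReport usage) {
    if (usage != null && usage.usedMemoryMB != null && usage.usedVCores != null) {
      allocatedMB = usage.usedMemoryMB;
      allocatedVCores = usage.usedVCores;
    }
  }
}
{code}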
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573045#comment-14573045 ] Hudson commented on YARN-3585: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
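Editor's note: the thread dump above shows why the process lingers — the leveldb JNI worker is a non-daemon thread, and the JVM will not exit while any non-daemon thread is alive. A small illustration of that JVM rule (not NodeManager code); the remedy is to shut down whatever owns the thread, or exit explicitly on the SHUTDOWN path:
{code}
// Illustration only: a live non-daemon thread keeps the JVM running after
// main() returns, just like the leveldb background thread in the dump above.
public class NonDaemonDemo {
  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE); // stands in for the leveldb JNI thread
      } catch (InterruptedException ignored) {
        // interrupted: the "store" was closed, so the thread exits
      }
    });
    worker.setDaemon(false); // the default, made explicit: blocks JVM exit
    worker.start();
    System.out.println("main() is done, but the process keeps running");
    // Fix pattern: worker.interrupt() (i.e. close the store) or System.exit(0).
  }
}
{code}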
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573040#comment-14573040 ] Hudson commented on YARN-41: FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/ha
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573041#comment-14573041 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573011#comment-14573011 ] Hudson commented on YARN-3749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573013#comment-14573013 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573017#comment-14573017 ] Hudson commented on YARN-3585: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573006#comment-14573006 ] Hudson commented on YARN-3762: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573015#comment-14573015 ] Hudson commented on YARN-3751: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573012#comment-14573012 ] Hudson commented on YARN-41: FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/h
[jira] [Updated] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism
[ https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2573: Attachment: Design for Reservation HA.pdf Attaching design for the umbrella jira > Integrate ReservationSystem with the RM failover mechanism > -- > > Key: YARN-2573 > URL: https://issues.apache.org/jira/browse/YARN-2573 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: Design for Reservation HA.pdf > > > YARN-1051 introduces the ReservationSystem and the current implementation is > completely in-memory based. YARN-149 brings in the notion of RM HA with a > highly available state store. This JIRA proposes persisting the Plan into the > RMStateStore and recovering it post RM failover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572897#comment-14572897 ] Hudson commented on YARN-3751: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572894#comment-14572894 ] Hudson commented on YARN-3749: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572899#comment-14572899 ] Hudson commented on YARN-3585: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572895#comment-14572895 ] Hudson commented on YARN-1462: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572889#comment-14572889 ] Hudson commented on YARN-3762: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
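The stack trace above is the classic symptom of one thread iterating an ArrayList of child queues while another thread mutates it. A minimal sketch of one way to make such reads safe is to guard the list with a read/write lock and iterate under the read lock. This only illustrates the pattern; the names below are not the FairScheduler classes.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueueAclSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Writer: structural mutations take the write lock.
  public void addChildQueue(String name) {
    lock.writeLock().lock();
    try {
      childQueues.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader: iteration takes the read lock, so it can never observe a
  // concurrent structural modification (the cause of the CME above).
  public List<String> getQueueUserAclInfo() {
    lock.readLock().lock();
    try {
      List<String> acls = new ArrayList<>();
      for (String child : childQueues) {
        acls.add("acl for " + child);
      }
      return acls;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}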
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-41: --- Release Note: The behavior of shutting down an NM can differ (if NM work-preserving recovery is not enabled): the NM unregisters from the RM immediately rather than waiting for the timeout that would mark it LOST. A new NodeStatus value, SHUTDOWN, is introduced, which can affect the UI, CLI and ClusterMetrics for a node's status. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Fix For: 2.8.0 > > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, > YARN-41-8.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
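To make the release note above concrete: on a graceful stop the NM proactively tells the RM it is going away, so the RM can mark the node SHUTDOWN instead of waiting for the liveness timeout that would mark it LOST. The sketch below only illustrates that flow; the interface and type names are invented for the example and are not the actual ResourceTracker protocol.
{code}
public class NodeShutdownSketch {

  enum NodeState { RUNNING, LOST, SHUTDOWN }

  // Stand-in for the RM-side tracker; not the real ResourceTracker API.
  interface ResourceTrackerStub {
    void unRegisterNodeManager(String nodeId);
  }

  static class NodeManagerStub {
    private final String nodeId;
    private final ResourceTrackerStub tracker;

    NodeManagerStub(String nodeId, ResourceTrackerStub tracker) {
      this.nodeId = nodeId;
      this.tracker = tracker;
    }

    // Graceful stop: unregister immediately instead of letting the RM
    // expire the node after the heartbeat timeout.
    void stopGracefully() {
      tracker.unRegisterNodeManager(nodeId);
    }
  }

  public static void main(String[] args) {
    ResourceTrackerStub rm = nodeId ->
        System.out.println(nodeId + " -> " + NodeState.SHUTDOWN);
    new NodeManagerStub("nm-host-1:45454", rm).stopGracefully();
  }
}
{code}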
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572869#comment-14572869 ] Hudson commented on YARN-3751: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/CHANGES.txt > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
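The fix summary above ("check if used resources are null") is a defensive-null pattern: the AHS/timeline path does not publish used resources, so the report can legitimately carry null where the RM path would have a value. A small, self-contained sketch of that pattern follows, with hypothetical names rather than the actual AppInfo fields.
{code}
public class AppInfoSketch {
  private final long usedMemoryMB;
  private final int usedVirtualCores;

  // Hypothetical report type; in the real code the usage information may be
  // absent (null) for applications served from the timeline store.
  static class UsageReport {
    long memoryMB;
    int vcores;
  }

  AppInfoSketch(UsageReport usedResources) {
    if (usedResources != null) {
      this.usedMemoryMB = usedResources.memoryMB;
      this.usedVirtualCores = usedResources.vcores;
    } else {
      // Data not published to the timeline server: fall back to defaults
      // instead of throwing an NPE while rendering the web service response.
      this.usedMemoryMB = 0;
      this.usedVirtualCores = 0;
    }
  }

  @Override
  public String toString() {
    return "usedMemoryMB=" + usedMemoryMB + ", usedVirtualCores=" + usedVirtualCores;
  }
}
{code}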
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572867#comment-14572867 ] Hudson commented on YARN-1462: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/CHANGES.txt > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572866#comment-14572866 ] Hudson commented on YARN-3749: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
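The root cause described above is two RM instances sharing one mutable Configuration object, so the second RM's yarn.resourcemanager.ha.id (and the rewritten bind address) leaks into the first. A minimal sketch of the remedy, giving each RM its own copy before init, is shown below. It relies on Hadoop's Configuration copy constructor (hadoop-common on the classpath); the surrounding setup is illustrative only, not the MiniYARNCluster code.
{code}
import org.apache.hadoop.conf.Configuration;

public class PerRmConfSketch {
  public static void main(String[] args) {
    Configuration base = new Configuration();
    base.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
    base.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");

    // Each RM gets an independent copy, so setting ha.id (or letting the RM
    // rewrite its connect address) on one cannot bleed into the other.
    Configuration rm1Conf = new Configuration(base);
    rm1Conf.set("yarn.resourcemanager.ha.id", "rm1");

    Configuration rm2Conf = new Configuration(base);
    rm2Conf.set("yarn.resourcemanager.ha.id", "rm2");

    System.out.println(rm1Conf.get("yarn.resourcemanager.ha.id")); // rm1
    System.out.println(rm2Conf.get("yarn.resourcemanager.ha.id")); // rm2
  }
}
{code}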
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572871#comment-14572871 ] Hudson commented on YARN-3585: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572861#comment-14572861 ] Hudson commented on YARN-3762: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572657#comment-14572657 ] Naganarasimha G R commented on YARN-3758: - Hi [~rohithsharma] This issue is similar to the one raised in YARN-3525. If {{yarn.scheduler.minimum-allocation-mb}} is specific to the CapacityScheduler, it would be better to rename it to {{yarn.scheduler.capacity.minimum-allocation-mb}}, similar to the suggestion in YARN-3525, so that there is less confusion. Thoughts? > The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not > working as expected in FairScheduler > > > Key: YARN-3758 > URL: https://issues.apache.org/jira/browse/YARN-3758 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: skrho > > Hello there~~ > I have 2 clusters > First cluster is 5 node , default 1 application queue, Capacity scheduler, 8G > Physical memory each node > Second cluster is 10 node, 2 application queuey, fair-scheduler, 230G > Physical memory each node > Wherever a mapreduce job is running, I want resourcemanager is to set the > minimum memory 256m to container > So I was changing configuration in yarn-site.xml & mapred-site.xml > yarn.scheduler.minimum-allocation-mb : 256 > mapreduce.map.java.opts : -Xms256m > mapreduce.reduce.java.opts : -Xms256m > mapreduce.map.memory.mb : 256 > mapreduce.reduce.memory.mb : 256 > In First cluster whenever a mapreduce job is running , I can see used memory > 256m in web console( http://installedIP:8088/cluster/nodes ) > But In Second cluster whenever a mapreduce job is running , I can see used > memory 1024m in web console( http://installedIP:8088/cluster/nodes ) > I know default memory value is 1024m, so if there is not changing memory > setting, the default value is working. > I have been testing for two weeks, but I don't know why mimimum memory > setting is not working in second cluster > Why this difference is happened? > Am I wrong setting configuration? > or Is there bug? > Thank you for reading~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3758: - Summary: The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler (was: The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container) > The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not > working as expected in FairScheduler > > > Key: YARN-3758 > URL: https://issues.apache.org/jira/browse/YARN-3758 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: skrho > > Hello there~~ > I have 2 clusters > First cluster is 5 node , default 1 application queue, Capacity scheduler, 8G > Physical memory each node > Second cluster is 10 node, 2 application queuey, fair-scheduler, 230G > Physical memory each node > Wherever a mapreduce job is running, I want resourcemanager is to set the > minimum memory 256m to container > So I was changing configuration in yarn-site.xml & mapred-site.xml > yarn.scheduler.minimum-allocation-mb : 256 > mapreduce.map.java.opts : -Xms256m > mapreduce.reduce.java.opts : -Xms256m > mapreduce.map.memory.mb : 256 > mapreduce.reduce.memory.mb : 256 > In First cluster whenever a mapreduce job is running , I can see used memory > 256m in web console( http://installedIP:8088/cluster/nodes ) > But In Second cluster whenever a mapreduce job is running , I can see used > memory 1024m in web console( http://installedIP:8088/cluster/nodes ) > I know default memory value is 1024m, so if there is not changing memory > setting, the default value is working. > I have been testing for two weeks, but I don't know why mimimum memory > setting is not working in second cluster > Why this difference is happened? > Am I wrong setting configuration? > or Is there bug? > Thank you for reading~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container
[ https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572630#comment-14572630 ] Rohith commented on YARN-3758: -- bq. Is it bug ? To be clear, is the inconsistent behavior a bug, or was it implemented intentionally for FS? > The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not > working in container > > > Key: YARN-3758 > URL: https://issues.apache.org/jira/browse/YARN-3758 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: skrho > > Hello there~~ > I have 2 clusters > First cluster is 5 node , default 1 application queue, Capacity scheduler, 8G > Physical memory each node > Second cluster is 10 node, 2 application queuey, fair-scheduler, 230G > Physical memory each node > Wherever a mapreduce job is running, I want resourcemanager is to set the > minimum memory 256m to container > So I was changing configuration in yarn-site.xml & mapred-site.xml > yarn.scheduler.minimum-allocation-mb : 256 > mapreduce.map.java.opts : -Xms256m > mapreduce.reduce.java.opts : -Xms256m > mapreduce.map.memory.mb : 256 > mapreduce.reduce.memory.mb : 256 > In First cluster whenever a mapreduce job is running , I can see used memory > 256m in web console( http://installedIP:8088/cluster/nodes ) > But In Second cluster whenever a mapreduce job is running , I can see used > memory 1024m in web console( http://installedIP:8088/cluster/nodes ) > I know default memory value is 1024m, so if there is not changing memory > setting, the default value is working. > I have been testing for two weeks, but I don't know why mimimum memory > setting is not working in second cluster > Why this difference is happened? > Am I wrong setting configuration? > or Is there bug? > Thank you for reading~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container
[ https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572628#comment-14572628 ] Rohith commented on YARN-3758: -- I looked into the code for CS and FS. The understanding and behavior of minimum allocation differs across CS and FS. # CS : It is straightforward: if any request asks for less than min-allocation-mb, CS normalizes the request to min-allocation-mb, and containers are allocated with minimum-allocation-mb. # FS : if any request asks for less than min-allocation-mb, FS normalizes the request using the factor {{yarn.scheduler.increment-allocation-mb}}. In the example in the description, min-allocation-mb is 256mb, but increment-allocation-mb defaults to 1024mb, which means 1024mb is always allocated to containers. {{yarn.scheduler.increment-allocation-mb}} therefore has a large effect: it changes the requested memory and assigns the newly calculated resource instead. The behavior is not consistent between CS and FS. I am not sure why an additional configuration was introduced in FS. Is it a bug? > The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not > working in container > > > Key: YARN-3758 > URL: https://issues.apache.org/jira/browse/YARN-3758 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: skrho > > Hello there~~ > I have 2 clusters > First cluster is 5 node , default 1 application queue, Capacity scheduler, 8G > Physical memory each node > Second cluster is 10 node, 2 application queuey, fair-scheduler, 230G > Physical memory each node > Wherever a mapreduce job is running, I want resourcemanager is to set the > minimum memory 256m to container > So I was changing configuration in yarn-site.xml & mapred-site.xml > yarn.scheduler.minimum-allocation-mb : 256 > mapreduce.map.java.opts : -Xms256m > mapreduce.reduce.java.opts : -Xms256m > mapreduce.map.memory.mb : 256 > mapreduce.reduce.memory.mb : 256 > In First cluster whenever a mapreduce job is running , I can see used memory > 256m in web console( http://installedIP:8088/cluster/nodes ) > But In Second cluster whenever a mapreduce job is running , I can see used > memory 1024m in web console( http://installedIP:8088/cluster/nodes ) > I know default memory value is 1024m, so if there is not changing memory > setting, the default value is working. > I have been testing for two weeks, but I don't know why mimimum memory > setting is not working in second cluster > Why this difference is happened? > Am I wrong setting configuration? > or Is there bug? > Thank you for reading~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
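A rough model of the rounding described in the comment above, under the assumption that FS rounds a request up to the next multiple of yarn.scheduler.increment-allocation-mb (floored at the minimum), while CS effectively rounds up to multiples of yarn.scheduler.minimum-allocation-mb. The helper below is illustrative arithmetic, not the scheduler code.
{code}
public class NormalizeSketch {

  // Round 'requestMB' up to a multiple of 'stepMB', but never below 'minMB'.
  static int normalize(int requestMB, int minMB, int stepMB) {
    int rounded = ((requestMB + stepMB - 1) / stepMB) * stepMB;
    return Math.max(minMB, rounded);
  }

  public static void main(String[] args) {
    // CS-like behavior: step == minimum-allocation-mb (256)
    System.out.println(normalize(256, 256, 256));   // 256

    // FS-like behavior: step == increment-allocation-mb (default 1024)
    System.out.println(normalize(256, 256, 1024));  // 1024, the value seen in the web UI
  }
}
{code}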
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572618#comment-14572618 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/948/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572611#comment-14572611 ] Hudson commented on YARN-3762: -- FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/948/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572622#comment-14572622 ] Hudson commented on YARN-3585: -- FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/948/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572620#comment-14572620 ] Hudson commented on YARN-3751: -- FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/948/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572617#comment-14572617 ] Hudson commented on YARN-3749: -- FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/948/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572609#comment-14572609 ] Hudson commented on YARN-41: FAILURE: Integrated in Hadoop-trunk-Commit #7963 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7963/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoo
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572604#comment-14572604 ] Hadoop QA commented on YARN-2674: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:red}-1{color} | javac | 7m 32s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 39s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 1m 3s | Tests failed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 52s | Tests passed in hadoop-yarn-server-tests. | | | | 40m 27s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-applications-distributedshell | | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels | | | hadoop.yarn.applications.distributedshell.TestDSAppMaster | | | hadoop.yarn.applications.distributedshell.TestDistributedShell | | | hadoop.yarn.applications.distributedshell.TestDistributedShellWithRMHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737533/YARN-2674.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e830207 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/diffJavacWarnings.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8193/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8193/console | This message was automatically generated. 
> Distributed shell AM may re-launch containers if RM work preserving restart > happens > --- > > Key: YARN-2674 > URL: https://issues.apache.org/jira/browse/YARN-2674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chun Chen >Assignee: Chun Chen > Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch > > > Currently, if RM work preserving restart happens while distributed shell is > running, the distributed shell AM may re-launch all the containers, including > new/running/complete ones. We must make sure it won't re-launch the > running/complete containers. > We need to remove allocated containers from > AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
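One way to read the description above is that after an RM work-preserving restart the AM can be handed containers it already knows about, so it must deduplicate before launching. Below is a minimal, self-contained sketch of that idea, tracking already-launched container ids and skipping repeats. The names are illustrative; this is not the AMRMClientImpl or distributed-shell code.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LaunchDedupSketch {
  private final Set<String> launched = new HashSet<>();

  // Called with whatever allocation the (re)registered RM reports.
  public void onContainersAllocated(List<String> containerIds) {
    for (String id : containerIds) {
      // add() returns false if this container was already seen, so containers
      // re-reported after an RM restart are not launched a second time.
      if (launched.add(id)) {
        System.out.println("launching " + id);
      } else {
        System.out.println("skipping already-launched " + id);
      }
    }
  }

  public static void main(String[] args) {
    LaunchDedupSketch am = new LaunchDedupSketch();
    am.onContainersAllocated(Arrays.asList("container_1", "container_2"));
    // Simulate the same containers being reported again after an RM restart.
    am.onContainersAllocated(Arrays.asList("container_1", "container_3"));
  }
}
{code}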
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572597#comment-14572597 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/]) Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert "YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572599#comment-14572599 ] Hudson commented on YARN-3751: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java > TestAHSWebServices fails after YARN-3467 > > > Key: YARN-3751 > URL: https://issues.apache.org/jira/browse/YARN-3751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3751.patch > > > YARN-3467 changed AppInfo and assumed that used resource is not null. It's > not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572601#comment-14572601 ] Hudson commented on YARN-3585: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt > NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled > -- > > Key: YARN-3585 > URL: https://issues.apache.org/jira/browse/YARN-3585 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Rohith >Priority: Critical > Fix For: 2.7.1 > > Attachments: 0001-YARN-3585.patch, YARN-3585.patch > > > With NM recovery enabled, after decommission, nodemanager log show stop but > process cannot end. > non daemon thread: > {noformat} > "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on > condition [0x] > "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable > [0x] > "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 > nid=0x29ed runnable > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 > nid=0x29ee runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 > nid=0x29ef runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 > nid=0x29f0 runnable > "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 > nid=0x29f1 runnable > "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 > nid=0x29f2 runnable > "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 > nid=0x29f3 runnable > "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 > nid=0x29f4 runnable > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 > runnable > "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 > nid=0x29f5 runnable > "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 > nid=0x29f6 runnable > "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting > on condition > {noformat} > and jni leveldb thread stack > {noformat} > Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): > #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7f33dfce2a3b in leveldb::(anonymous > namespace)::PosixEnv::BGThreadWrapper(void*) () from > /tmp/libleveldbjni-64-1-6922178968300745716.8 > #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 > #3 0x003d830e811d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
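The thread dump above is the telltale sign: a non-daemon JNI thread (the leveldb background worker used by NM recovery) keeps the JVM alive after all NM services have stopped, so the shutdown path has to terminate the process explicitly. A minimal illustration of the mechanism, not the actual NodeManager patch:
{code}
// A non-daemon thread prevents JVM exit after main() returns, which is the
// hang described above; an explicit exit call is what breaks the stalemate.
public class NonDaemonHangSketch {
  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);   // stands in for the native leveldb thread
      } catch (InterruptedException ignored) {
      }
    }, "leveldb-like-worker");
    // worker.setDaemon(true) is intentionally NOT called, so the thread is
    // non-daemon, just like the JNI thread in the stack trace above.
    worker.start();

    System.out.println("main() is done, but the JVM will not exit...");
    // A shutdown path that must really stop the process would need
    // something like ExitUtil.terminate(0) or System.exit(0) here.
  }
}
{code}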
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572591#comment-14572591 ] Hudson commented on YARN-3762: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
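The stack trace points at an unguarded iteration over the child-queue list while another thread mutates it. As a rough sketch of one standard remedy (the committed patch may structure the locking differently), guarding both the writer and the reader with a read/write lock removes the ConcurrentModificationException:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative stand-in for a parent queue; not the FSParentQueue code itself.
class ParentQueueSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addChildQueue(String name) {
    lock.writeLock().lock();
    try {
      childQueues.add(name);            // mutation happens under the write lock
    } finally {
      lock.writeLock().unlock();
    }
  }

  List<String> getQueueUserAclInfo() {
    lock.readLock().lock();
    try {
      // Iterating (here, copying) under the read lock can no longer race with
      // addChildQueue(), so no ConcurrentModificationException is possible.
      return new ArrayList<>(childQueues);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}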
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572596#comment-14572596 ] Hudson commented on YARN-3749: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java > We should make a copy of configuration when init MiniYARNCluster with > multiple RMs > -- > > Key: YARN-3749 > URL: https://issues.apache.org/jira/browse/YARN-3749 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chun Chen >Assignee: Chun Chen > Fix For: 2.8.0 > > Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, > YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, > YARN-3749.patch > > > When I was trying to write a test case for YARN-2674, I found DS client > trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 > when RM failover. But I initially set > yarn.resourcemanager.address.rm1=0.0.0.0:18032, > yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is > in ClientRMService where the value of yarn.resourcemanager.address.rm2 > changed to 0.0.0.0:18032. See the following code in ClientRMService: > {code} > clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, >YarnConfiguration.RM_ADDRESS, > > YarnConfiguration.DEFAULT_RM_ADDRESS, >server.getListenerAddress()); > {code} > Since we use the same instance of configuration in rm1 and rm2 and init both > RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 > during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during > starting of rm1. > So I think it is safe to make a copy of configuration when init both of the > rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
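The quoted updateConnectAddr() call mutates whatever Configuration object it is handed, which is why a single shared instance lets rm1's rewritten address bleed into rm2. A small sketch of the defensive copy the summary calls for; the wiring inside MiniYARNCluster is simplified here:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConfCopySketch {
  public static void main(String[] args) {
    Configuration shared = new YarnConfiguration();
    shared.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
    shared.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");

    // Give each RM its own copy, so ClientRMService rewriting the connect
    // address on one RM's config cannot overwrite the other RM's setting.
    Configuration rm1Conf = new YarnConfiguration(shared);
    Configuration rm2Conf = new YarnConfiguration(shared);
    rm1Conf.set("yarn.resourcemanager.ha.id", "rm1");
    rm2Conf.set("yarn.resourcemanager.ha.id", "rm2");

    // rm2 still sees 0.0.0.0:28032 no matter what rm1 does to its own copy.
    System.out.println(rm2Conf.get("yarn.resourcemanager.address.rm2"));
  }
}
{code}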
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572584#comment-14572584 ] Junping Du commented on YARN-41: bq. These findbugs are not related to the patch here. Agree. Also, the test failure is not related, and the same failure also shows up in other patches, like YARN-3248. We should probably file a separate JIRA to fix this. Committing the latest patch in. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, > YARN-41-8.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572535#comment-14572535 ] Chun Chen commented on YARN-2674: - Uploaded YARN-2674.3.patch with a test case and more detailed comments. > Distributed shell AM may re-launch containers if RM work preserving restart > happens > --- > > Key: YARN-2674 > URL: https://issues.apache.org/jira/browse/YARN-2674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chun Chen >Assignee: Chun Chen > Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch > > > Currently, if RM work preserving restart happens while distributed shell is > running, the distributed shell AM may re-launch all the containers, including > new/running/complete. We must make sure it won't re-launch the > running/complete containers. > We need to remove allocated containers from > AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-2674: Attachment: YARN-2674.3.patch > Distributed shell AM may re-launch containers if RM work preserving restart > happens > --- > > Key: YARN-2674 > URL: https://issues.apache.org/jira/browse/YARN-2674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chun Chen >Assignee: Chun Chen > Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch > > > Currently, if RM work preserving restart happens while distributed shell is > running, the distributed shell AM may re-launch all the containers, including > new/running/complete. We must make sure it won't re-launch the > running/complete containers. > We need to remove allocated containers from > AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
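The bookkeeping the YARN-2674 description asks for — retiring an outstanding ask once a matching container arrives, so it is not re-requested after an RM work-preserving restart — looks roughly like the following at the AM level. The class name and the exact matching strategy are illustrative; the real change concerns AMRMClientImpl#remoteRequestsTable internally:
{code}
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

class AllocationBookkeepingSketch {
  private final AMRMClient<ContainerRequest> amRMClient;

  AllocationBookkeepingSketch(AMRMClient<ContainerRequest> client) {
    this.amRMClient = client;
  }

  // Called when the RM hands back allocated containers; removing the matching
  // request keeps the ask table from re-requesting (and re-launching) them
  // after an RM work-preserving restart.
  void onContainersAllocated(List<Container> allocated) {
    for (Container c : allocated) {
      List<? extends Collection<ContainerRequest>> matching =
          amRMClient.getMatchingRequests(c.getPriority(), "*", c.getResource());
      if (!matching.isEmpty() && !matching.get(0).isEmpty()) {
        amRMClient.removeContainerRequest(matching.get(0).iterator().next());
      }
    }
  }
}
{code}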
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572419#comment-14572419 ] Steve Loughran commented on YARN-2392: -- checkstyle {code} ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java:1464: '" Then click on links to logs of each attempt.\n"' have incorrect indentation level 8, expected level should be 10. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java:1020: Line is longer than 80 characters (found 81). {code} > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572376#comment-14572376 ] Devaraj K commented on YARN-41: --- {code:xml} -1 pre-patch 19m 45s Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. {code} These findbugs are not related to the patch here. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, > YARN-41-8.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572318#comment-14572318 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 58s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 35s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 30s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 6m 16s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 50m 29s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 52s | Tests passed in hadoop-yarn-server-tests. 
| | | | 110m 24s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1bb79c9 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8192/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8192/console | This message was automatically generated. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, > YARN-41-8.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572289#comment-14572289 ] Rohith commented on YARN-3017: -- Apologies for coming very late into this issue. Thinking that changing the containerId format may break compatibility when a rolling upgrade has been done with RM HA + work preserving enabled? IIUC, using ZKRMStateStore, a rolling upgrade can be done now. > ContainerID in ResourceManager Log Has Slightly Different Format From > AppAttemptID > -- > > Key: YARN-3017 > URL: https://issues.apache.org/jira/browse/YARN-3017 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: MUFEED USMAN >Priority: Minor > Labels: PatchAvailable > Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch > > > Not sure if this should be filed as a bug or not. > In the ResourceManager log in the events surrounding the creation of a new > application attempt, > ... > ... > 2014-11-14 17:45:37,258 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching > masterappattempt_1412150883650_0001_02 > ... > ... > The application attempt has the ID format "_1412150883650_0001_02". > Whereas the associated ContainerID goes by "_1412150883650_0001_02_". > ... > ... > 2014-11-14 17:45:37,260 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting > up > container Container: [ContainerId: container_1412150883650_0001_02_01, > NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: vCores:1, > disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: > 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 > ... > ... > Curious to know if this is kept like that for a reason. If not, while using > filtering tools to, say, grep events surrounding a specific attempt by the > numeric ID part, information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
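For reference, the two spellings in question come straight out of the record classes' toString() output; a tiny sketch using the public record APIs (the exact zero-padding in the comments is only an expectation — whatever the running Hadoop version emits is precisely the discrepancy this report is about):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class IdFormatSketch {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1412150883650L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 2);
    ContainerId containerId = ContainerId.newContainerId(attemptId, 1);

    // The attempt number is rendered with different padding in each prefix,
    // which is what makes grep-style filtering on the numeric part awkward.
    System.out.println(attemptId);    // e.g. appattempt_1412150883650_0001_000002
    System.out.println(containerId);  // e.g. container_1412150883650_0001_02_000001
  }
}
{code}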