[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998324#comment-13998324 ] Wangda Tan commented on YARN-2053: -- Sure, I'll do that, thanks for review! Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998337#comment-13998337 ] Sandy Ryza commented on YARN-2054: -- +1 Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
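For context on the YARN-2054 defaults quoted above, a minimal arithmetic sketch (plain Java, not RM code) of how the two values combine into the retry window:
{code}
public class ZkRetryWindow {
  public static void main(String[] args) {
    // Default values quoted in the issue description
    int numRetries = 500;        // yarn.resourcemanager.zk-num-retries
    int retryIntervalMs = 2000;  // yarn.resourcemanager.zk-retry-interval-ms

    long totalMs = (long) numRetries * retryIntervalMs;
    // 500 * 2000 ms = 1,000,000 ms = 1,000 seconds before the RM gives up on ZK
    System.out.println("RM keeps retrying ZK for " + (totalMs / 1000) + " seconds");
  }
}
{code}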
[jira] [Updated] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1957: -- Issue Type: Sub-task (was: Bug) Parent: YARN-45 ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997192#comment-13997192 ] Wangda Tan commented on YARN-2017: -- bq. On a second thought, user might pass in a resource request with null capability. I would prefer not changing it. In fact, we can add many other null checks in many places. Changed the patch back. I think a null capability should be checked by ApplicationMasterService, which should throw an exception before it is passed in. So doing or not doing the null-pointer check should be fine :) Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch A bunch of the same code is repeated among schedulers, e.g. between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share it in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998423#comment-13998423 ] Vinod Kumar Vavilapalli commented on YARN-2054: --- Sounds related to YARN-1878, though not exactly. If we want these configs to match up with yarn.resourcemanager.zk-timeout-ms and (as YARN-1878 is trying) if that can change, we need to somehow make them linked dynamically? Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1916) Leveldb timeline store applies secondary filters incorrectly
[ https://issues.apache.org/jira/browse/YARN-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1916: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Leveldb timeline store applies secondary filters incorrectly Key: YARN-1916 URL: https://issues.apache.org/jira/browse/YARN-1916 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1916.1.patch When applying a secondary filter (fieldname:fieldvalue) in a get entities query, LeveldbTimelineStore retrieves entities that do not have the specified fieldname, in addition to correctly retrieving entities that have the fieldname with the specified fieldvalue. It should not return entities that do not have the fieldname. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995057#comment-13995057 ] Sunil G commented on YARN-2022: --- Thank you very much Carlo for the review. As per your concern about the AM container priority, I am using a static final variable named AM_CONTAINER_PRIORITY from RMAppAttemptImpl to check whether a container is an AM container or not. As per my code review, this variable is not set by the user [the RM only uses it to create the AM container's Resource Request]. Hence there is not much of a problem in using it. Secondly, for the corner cases, I agree with your point. In a specific corner case it is possible that AMs can take over 100% of a queue. 1. maximum-am-resource-percent is at the cluster level and from it we can get the maximum number of runnable applications. The actual count of running applications can also be fetched from all leaf queues. With these two, a checkpoint can be derived as you have mentioned. 2. user-limit-factor sets a per-user limit quota among the total resources. If preemption has to be done among applications, currently only the application timestamp is considered [in reverse order]. So how can this factor help in providing a checkpoint for saving the AM? Could you please share your thoughts on this point. I will work on defining a checkpoint for saving the AM and will update. Meanwhile, please check whether my explanation is in line with your thoughts. Thank you. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
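To illustrate the AM_CONTAINER_PRIORITY check discussed in the comment above, a rough sketch (hypothetical helper; the surrounding preemption-policy code is not shown and the exact accessors may differ):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public class AmContainerCheck {
  // True if the container was allocated at the priority the RM reserves for
  // AM containers; only the RM sets this priority on the AM resource request.
  static boolean isAmContainer(RMContainer rmContainer) {
    Priority p = rmContainer.getContainer().getPriority();
    return RMAppAttemptImpl.AM_CONTAINER_PRIORITY.equals(p);
  }
}
{code}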
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:08 AM: --- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. was (Author: tucu00): [~ jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998212#comment-13998212 ] Hudson commented on YARN-1957: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1957. Consider the max capacity of the queue when computing the ideal capacity for preemption. Contributed by Carlo Curino (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594414) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver
[ https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998228#comment-13998228 ] Hudson commented on YARN-1982: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1982. Renamed the daemon name to be TimelineServer instead of History Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593748) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh Rename the daemon name to timelineserver Key: YARN-1982 URL: https://issues.apache.org/jira/browse/YARN-1982 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: cli Fix For: 2.5.0 Attachments: YARN-1982.1.patch Nowadays, it's confusing that we call the new component timeline server, but we use {code} yarn historyserver yarn-daemon.sh start historyserver {code} to start the daemon. Before the confusion propagates further, we'd better modify the command line ASAP. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998210#comment-13998210 ] Hudson commented on YARN-1975: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1975. Fix yarn application CLI to print the scheme of the tracking url of failed/killed applications. Contributed by Junping Du (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593874) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Used resources shows escaped html in CapacityScheduler and FairScheduler page - Key: YARN-1975 URL: https://issues.apache.org/jira/browse/YARN-1975 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Mit Desai Fix For: 3.0.0, 2.4.1 Attachments: YARN-1975.patch, screenshot-1975.png Used resources displays as the escaped text &lt;memory:, vCores&gt; with the capacity scheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application
[ https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998173#comment-13998173 ] Hudson commented on YARN-1976: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594123) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Tracking url missing http protocol for FAILED application - Key: YARN-1976 URL: https://issues.apache.org/jira/browse/YARN-1976 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-1976-v2.patch, YARN-1976.patch Run yarn application -list -appStates FAILED, It does not print http protocol name like FINISHED apps. {noformat} -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4 Application-IdApplication-Name Application-Type User Queue State Final-State ProgressTracking-URL application_1397598467870_0004 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004 application_1397598467870_0003 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003 application_1397598467870_0002 Sleep job MAPREDUCEhrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002 application_1397598467870_0001 word count MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001 {noformat} It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead 'http://host:8088/cluster/app/application_1397598467870_0002' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998167#comment-13998167 ] Hudson commented on YARN-2042: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-2042. String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594482) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
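As a side note on the YARN-2042 snippet above, a small standalone illustration (not the actual QueuePlacementRule code) of why == is the wrong comparison and what the suggested isEmpty() check looks like:
{code}
public class StringCompareExample {
  public static void main(String[] args) {
    String queueName = new String("");  // built at runtime, not interned

    // Reference comparison: checks object identity, not content
    System.out.println(queueName != "");                            // true

    // Suggested fix: compare content
    System.out.println(queueName != null && !queueName.isEmpty());  // false
  }
}
{code}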
[jira] [Updated] (YARN-1937) Access control of per-framework data
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Attachment: YARN-1937.1.patch I created a patch to add a TimelineACLsManager, which will check whether the querying user is the owner of the timeline entity; if he is, he can retrieve the entity or the events of this entity; otherwise, he cannot access the corresponding timeline data. To support the ACLs, I need to record the owner information of the timeline data when it is posted. I leverage the primary filter to store the owner information by reserving the timeline system filter key. Of course the system information will be masked before returning the timeline data back to the user. I uploaded the preliminary patch to demonstrate the idea, and will work on the test cases and complete local testing. It is worth mentioning that: 1. I do access control at the granularity of the timeline entity. We can definitely explore more fine-grained control, but I prefer keeping things simple initially. 2. Initially, I'm going to support access control such that only the owner can access his timeline data. In the future, we can extend it to allow admins and a configured user/group list. Will file a separate ticket for the follow-up work. Access control of per-framework data Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
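A minimal sketch of the owner-only check described in the comment above (the method and the reserved filter key name are assumptions for illustration; the actual TimelineACLsManager in the patch may differ):
{code}
import java.util.Set;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class OwnerOnlyAclSketch {
  // Hypothetical reserved system primary-filter key recording the owner at put time
  static final String SYSTEM_FILTER_OWNER = "SYSTEM_FILTER_OWNER";

  // Only the user who posted the entity may read it back
  static boolean canAccess(UserGroupInformation caller, TimelineEntity entity) {
    Set<Object> owners = entity.getPrimaryFilters().get(SYSTEM_FILTER_OWNER);
    if (owners == null || owners.isEmpty()) {
      return false;
    }
    return owners.iterator().next().toString().equals(caller.getShortUserName());
  }
}
{code}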
[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2017: -- Attachment: YARN-2017.4.patch Rebased the patch Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, YARN-2017.4.patch A bunch of the same code is repeated among schedulers, e.g. between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share it in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997193#comment-13997193 ] Akira AJISAKA commented on YARN-570: Attached a patch. With the patch, yarn.util.Times.format() renders as Wed May 14 10:24:29 JST 2014, which is consistent with the MapReduce jobhistoryserver WebUI. bq. Can you update format() as well to print in the same style, if you agree? The format of JavaScript {{Date.toLocaleString()}} varies by the browser. In my environment: {code} Chrome: 2014/5/14 10:25:08 Safari: 2014年5月14日 10:25:08 JST {code} Therefore, it's impossible to update {{format()}} to print in the same style. Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch Time strings on different pages are displayed in different timezones. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT. If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
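For reference on the two rendering styles mentioned above, a small standalone sketch (illustrative only, not the actual yarn.util.Times.format() implementation) of a Date.toString()-like output versus the dd-MMM-yyyy style:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeFormatExample {
  public static void main(String[] args) {
    long ts = System.currentTimeMillis();

    // Style similar to "Wed May 14 10:24:29 JST 2014" (server's local timezone)
    System.out.println(new Date(ts));

    // Style similar to the old "10-Apr-2013 16:29:56" output (also local timezone)
    SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    System.out.println(fmt.format(new Date(ts)));
  }
}
{code}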
[jira] [Assigned] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-1680: - Assignee: Chen He availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns considers the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-896: -- Assignee: Xuan Gong Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans Assignee: Xuan Gong YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2053: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1489 Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-2052: - Description: Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
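To make the description above concrete, a sketch using the public records API of how a container id is composed from the app attempt plus the monotonically increasing sequence number (illustrative only, not the RM's internal allocation code):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdExample {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);

    // The last argument is the per-app monotonically increasing sequence number;
    // after a restart the RM no longer knows where this counter left off.
    ContainerId c1 = ContainerId.newInstance(attemptId, 1);
    ContainerId c2 = ContainerId.newInstance(attemptId, 2);
    System.out.println(c1 + " " + c2);
  }
}
{code}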
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2022: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-45 Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-2034: - Assignee: Chen He Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2034.patch The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1986: - Summary: In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE (was: After upgrade from 2.2.0 to 2.4.0, NPE on first job start.) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Sandy Ryza Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.initial.patch This is a change from the prototype that allows applications to register after an RM restart. Still need to add unit tests ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
Jason Lowe created YARN-2034: Summary: Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0, 0.23.10 Reporter: Jason Lowe Priority: Minor The description for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:09 AM: --- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing [to the assignee/author of the original patch] the changes and offering to contribute/breakdown tasks. Please do so next time. was (Author: tucu00): [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998192#comment-13998192 ] Hudson commented on YARN-1962: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1962. Changed Timeline Service client configuration to be off by default given the non-readiness of the feature yet. Contributed by Mohammad Kamrul Islam. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593750) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Fix For: 2.4.1 Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since the Timeline server is not mature and secure yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions for the distributed shell example related to connection refused errors. Btw, we didn't run the TS because it is not secured yet. Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is an agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
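If the default flips to off as proposed, a client that does want the timeline service would opt in explicitly; a minimal sketch (assuming the YarnConfiguration constants for yarn.timeline-service.enabled referenced by the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineToggleExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Check the flag before creating/posting with a TimelineClient so that an
    // unreachable timeline server does not produce the exceptions shown above.
    boolean enabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    System.out.println("timeline service enabled: " + enabled);
  }
}
{code}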
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998205#comment-13998205 ] Hudson commented on YARN-2042: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-2042. String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594482) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998251#comment-13998251 ] Karthik Kambatla commented on YARN-2061: # After loading state corresponding to one application. {code} LOG.info("Done Loading applications from ZK state store"); {code} Revisit logging levels in ZKRMStateStore - Key: YARN-2061 URL: https://issues.apache.org/jira/browse/YARN-2061 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Labels: newbie ZKRMStateStore has a few places where it is logging at the INFO level. We should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
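As a reference for the kind of change being asked for here, a hedged before/after sketch (not the actual ZKRMStateStore code):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LoggingLevelExample {
  private static final Log LOG = LogFactory.getLog(LoggingLevelExample.class);

  void loadApplicationState(String appId) {
    // Before: per-application progress logged at INFO, which floods the RM log
    // LOG.info("Done Loading applications from ZK state store");

    // After: demote to DEBUG and guard it so the message is only built when needed
    if (LOG.isDebugEnabled()) {
      LOG.debug("Done loading application " + appId + " from ZK state store");
    }
  }
}
{code}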
[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998233#comment-13998233 ] Hudson commented on YARN-1987: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1987. Wrapper for leveldb DBIterator to aid in handling database exceptions. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593757) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/LeveldbIterator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils/TestLeveldbIterator.java Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1987.patch, YARN-1987v2.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998174#comment-13998174 ] Hudson commented on YARN-1957: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1957. Consider the max capacity of the queue when computing the ideal capacity for preemption. Contributed by Carlo Curino (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594414) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993565#comment-13993565 ] Jason Lowe commented on YARN-1751: -- Despite them both being small changes, I think these should be separate JIRAs since they're otherwise unrelated changes for different problems and can stand on their own. We can morph this JIRA into one of them and file a new one to cover the other. For the LogCLIHelpers change, I think it should be calling FileContext.getFileContext(remoteAppLogDir.toUri(), conf) in case the remoteAppLogDir is not on the default filesystem. There's also the question of whether it should guard against a null conf, since, oddly, despite LogCLIHelpers being Configurable it isn't using the config until after this change. I think I'm leaning towards leaving it null and letting the NPE occur so callers will fix it. We've had lots of performance problems and other weirdness in the past when code forgot to pass down a custom config and things sorta worked with the default one. +1 for the MiniYarnCluster change. Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing - Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1751-trunk.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster. File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
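A minimal sketch of the LogCLIHelpers suggestion above, resolving the FileContext against the remote log dir's own URI instead of the default filesystem (illustrative only):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class RemoteLogDirFileContext {
  static FileContext contextFor(Path remoteAppLogDir, Configuration conf) throws Exception {
    // Resolve against the dir's own scheme/authority so a remote app log dir on a
    // non-default filesystem still works.
    return FileContext.getFileContext(remoteAppLogDir.toUri(), conf);
  }
}
{code}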
[jira] [Commented] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
[ https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998467#comment-13998467 ] Jian He commented on YARN-2064: --- Hi [~dlim1234], the message should be only saying some containers(map or reduce tasks) were killed by the AM during the runtime of AM. As long as you can see the SUCCEED state on the RM web UI, the job should be successful. You can also use yarn application -status to query the app status from CLI. Also, please ask such questions in Hadoop user group mailing list rather than here next time. JIRA site is supposed to be used for reporting issues not for answering general questions. thanks. MR job successful but Note: Container killed by the ApplicationMaster. -- Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone I got MR job run on YARN, normally for small jobs, it succeeded without any note in the URL page. However, when running long-running job, it ends with successful status but with note: Container killed by the ApplicationMaster. The job is still running and i hesitate to kill it. Anyone know if it is actually successful or not ?? I know there is a previous post on this, but the answers are not so clear for me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992689#comment-13992689 ] Hudson commented on YARN-1864: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5597 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5597/]) YARN-1864. Add missing file FSQueueType.java (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593191) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueType.java YARN-1864. Fair Scheduler Dynamic Hierarchical User Queues (Ashwin Shankar via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593190) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Fix For: 2.5.0 Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, YARN-1864-v4.txt, YARN-1864-v5.txt, YARN-1864-v6.txt, YARN-1864-v6.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. 
Say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of the resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998322#comment-13998322 ] Ray Chiang commented on YARN-2061: -- One minor question. Looking at the Apache Commons Log Interface, it looks like the API expects the developer to always call is*Enabled() API before calling the actual Log.* function, but that's not used consistently in this class. Should I add that as well? Revisit logging levels in ZKRMStateStore - Key: YARN-2061 URL: https://issues.apache.org/jira/browse/YARN-2061 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Labels: newbie ZKRMStateStore has a few places where it is logging at the INFO level. We should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623 ] Wangda Tan commented on YARN-2053: -- And I think this should be marked as critical or blocker bug, agree? Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996665#comment-13996665 ] Henry Saputra commented on YARN-2042: - +1 for the patch String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Attachment: YARN-1937.2.patch Uploaded a new patch: 1. Prevent the timeline entity from being modified by other users (on re-put of a timeline entity) 2. Isolate the exception when checking access for collection operations (getEntities/Events) 3. Add corresponding test cases to verify ACL behavior 4. Fixed a related bug in MemoryTimelineStore, which didn't do a deep copy before returning an object. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
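The deep-copy fix mentioned in item 4 above follows the usual defensive-copy pattern; the sketch below is a generic illustration with hypothetical types, not the actual MemoryTimelineStore change:
{code}
import java.util.HashMap;
import java.util.Map;

class InMemoryStore {
  private final Map<String, Map<String, String>> entities =
      new HashMap<String, Map<String, String>>();

  // Return a copy so callers cannot mutate the store's internal state.
  Map<String, String> getEntity(String id) {
    Map<String, String> stored = entities.get(id);
    return stored == null ? null : new HashMap<String, String>(stored);
  }
}
{code}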
[jira] [Commented] (YARN-2040) Recover information about finished containers
[ https://issues.apache.org/jira/browse/YARN-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993959#comment-13993959 ] Karthik Kambatla commented on YARN-2040: [~jlowe] - please close this as duplicate if any of the other sub-tasks are already handling this. Thanks. Recover information about finished containers - Key: YARN-2040 URL: https://issues.apache.org/jira/browse/YARN-2040 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla The NM should store and recover information about finished containers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-570: --- Attachment: YARN-570.2.patch Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT. If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
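The discrepancy comes down to the formatter's time zone: the JavaScript renderer prints GMT while the server-side formatter uses the node's local zone. A small, self-contained illustration (not the actual yarn.util.Times or yarn.dt.plugins.js code):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneDemo {
  public static void main(String[] args) {
    long ts = System.currentTimeMillis(); // any fixed instant
    SimpleDateFormat local = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    SimpleDateFormat gmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'");
    gmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    // Same instant, two different renderings depending on the formatter's zone.
    System.out.println(local.format(new Date(ts)));
    System.out.println(gmt.format(new Date(ts)));
  }
}
{code}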
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998392#comment-13998392 ] Xuan Gong commented on YARN-941: I am starting to work on it. And will provide a proposal soon. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2034: - Description: The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. (was: The description for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node.) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Priority: Minor The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992833#comment-13992833 ] Hudson commented on YARN-2018: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1777/]) YARN-2018. TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562. (Contributed by Ming Ma) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592783) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562 Key: YARN-2018 URL: https://issues.apache.org/jira/browse/YARN-2018 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.5.0 Reporter: Tsuyoshi OZAWA Assignee: Ming Ma Attachments: YARN-2018.patch The test failure is observed on YARN-1945 and YARN-1861. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998439#comment-13998439 ] Xuan Gong commented on YARN-2054: - Agree with [~jianhe]. Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
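For anyone hitting this before new defaults are picked, both properties can be overridden in yarn-site.xml; the values below are purely illustrative, not the defaults proposed in the patch:
{code}
<!-- yarn-site.xml (illustrative values): cap ZK connection retries so the RM
     gives up after roughly 2 minutes instead of ~1000 seconds -->
<property>
  <name>yarn.resourcemanager.zk-num-retries</name>
  <value>60</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
  <value>2000</value>
</property>
{code}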
[jira] [Commented] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998194#comment-13998194 ] Hudson commented on YARN-1986: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1986. In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE (Hong Zhiguo via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594476) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Hong Zhiguo Priority: Critical Fix For: 2.4.1 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998227#comment-13998227 ] Hudson commented on YARN-1861: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was causing both RMs to be stuck in standby mode when automatic failover is enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Both RM stuck in standby mode when automatic failover is enabled Key: YARN-1861 URL: https://issues.apache.org/jira/browse/YARN-1861 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch In our HA tests we noticed that the tests got stuck because both RM's got into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur commented on YARN-1368: -- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead of hijacking the JIRA, the correct way would have been to propose the changes to the assignee/author of the original patch and to offer to contribute/break down tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998171#comment-13998171 ] Hudson commented on YARN-1751: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1751. Improve MiniYarnCluster for log aggregation testing. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594275) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Improve MiniYarnCluster for log aggregation testing --- Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-1751-trunk.patch, YARN-1751.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster: File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
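The last point, passing the configuration into FileContext, amounts to roughly the following pattern; this is a hedged sketch, not the exact LogCLIHelpers diff:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;

class LogDirAccess {
  // Before (drops any custom settings, e.g. the configured default filesystem):
  //   FileContext fc = FileContext.getFileContext();
  // After: hand the tool's own Configuration to FileContext.
  static FileContext open(Configuration conf) throws Exception {
    return FileContext.getFileContext(conf);
  }
}
{code}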
[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998507#comment-13998507 ] Junping Du commented on YARN-2016: -- bq. It would be good to have a unit test as I mentioned before. The test case I uploaded was specific to one issue, but tests covering the directions of the wire transfers and the like would also be useful. Maybe that is something I will consider adding. [~venkatnrangan], you are right that an end-to-end functional test (covering the whole process of client, wire and server) like your demo test code is also very helpful. It would be great if you could file a JIRA and contribute it. I will help to review it. Thanks! Yarn getApplicationRequest start time range is not honored -- Key: YARN-2016 URL: https://issues.apache.org/jira/browse/YARN-2016 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Venkat Ranganathan Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-2016.patch, YarnTest.java When we query for previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, we see that the start range provided is not honored and all applications with the tag are returned. Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
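As a rough sketch of what such a test would exercise when building the request (method names as recalled from the 2.4 client API; treat them as assumptions):
{code}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

class AppQueryExample {
  static GetApplicationsRequest buildRequest(long submittedAfterMs) {
    GetApplicationsRequest request = GetApplicationsRequest.newInstance();
    // Only applications carrying this tag...
    request.setApplicationTags(Collections.singleton("my-workflow-tag"));
    // ...and started within this window should come back from
    // ApplicationClientProtocol#getApplications.
    request.setStartRange(submittedAfterMs, System.currentTimeMillis());
    return request;
  }
}
{code}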
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998513#comment-13998513 ] Tsuyoshi OZAWA commented on YARN-556: - {code} Oh. Forgot to mention that. Anubhav Dhoot offered to split up the prototype into multiple patches, one for each of the sub-tasks. If I understand right, his prototype covers almost all the sub-tasks already created. {code} [~adhoot], thanks for your great work. I noticed that you attached a patch on YARN-1367. I'll comment there about the patch. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2055: -- Target Version/s: 2.5.0 Fix Version/s: (was: 2.1.0-beta) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If Queue A does not have enough capacity to run the AM, the AM will borrow capacity from Queue B. In that case the AM will be killed when Queue B reclaims its capacity, then launched and killed again, and the job will eventually fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998321#comment-13998321 ] Hadoop QA commented on YARN-2017: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644678/YARN-2017.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3745//console This message is automatically generated. Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render
[ https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997742#comment-13997742 ] Karthik Kambatla commented on YARN-1550: The patch doesn't apply anymore. [~fengshen] - mind updating the patch against latest trunk? NPE in FairSchedulerAppsBlock#render Key: YARN-1550 URL: https://issues.apache.org/jira/browse/YARN-1550 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: caolong Priority: Critical Fix For: 2.2.1 Attachments: YARN-1550.patch Three steps: 1、debug at RMAppManager#submitApplication after the code if (rmContext.getRMApps().putIfAbsent(applicationId, application) != null) { String message = "Application with id " + applicationId + " is already present! Cannot add a duplicate!"; LOG.warn(message); throw RPCUtil.getRemoteException(message); } 2、submit one application: hadoop jar ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 -r 1 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR! The log: {noformat} 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated YARN-2048: --- Affects Version/s: 2.5.0 2.3.0 2.4.0 List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by the application user. They can conveniently know how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart
[ https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993962#comment-13993962 ] Karthik Kambatla commented on YARN-1489: Created a couple of sub-tasks based on an offline discussion with Anubhav, Bikas, Jian and Vinod. [Umbrella] Work-preserving ApplicationMaster restart Key: YARN-1489 URL: https://issues.apache.org/jira/browse/YARN-1489 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Work preserving AM restart.pdf Today if AMs go down, - RM kills all the containers of that ApplicationAttempt - New ApplicationAttempt doesn't know where the previous containers are running - Old running containers don't know where the new AM is running. We need to fix this to enable work-preserving AM restart. The latter two can potentially be done at the app level, but it is good to have a common solution for all apps wherever possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2036) Document yarn.resourcemanager.hostname in ClusterSetup
[ https://issues.apache.org/jira/browse/YARN-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993888#comment-13993888 ] Karthik Kambatla commented on YARN-2036: Looks good to me. +1, pending Jenkins. Document yarn.resourcemanager.hostname in ClusterSetup -- Key: YARN-2036 URL: https://issues.apache.org/jira/browse/YARN-2036 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Fix For: 2.5.0 Attachments: YARN2036-01.patch, YARN2036-02.patch ClusterSetup doesn't talk about yarn.resourcemanager.hostname - most people should just be able to use that directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1039: --- Assignee: Xuan Gong Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Xuan Gong Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998396#comment-13998396 ] Xuan Gong commented on YARN-1039: - Start to work on it. Will provide a proposal soon. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Xuan Gong Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2053: - Attachment: YARN-2053.patch Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1227) Update Single Cluster doc to use yarn.resourcemanager.hostname
[ https://issues.apache.org/jira/browse/YARN-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992678#comment-13992678 ] Akira AJISAKA commented on YARN-1227: - The Single Cluster doc was updated in HADOOP-10139 to set the minimal configuration, and that's why yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, etc., were removed. Update Single Cluster doc to use yarn.resourcemanager.hostname -- Key: YARN-1227 URL: https://issues.apache.org/jira/browse/YARN-1227 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Ray Chiang Labels: newbie Now that yarn.resourcemanager.hostname can be used in place of yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, etc., we should update the doc to use it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996282#comment-13996282 ] Carlo Curino commented on YARN-2022: Sunil, The problem with AM_CONTAINER_PRIORITY is that it is just a shortcut for setting Priority = 0; the user can easily do so from their own code, and unless there are explicit checks that prevent a ResourceRequest from assigning priority = 0 to all of its containers, we have no defense against user abuses. The two options I see are: * we track which container is the AM by some means other than Priority and protect the AM container from preemption whenever possible * we assign a quota of protected-from-preemption containers, and save whichever containers have the lowest priority and fit within the quota. This way the user can specify multiple containers at Priority=0 (think a replicated AM or some other critical service for the job) and we will save as many of those as fit in the quota. I think we are agreeing on max-am-percentage... the final goal is to make sure that after preemption the max-am-resource-percent is respected (i.e., no more than a certain amount of the queue is dedicated to AMs). The problem with user-limit-factor goes like this: * Given a queue A of capacity: 10%, max-capacity = 50%, and user-limit-factor = 2 (i.e., a single user can go up to 20% of total resources) * Only one user is active in this queue and it gets 20% of resources (this also requires low activity in other queues) * The overall cluster capacity is reduced (e.g., a failing rack) or a refresh of the queues has reduced this queue's capacity * The LeafQueue scheduler keeps skipping the scheduling for this user (since the user is now over the user-limit-factor) although no other user in the cluster is asking for resources * If we ever get to this situation with the user holding only AMs, the system is completely wedged, with the AMs waiting for more containers, and the system systematically skipping this user (as the user is above the user-limit-factor). If preemption proceeds by systematically killing resources *including* AMs, the chances of this happening are rather low (the head of the queue is only AMs, while the tail contained AMs and other containers), but as we save AMs from preemption, this bad corner case is maybe a little more likely to happen. What I am trying to get at with my comments is that as we try to evolve preemption further, we should look at all the invariants of a queue, and try to make sure that our preemption policy can re-establish not only the capacity invariant but also all the other invariants. The CS relies on those invariants heavily, and misbehaves if they are violated. An example of this is YARN-1957, where we introduce better handling for max-capacity and zero-size queues. The changes you are proposing are not creating the problem, just making it more likely to happen in practice. A well tuned CS and reasonable load are unlikely to trigger this, but we should build for robustness as much as possible, since we cannot rely on users to understand these internals and tune the CS defensively. [~acmurthy] any thoughts on this? 
Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2 NMs] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps]. Currently in this scenario, job J3 will get killed, including its AM. It would be better if the AM could be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 could be preempted instead. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996122#comment-13996122 ] Wangda Tan commented on YARN-2048: -- Thanks [~zjshen] and [~coderplay] for the explanation. Now I understand their context. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by the application user. They can conveniently know how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992718#comment-13992718 ] Hadoop QA commented on YARN-1702: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643931/apache-yarn-1702.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3719//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3719//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3719//console This message is automatically generated. Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects
[ https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998229#comment-13998229 ] Hudson commented on YARN-1981: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1981. Nodemanager version is not updated when a node reconnects (Jason Lowe via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java Nodemanager version is not updated when a node reconnects - Key: YARN-1981 URL: https://issues.apache.org/jira/browse/YARN-1981 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.5.0 Attachments: YARN-1981.patch When a nodemanager is quickly restarted and happens to change versions during the restart (e.g.: rolling upgrade scenario) the NM version as reported by the RM is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application
[ https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998211#comment-13998211 ] Hudson commented on YARN-1976: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594123) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Tracking url missing http protocol for FAILED application - Key: YARN-1976 URL: https://issues.apache.org/jira/browse/YARN-1976 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-1976-v2.patch, YARN-1976.patch Run yarn application -list -appStates FAILED, It does not print http protocol name like FINISHED apps. {noformat} -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4 Application-IdApplication-Name Application-Type User Queue State Final-State ProgressTracking-URL application_1397598467870_0004 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004 application_1397598467870_0003 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003 application_1397598467870_0002 Sleep job MAPREDUCEhrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002 application_1397598467870_0001 word count MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001 {noformat} It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead 'http://host:8088/cluster/app/application_1397598467870_0002' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998225#comment-13998225 ] Hudson commented on YARN-1362: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1362. Distinguish between nodemanager shutdown for decommission vs shutdown for restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shutdown more permanently (e.g.: like a decommission) then the nodemanager should cleanup directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997750#comment-13997750 ] Sandy Ryza commented on YARN-2027: -- Including a rack in your request will allow containers to go anywhere on the rack, even when relaxLocality is set to false. From the AMRMClient.ContainerRequest doc: "If locality relaxation is disabled, then only within the same request, a node and its rack may be specified together. This allows for a specific rack with a preference for a specific node within that rack." So try passing in the rack list as null instead of List("/default-rack").toArray[String]. YARN ignores host-specific resource requests Key: YARN-2027 URL: https://issues.apache.org/jira/browse/YARN-2027 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.0 Environment: RHEL 6.1 YARN 2.4 Reporter: Chris Riccomini YARN appears to be ignoring host-level ContainerRequests. I am creating a container request with code that pretty closely mirrors the DistributedShell code: {code} protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) { info("Requesting %d container(s) with %dmb of memory" format (containers, memMb)) val capability = Records.newRecord(classOf[Resource]) val priority = Records.newRecord(classOf[Priority]) priority.setPriority(0) capability.setMemory(memMb) capability.setVirtualCores(cpuCores) // Specifying a host in the String[] host parameter here seems to do nothing. Setting relaxLocality to false also doesn't help. (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority))) } {code} When I run this code with a specific host in the ContainerRequest, YARN does not honor the request. Instead, it puts the container on an arbitrary host. This appears to be true for both the FifoScheduler and the CapacityScheduler. Currently, we are running the CapacityScheduler with the following settings: {noformat}
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>1</value>
    <description> Maximum number of applications that can be pending and running. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description> Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description> The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description> The queues at the this level (root is the root queue). </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Samza queue target capacity.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description> Default queue user limit a percentage from 0.0 to 1.0. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description> The maximum capacity of the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description> The state of the default queue. State can be one of RUNNING or STOPPED. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description> The ACL of who can submit jobs to the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description> The ACL of who can administer jobs on the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description> Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically
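Following Sandy's suggestion above, a node-specific request with locality relaxation disabled would look roughly like this in Java (the host name is a placeholder, and this is a sketch rather than the reporter's actual code):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

class HostSpecificRequest {
  static ContainerRequest build(int memMb, int cores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memMb);
    capability.setVirtualCores(cores);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);
    // Name the node, leave the rack list null, and disable locality relaxation
    // so the request stays pinned to that node.
    return new ContainerRequest(capability,
        new String[] {"desired-host.example.com"}, /* racks */ null,
        priority, /* relaxLocality */ false);
  }
}
{code}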
[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Summary: Add entity-level access control of the timeline data for owners only (was: Access control of per-framework data) Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2059) Extend access control for admin and configured user/group list
Zhijie Shen created YARN-2059: - Summary: Extend access control for admin and configured user/group list Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-896: --- Assignee: (was: Xuan Gong) Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1104) NMs to support rolling logs of stdout stderr
[ https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998397#comment-13998397 ] Xuan Gong commented on YARN-1104: - Start to work on it. Will provide a proposal soon. NMs to support rolling logs of stdout stderr -- Key: YARN-1104 URL: https://issues.apache.org/jira/browse/YARN-1104 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Steve Loughran Assignee: Xuan Gong Currently NMs stream the stdout and stderr streams of a container to a file. For longer lived processes those files need to be rotated so that the log doesn't overflow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997224#comment-13997224 ] Sandy Ryza commented on YARN-1986: -- Sorry for being so slow on this. +1 to the change. I looked at the code for the fair and capacity schedulers and they don't seem to face the same issue. After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -- Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Hong Zhiguo Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
dlim created YARN-2064: -- Summary: MR job successful but Note: Container killed by the ApplicationMaster. Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone. I have an MR job running on YARN; normally for small jobs it succeeds without any note on the URL page. However, when running a long-running job, it ends with successful status but with the note: Container killed by the ApplicationMaster. The job is still running and I hesitate to kill it. Does anyone know if it is actually successful or not? I know there is a previous post on this, but the answers are not so clear to me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1993: -- Attachment: YARN-1993.patch For example, how about using StringEscapeUtils, as in this patch? Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java, method echo(), e.g.: {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because it is not properly sanitized for the HTML attribute context. -- This message was sent by Atlassian JIRA (v6.2#6252)
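The suggested fix boils down to escaping before printing; a minimal sketch using the commons-lang 2.x StringEscapeUtils that Hadoop already ships (not the actual patch contents):
{code}
import java.io.PrintWriter;
import org.apache.commons.lang.StringEscapeUtils;

class SafeEcho {
  // Escape each argument so user-controlled strings cannot inject markup.
  static void echo(PrintWriter out, Object... args) {
    for (Object s : args) {
      out.print(StringEscapeUtils.escapeHtml(String.valueOf(s)));
    }
  }
}
{code}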
[jira] [Assigned] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-1986: Assignee: Sandy Ryza (was: Hong Zhiguo) After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -- Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Sandy Ryza Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
[ https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-2064. --- Resolution: Not a Problem Closed this. MR job successful but Note: Container killed by the ApplicationMaster. -- Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone. I have an MR job running on YARN; normally for small jobs it succeeds without any note on the URL page. However, when running a long-running job, it ends with successful status but with the note: Container killed by the ApplicationMaster. The job is still running and I hesitate to kill it. Does anyone know if it is actually successful or not? I know there is a previous post on this, but the answers are not so clear to me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998493#comment-13998493 ] Binglin Chang commented on YARN-1872: - Hi, testDSShell fails with an assertion failure, don't know whether it is relevant: https://builds.apache.org/job/Hadoop-Yarn-trunk/561/consoleText testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.557 sec FAILURE! java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) Results : Failed tests: TestDistributedShell.testDSShell:198 expected:<1> but was:<0> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0 TestDistributedShell occasionally fails in trunk Key: YARN-1872 URL: https://issues.apache.org/jira/browse/YARN-1872 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Attachments: TestDistributedShell.out, YARN-1872.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2058) .gitignore should ignore .orig and .rej files
Karthik Kambatla created YARN-2058: -- Summary: .gitignore should ignore .orig and .rej files Key: YARN-2058 URL: https://issues.apache.org/jira/browse/YARN-2058 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla .gitignore file should ignore .orig and .rej files -- This message was sent by Atlassian JIRA (v6.2#6252)
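For reference, the proposed entries would look like the following; where exactly they land inside the repository's .gitignore is an assumption:
{noformat}
# backup/reject files left behind by patch(1) and merge tools
*.orig
*.rej
{noformat}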
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997781#comment-13997781 ] Tsuyoshi OZAWA commented on YARN-2052: -- {quote} e.g. container_XXX_1000 after epoch 1. {quote} This approach can be a compatible change. ConverterUtils.toContainerId(containerIdStr) works without any changes as long as the container id with the epoch stays under Integer.MAX_VALUE. What happens if the id overflows? Container id collisions could occur. If we can handle that correctly, this approach is a simple and good choice. I'll take some time to think about this approach. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high-churn activity, the RM does not store the sequence number per app. So after a restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
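To make the epoch idea quoted above concrete, here is a toy sketch of folding a restart epoch into the container id's sequence number so that ids stay unique across RM restarts; the digit layout and constants are illustrative assumptions, not the design that was eventually committed:
{code}
// Toy sketch only: epoch 1 + sequence 1000 maps to 1001000, so ids issued by
// different RM "epochs" never collide. The six-digit span is an assumption.
public final class EpochContainerIdSketch {
  private static final long SEQUENCE_SPAN = 1000000L;

  public static long toUniqueId(long epoch, long sequenceNumber) {
    if (sequenceNumber >= SEQUENCE_SPAN) {
      // Spilling into the next epoch's range would cause exactly the
      // collision the comment above worries about, so fail loudly.
      throw new IllegalStateException("sequence number overflow: " + sequenceNumber);
    }
    return epoch * SEQUENCE_SPAN + sequenceNumber;
  }

  public static void main(String[] args) {
    System.out.println(toUniqueId(0, 1000)); // 1000
    System.out.println(toUniqueId(1, 1000)); // 1001000
  }
}
{code}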
[jira] [Resolved] (YARN-373) Allow an AM to reuse the resources allocated to container for a new container
[ https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-373. - Resolution: Won't Fix [doing self-clean up of JIRAs] Allow an AM to reuse the resources allocated to container for a new container - Key: YARN-373 URL: https://issues.apache.org/jira/browse/YARN-373 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur When a container completes, instead of the corresponding resources being freed up, it should be possible for the AM to reuse the assigned resources for a new container. As part of the reallocation, the AM would notify the RM about partial resources being freed up and the RM would make the necessary corrections on the corresponding node. With this functionality, an AM can ensure it gets a container on the same node where previous containers ran. This will allow getting rid of the ShuffleHandler as a service in the NMs and running it as a regular container task of the corresponding AM. In this case, the reallocation would reduce the CPU/MEM obtained for the original container to what is needed for serving the shuffle. Note that in this example the MR AM would only do this reallocation for one of the many tasks that may have run on a particular node (as a single shuffle task could serve all the map outputs from all map tasks run on that node). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects
[ https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998191#comment-13998191 ] Hudson commented on YARN-1981: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1981. Nodemanager version is not updated when a node reconnects (Jason Lowe via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java Nodemanager version is not updated when a node reconnects - Key: YARN-1981 URL: https://issues.apache.org/jira/browse/YARN-1981 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.5.0 Attachments: YARN-1981.patch When a nodemanager is quickly restarted and happens to change versions during the restart (e.g.: rolling upgrade scenario) the NM version as reported by the RM is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998187#comment-13998187 ] Hudson commented on YARN-1362: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1362. Distinguish between nodemanager shutdown for decommission vs shutdown for restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shutdown more permanently (e.g.: like a decommission) then the nodemanager should cleanup directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2011) Fix typo and warning in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998188#comment-13998188 ] Hudson commented on YARN-2011: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-2011. Fix typo and warning in TestLeafQueue (Contributed by Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593804) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java Fix typo and warning in TestLeafQueue - Key: YARN-2011 URL: https://issues.apache.org/jira/browse/YARN-2011 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Chen He Assignee: Chen He Priority: Trivial Fix For: 2.5.0 Attachments: YARN-2011-v2.patch, YARN-2011.patch a.assignContainers(clusterResource, node_0); assertEquals(2*GB, a.getUsedResources().getMemory()); assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_1.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G // Again one to user_0 since he hasn't exceeded user limit yet a.assignContainers(clusterResource, node_0); assertEquals(3*GB, a.getUsedResources().getMemory()); assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); assertEquals(1*GB, app_1.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver
[ https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998190#comment-13998190 ] Hudson commented on YARN-1982: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1982. Renamed the daemon name to be TimelineServer instead of History Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593748) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh Rename the daemon name to timelineserver Key: YARN-1982 URL: https://issues.apache.org/jira/browse/YARN-1982 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: cli Fix For: 2.5.0 Attachments: YARN-1982.1.patch Nowadays, it's confusing that we call the new component the timeline server, but we use {code} yarn historyserver yarn-daemon.sh start historyserver {code} to start the daemon. Before the confusion keeps propagating, we'd better modify the command line ASAP. -- This message was sent by Atlassian JIRA (v6.2#6252)
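Going by the commit message above, the renamed invocation should look roughly like the following, with the old historyserver form deprecated rather than removed; treat the exact syntax as an assumption:
{code}
yarn timelineserver
yarn-daemon.sh start timelineserver
{code}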
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998189#comment-13998189 ] Hudson commented on YARN-1861: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was causing both RMs to be stuck in standby mode when automatic failover is enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Both RM stuck in standby mode when automatic failover is enabled Key: YARN-1861 URL: https://issues.apache.org/jira/browse/YARN-1861 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch In our HA tests we noticed that the tests got stuck because both RM's got into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995898#comment-13995898 ] Sandy Ryza commented on YARN-2017: -- Thanks for working on this Jian. A couple questions: Why take out the header comment in SchedulerNode? Can we use generics to avoid all the casting (and findbugs)? I.e. class CapacityScheduler extends AbstractYarnScheduler<FiCaSchedulerApp, FiCaSchedulerNode>? Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
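A self-contained toy illustration of the generics Sandy suggests: parameterizing the shared base class by the concrete node type removes the downcasts in each scheduler. The class names below are simplified stand-ins, not the actual YARN classes or the patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Stand-ins for SchedulerNode and its scheduler-specific subclasses.
class Node { }
class CapacityNode extends Node { }

abstract class BaseScheduler<N extends Node> {
  protected final Map<String, N> nodes = new HashMap<String, N>();

  protected N getNode(String nodeId) {
    return nodes.get(nodeId); // already typed as N, so subclasses never cast
  }
}

class TypedCapacityScheduler extends BaseScheduler<CapacityNode> {
  CapacityNode pick(String nodeId) {
    return getNode(nodeId);   // without the type parameter this would need a cast
  }
}
{code}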
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline server
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995746#comment-13995746 ] Hadoop QA commented on YARN-2049: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644497/YARN-2049.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3738//console This message is automatically generated. Delegation token stuff for the timeline server - Key: YARN-2049 URL: https://issues.apache.org/jira/browse/YARN-2049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2049.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2017: -- Attachment: YARN-2017.2.patch Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-352) Inconsistent picture of how a container was killed when querying RM and NM in case of preemption
[ https://issues.apache.org/jira/browse/YARN-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-352: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 Inconsistent picture of how a container was killed when querying RM and NM in case of preemption Key: YARN-352 URL: https://issues.apache.org/jira/browse/YARN-352 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah When the RM preempts a container, it records the exit status as -100. However, the NM registers the preempted container's exit status as simply killed externally via SIGTERM or SIGKILL. When the AM queries the RM and NM for the same container's status, it will get two different values. When killing a container, the exit reason should be made more explicit via an exit status code the AM can act on, in addition to the diagnostic messages that can contain more detailed information (though probably not programmatically interpretable by the AM). -- This message was sent by Atlassian JIRA (v6.2#6252)
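To illustrate what a better-defined exit status would buy the AM, here is a hedged sketch of AM-side handling. It assumes a dedicated constant such as ContainerExitStatus.PREEMPTED (which appeared in later YARN releases), and the handler methods are hypothetical:
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch: branch on a well-defined exit status instead of parsing diagnostics.
public class CompletedContainerHandlerSketch {
  void onContainerCompleted(ContainerStatus status) {
    int exitStatus = status.getExitStatus();
    if (exitStatus == ContainerExitStatus.PREEMPTED) {
      handlePreemption(status); // e.g. reschedule without counting a task failure
    } else if (exitStatus != ContainerExitStatus.SUCCESS) {
      handleFailure(status);    // diagnostics carry extra, human-readable detail
    }
  }

  void handlePreemption(ContainerStatus status) { /* hypothetical AM hook */ }
  void handleFailure(ContainerStatus status) { /* hypothetical AM hook */ }
}
{code}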
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995739#comment-13995739 ] Hadoop QA commented on YARN-1751: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644486/YARN-1751.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3736//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3736//console This message is automatically generated. Improve MiniYarnCluster for log aggregation testing --- Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1751-trunk.patch, YARN-1751.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster: File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1937) Access control of per-framework data
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1935 Access control of per-framework data Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996685#comment-13996685 ] Jason Lowe commented on YARN-2050: -- bq. remoteAppLogDir.toUri().getScheme() returns null and AbstractFileSystem.createFileSystem doesn't like it if dumpAllContainersLogs calls FileContext.getFileContext(remoteAppLogDir.toUri()) Argh right, I forgot that FileContext is less-than-helpful in this regard. It needs to be something like this: {code} Path qualifiedLogDir = FileContext.getFileContext(getConf()).makeQualified(remoteAppLogDir); FileContext fc = FileContext.getFileContext(qualifiedLogDir.toUri(), getConf()); nodeFiles = fc.listStatus(qualifiedLogDir); {code} This allows the code to handle cases where the remote log dir has been configured to be a different filesystem than the default filesystem. Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996096#comment-13996096 ] Zhijie Shen commented on YARN-2048: --- bq. Seems Zhijie Shen's patch fetch containers from ApplicationContext. Currently, the history web UI fetches data (app/attempt/container) from ApplicationContext, while the RM web UI does it from the RM context. My ultimate goal is to unify both the history and RM web UIs, and to unify the data source with the RPC protocol. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996348#comment-13996348 ] Tsuyoshi OZAWA commented on YARN-2001: -- Created YARN-2052 to track the container id discussion separately and make it easier to follow. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He After failover, RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs joined after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
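As a rough illustration of the two options the description mentions (a node-count threshold or a timeout), here is a toy gate; the names and defaults are assumptions for illustration, not proposed configuration keys or RM code:
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch: after failover, hold new AM requests until either enough NMs
// have re-registered or a grace period has elapsed.
public class SchedulingGateSketch {
  private final int minNodes;     // e.g. a fraction of the pre-failover node count
  private final long graceMillis; // fallback timeout
  private final long startMillis = System.currentTimeMillis();
  private final AtomicInteger registeredNodes = new AtomicInteger();

  public SchedulingGateSketch(int minNodes, long graceMillis) {
    this.minNodes = minNodes;
    this.graceMillis = graceMillis;
  }

  public void onNodeRegistered() {
    registeredNodes.incrementAndGet();
  }

  public boolean acceptNewRequests() {
    return registeredNodes.get() >= minNodes
        || System.currentTimeMillis() - startMillis >= graceMillis;
  }
}
{code}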
[jira] [Commented] (YARN-182) Unnecessary Container killed by the ApplicationMaster message for successful containers
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995087#comment-13995087 ] Jason Lowe commented on YARN-182: - I don't believe this is related to YARN-903; rather, it seems more likely to be related to MAPREDUCE-5465. The MapReduce ApplicationMaster kills tasks as soon as they report success via the umbilical connection, and sometimes that kill arrives before the task exits on its own. In those cases the containers will be marked as killed by the ApplicationMaster. Unnecessary Container killed by the ApplicationMaster message for successful containers - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Assignee: Omkar Vinit Joshi Labels: hadoop, usability Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou resolved YARN-2048. Resolution: Duplicate Duplicate of YARN-1809 List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1927) Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1927: -- Issue Type: Sub-task (was: Bug) Parent: YARN-45 Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy Key: YARN-1927 URL: https://issues.apache.org/jira/browse/YARN-1927 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Attachments: YARN-1927.patch Currently, after each editSchedule() called, preemption message will be created and sent to scheduler. ProportionalCapacityPreemptionPolicy should only send preemption message once for each container. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995553#comment-13995553 ] Tsuyoshi OZAWA commented on YARN-2048: -- +1 for the idea. Looking forward to it. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Reporter: Min Zhou Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2033) Investigate merging generic-history into the Timeline Store
Vinod Kumar Vavilapalli created YARN-2033: - Summary: Investigate merging generic-history into the Timeline Store Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2016: - Attachment: YARN-2016.patch Fix the issues in the PBImpl and add a test to verify it now works. Yarn getApplicationRequest start time range is not honored -- Key: YARN-2016 URL: https://issues.apache.org/jira/browse/YARN-2016 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Venkat Ranganathan Assignee: Junping Du Attachments: YARN-2016.patch, YarnTest.java When we query for the previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, we see that the start range provided is not honored and all applications with the tag are returned. Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
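For context, the kind of query being exercised looks roughly like the sketch below. The setter names are recalled from the public GetApplicationsRequest API and should be treated as assumptions rather than a verified reproduction of the attached YarnTest.java:
{code}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

public class StartRangeQuerySketch {
  // Build a request filtered by both submission-time range and application tag.
  // YARN-2016 reports that the start range is ignored when a tag is also set.
  static GetApplicationsRequest buildRequest(long begin, long end, String tag) {
    GetApplicationsRequest request = GetApplicationsRequest.newInstance();
    request.setStartRange(begin, end);
    request.setApplicationTags(Collections.singleton(tag));
    return request;
  }
}
{code}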