[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794942#comment-13794942 ] Devaraj K commented on YARN-1299: - Can you also take care of the guideline of avoiding lines longer than 80 characters for these changes? Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} The RM log prints the message 'checking for deactivate...'. The message would be more meaningful if it included the application id. -- This message was sent by Atlassian JIRA (v6.1#6144)
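[Editorial note] A minimal sketch of the suggested fix, assuming the message is emitted from AppSchedulingInfo and that an ApplicationId is in scope; the class and method names here are illustrative, not taken from yarn-1299.patch.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.ApplicationId;

class AppSchedulingInfoLogSketch {
  private static final Log LOG = LogFactory.getLog(AppSchedulingInfoLogSketch.class);

  // Before: LOG.info("checking for deactivate...");
  // After: include the application id so each log line is attributable.
  void logDeactivateCheck(ApplicationId applicationId) {
    LOG.info("checking for deactivate of application: " + applicationId);
  }
}
{code}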
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794949#comment-13794949 ] Hadoop QA commented on YARN-1068: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608393/yarn-1068-11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2177//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2177//console This message is automatically generated. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched
[ https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795069#comment-13795069 ] Hudson commented on YARN-1259: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #363 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/363/]) YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps switched. (Robert Kanter via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java In Fair Scheduler web UI, queue num pending and num active apps switched Key: YARN-1259 URL: https://issues.apache.org/jira/browse/YARN-1259 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Robert Kanter Labels: newbie Fix For: 2.2.1 Attachments: YARN-1259.patch The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795071#comment-13795071 ] Hudson commented on YARN-1182: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #363 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/363/]) YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java MiniYARNCluster creates and inits the RM/NM only on start() --- Key: YARN-1182 URL: https://issues.apache.org/jira/browse/YARN-1182 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.3.0 Attachments: yarn-1182-1.patch, yarn-1182-2.patch MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() them during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched
[ https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795173#comment-13795173 ] Hudson commented on YARN-1259: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1553 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1553/]) YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps switched. (Robert Kanter via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java In Fair Scheduler web UI, queue num pending and num active apps switched Key: YARN-1259 URL: https://issues.apache.org/jira/browse/YARN-1259 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Robert Kanter Labels: newbie Fix For: 2.2.1 Attachments: YARN-1259.patch The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795175#comment-13795175 ] Hudson commented on YARN-1182: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1553 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1553/]) YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java MiniYARNCluster creates and inits the RM/NM only on start() --- Key: YARN-1182 URL: https://issues.apache.org/jira/browse/YARN-1182 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.3.0 Attachments: yarn-1182-1.patch, yarn-1182-2.patch MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() them during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched
[ https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795222#comment-13795222 ] Hudson commented on YARN-1259: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1579 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1579/]) YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps switched. (Robert Kanter via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java In Fair Scheduler web UI, queue num pending and num active apps switched Key: YARN-1259 URL: https://issues.apache.org/jira/browse/YARN-1259 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Robert Kanter Labels: newbie Fix For: 2.2.1 Attachments: YARN-1259.patch The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795224#comment-13795224 ] Hudson commented on YARN-1182: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1579 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1579/]) YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java MiniYARNCluster creates and inits the RM/NM only on start() --- Key: YARN-1182 URL: https://issues.apache.org/jira/browse/YARN-1182 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.3.0 Attachments: yarn-1182-1.patch, yarn-1182-2.patch MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() them during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795266#comment-13795266 ] Jason Lowe commented on YARN-451: - bq. current allocation can be seen from the scheduler page. I took a look at the scheduler page, and all I see for current allocation is per-user-per-queue and not per app. Where are you seeing the current assignment for each app on the scheduler page? As for the instance you recently encountered, showing the current ask would have quickly isolated the issue, as all 30K maps would have been asked for once the app launched. My main concern with a current-plus-estimated-future approach is that it's optional for AMs to implement and requires an API change. I see showing the current allocation and/or ask as more robust across different app frameworks (it doesn't require AMs to implement anything), easier to implement, and it should solve most of the problems with identifying where the bottlenecks currently are in scheduling containers. Doing so doesn't preclude adding a total estimate metric at some point. Quick question on the estimate -- is it a calculation of the total app weight at the start of the app, or do the values decrease as containers are granted? The former is useful as a gauge of how big an app is/was overall, while the latter is more useful for identifying upcoming demands if the application has been running for some time. Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Sangjin Lee Priority: Blocker Attachments: in_progress_2x.png, yarn-451-trunk-20130916.1.patch ResourceManager webUI shows the list of RUNNING applications, but it does not tell which applications are requesting more resources compared to others. With a cluster running hundreds of applications at once, it would be useful to have some kind of metric to show high-resource-usage applications vs low-resource-usage ones. At the minimum, showing the number of containers is a good option. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795350#comment-13795350 ] Zhijie Shen commented on YARN-975: -- Having thought more about the implementation details: 1. It seems that the cache mechanism is required immediately. It is the general case that users will access the information of an application, its attempts and its containers in sequence by clicking the links on the web page. If we don't have a cache mechanism, for every single piece of information we need to read the TFile again from HDFS, which results in poor performance. To cache the complete history data of an application, we have two choices: one is to cache the raw TFile, and the other is to cache all the protobuf objects recovered from the TFile. I'm inclined toward the latter choice, because we can organize the objects in a better data structure for quick access. 2. The current APIs allow users to write each piece of information in the scope of one application individually. Limited by the current API design, we need to open a TFile on the first write operation for a certain application and keep it open until the last write operation is finished. The problem, then, is how we judge that all the information for one application has been written. One method is to tell the history storage how many attempts and containers the application has. Another method is to let the caller explicitly close the TFile. However, both methods involve interface changes, opening up more methods. 3. This further raises a question w.r.t. the integrity of the history data. In the normal case, we expect the application, its attempts and its containers to all be written into a TFile. However, if for some reason one piece of information is missing and the write operation for it is never done, the TFile will stay open waiting for the missing piece. Probably we need a timeout trigger to close the TFile whether or not all the data has come in. But then, should we persist the TFile into HDFS? The history data for such an application is not complete. 4. Moreover, if we have a timeout trigger for a TFile, the RM cannot write each piece of the history information at the end of each object's life cycle without coordination. We will then want the write operations for all the pieces to be scheduled together, so the RM side needs more work to coordinate the write operations (YARN-953). [~vinodkv], any suggestions? Add a file-system implementation for history-storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch An HDFS implementation should be a standard persistence strategy for history storage -- This message was sent by Atlassian JIRA (v6.1#6144)
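[Editorial note] A minimal sketch of the caching idea in point 1 above, assuming a bounded, access-ordered LRU map keyed by application id that holds the protobuf objects recovered from a TFile; the capacity and generic types are illustrative assumptions, not part of the patch.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: repeated page clicks for the same application hit the
// parsed history objects in memory instead of re-reading the TFile from HDFS.
class HistoryCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  HistoryCacheSketch(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true gives LRU eviction order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries; // evict the least-recently-used entry when full
  }
}
{code}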
[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795391#comment-13795391 ] Vinod Kumar Vavilapalli commented on YARN-975: -- Looks good overall. One file per app is reasonable. We already use TFile for log-aggregation, so yeah it is good to pick that up too. Regarding the implementation, see JobHistoryEventHandler. There we flush based on two triggers: an upper limit on unflushed records and a time-based trigger. We can add one more trigger: application state change. Add a file-system implementation for history-storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch An HDFS implementation should be a standard persistence strategy for history storage -- This message was sent by Atlassian JIRA (v6.1#6144)
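[Editorial note] A minimal sketch of the three triggers described above (unflushed-record limit, elapsed time, and application state change); the thresholds and method shape are illustrative assumptions, not the JobHistoryEventHandler code itself.
{code}
// Decides when the history writer should flush buffered records.
class FlushPolicySketch {
  private final int maxUnflushedRecords;
  private final long maxIntervalMs;
  private int unflushedRecords;
  private long lastFlushMs = System.currentTimeMillis();

  FlushPolicySketch(int maxUnflushedRecords, long maxIntervalMs) {
    this.maxUnflushedRecords = maxUnflushedRecords;
    this.maxIntervalMs = maxIntervalMs;
  }

  // Called after each record is written; appStateChanged is the extra
  // trigger proposed in the comment above.
  boolean shouldFlush(boolean appStateChanged) {
    unflushedRecords++;
    long now = System.currentTimeMillis();
    boolean flush = appStateChanged
        || unflushedRecords >= maxUnflushedRecords
        || now - lastFlushMs >= maxIntervalMs;
    if (flush) {
      unflushedRecords = 0;
      lastFlushMs = now;
    }
    return flush;
  }
}
{code}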
[jira] [Resolved] (YARN-1292) De-link container life cycle from the process it runs
[ https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1292. --- Resolution: Duplicate YARN-1040 existed before this. Closing as duplicate. De-link container life cycle from the process it runs - Key: YARN-1292 URL: https://issues.apache.org/jira/browse/YARN-1292 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.1-beta Reporter: Bikas Saha Currently, a container is considered done when its OS process exits. This makes it cumbersome for apps to be able to reuse containers for different processes. Long-running daemons may want to run in the same containers as their previous versions. So e.g. if an HBase region server crashes or is upgraded, it would want to restart in the same container, where everything it needs would already be warm and ready. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Reopened] (YARN-925) HistoryStorage Reader Interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reopened YARN-925: -- The interface may need to change accordingly, like the writer interface. HistoryStorage Reader Interface for Application History Server -- Key: YARN-925 URL: https://issues.apache.org/jira/browse/YARN-925 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, YARN-925-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1308) set default value for nodemanager aux service
[ https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795418#comment-13795418 ] Arpit Gupta commented on YARN-1308: --- I think we should set the defaults to {code}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Auxiliary services of NodeManager</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code} so nodemanagers will start out of the box. set default value for nodemanager aux service - Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently in order to get the nodemanagers to start you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1308) set default value for nodemanager aux service
Arpit Gupta created YARN-1308: - Summary: set default value for nodemanager aux service Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently in order to get the nodemanagers to start you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795423#comment-13795423 ] Mayank Bansal commented on YARN-975: [~zjshen] Is it one TFile per application or 3 files (1 for the application, 1 for the attempts and 1 for all the containers)? The protobuf cache is a good idea. [~vinodkv] I agree with you that we should have all three triggers. Thanks, Mayank Add a file-system implementation for history-storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch An HDFS implementation should be a standard persistence strategy for history storage -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-925) HistoryStorage Reader Interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795425#comment-13795425 ] Mayank Bansal commented on YARN-925: [~zjshen] Why do you think the readers will be changed? HistoryStorage Reader Interface for Application History Server -- Key: YARN-925 URL: https://issues.apache.org/jira/browse/YARN-925 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, YARN-925-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1042: -- Issue Type: Sub-task (was: New Feature) Parent: YARN-397 add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Junping Du Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up in the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1308) set default value for nodemanager aux service
[ https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795428#comment-13795428 ] Sandy Ryza commented on YARN-1308: -- This looks like a duplicate of YARN-1289 set default value for nodemanager aux service - Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently in order to get the nodemanagers to start you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1308) set default value for nodemanager aux service
[ https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-1308. -- Resolution: Duplicate set default value for nodemanager aux service - Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently in order to get the nodemanagers to start you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-796: - Issue Type: Sub-task (was: New Feature) Parent: YARN-397 Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-445: Assignee: Andrey Klochkov Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795461#comment-13795461 ] Omkar Vinit Joshi commented on YARN-1185: - I think it would be fair to assume that the rename operation is atomic in nature, so we can split the existing writeFile operation into two calls: * First write the data to a .tmp file * Then rename it to the actual file. Similarly, when we are loading the state, if we encounter any file with a .tmp extension we will discard it. Attaching a patch which does exactly this. Let me know your thoughts. FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1185.1.patch FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message was sent by Atlassian JIRA (v6.1#6144)
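[Editorial note] A minimal sketch of the write-to-tmp-then-rename scheme described above, using the Hadoop FileSystem API; the method name and error handling are illustrative, not copied from YARN-1185.1.patch.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class AtomicWriteSketch {
  static void writeFileAtomically(FileSystem fs, Path dest, byte[] data)
      throws IOException {
    Path tmp = new Path(dest.getParent(), dest.getName() + ".tmp");
    // Step 1: write everything to a .tmp file.
    FSDataOutputStream out = fs.create(tmp, true);
    try {
      out.write(data);
    } finally {
      out.close();
    }
    // Step 2: rename to the real name; rename is assumed atomic, so recovery
    // either sees the complete file or only the discardable .tmp file.
    if (!fs.rename(tmp, dest)) {
      throw new IOException("Failed to rename " + tmp + " to " + dest);
    }
  }
}
{code}
On load, any leftover path ending in ".tmp" would be skipped as a partial write, matching the recovery behavior described in the comment.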
[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1185: Attachment: YARN-1185.1.patch FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1185.1.patch FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795460#comment-13795460 ] Vinod Kumar Vavilapalli commented on YARN-445: -- Sorry for jumping in real late on this. I see Andrey has been working on patches, but haven't looked at them. Trying to see if we are doing it right. bq. Add YARN API support for ContainerLaunchContext to accept a mapping of externally-triggered command names to code. (i.e. ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")). I think this is a better approach overall. We already support running arbitrary command-lines as part of start-container. Even without signalling, we have a stopContainer API which clearly indicates that the container be shut down. For signalling containers, either via a flag or a new API, why don't we just implement it as an additional command that is run on the NM? The NM can provide important information, like user-name, pid, pgrpid, sid, etc. in a platform-agnostic manner for that command, and we should be all done. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
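[Editorial note] A minimal sketch of the idea in the comment above: the NM expands container facts (user name, pid, process-group id, session id) into a supplied command template before running it. The variable names and the expand method are illustrative assumptions, not a real YARN API.
{code}
import java.util.HashMap;
import java.util.Map;

class SignalCommandSketch {
  static String expand(String template, String user, long pid, long pgrpId,
      long sessionId) {
    Map<String, String> vars = new HashMap<String, String>();
    vars.put("$CONTAINER_USER", user);
    vars.put("$CONTAINER_PID", String.valueOf(pid));
    vars.put("$CONTAINER_PGRP_ID", String.valueOf(pgrpId));
    vars.put("$CONTAINER_SESSION_ID", String.valueOf(sessionId));
    String cmd = template;
    for (Map.Entry<String, String> e : vars.entrySet()) {
      cmd = cmd.replace(e.getKey(), e.getValue());
    }
    // e.g. "kill -TERM $CONTAINER_PID" becomes "kill -TERM 12345"
    return cmd;
  }
}
{code}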
[jira] [Updated] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-896: - Summary: Roll up for long-lived services in YARN (was: Roll up for long lived YARN) Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1292) De-link container life cycle from the process it runs
[ https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795489#comment-13795489 ] Vinod Kumar Vavilapalli commented on YARN-1292: --- bq. Please also copy relevant comments from the duplicate jira into the parent so that they don't get lost. I have done it for this one. If a duplicate isn't caught early, it is difficult to capture all the conversation on both of the tickets. We just link both tickets and assume that the conversation moves over. De-link container life cycle from the process it runs - Key: YARN-1292 URL: https://issues.apache.org/jira/browse/YARN-1292 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.1-beta Reporter: Bikas Saha Currently, a container is considered done when its OS process exits. This makes it cumbersome for apps to be able to reuse containers for different processes. Long-running daemons may want to run in the same containers as their previous versions. So e.g. if an HBase region server crashes or is upgraded, it would want to restart in the same container, where everything it needs would already be warm and ready. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795511#comment-13795511 ] Hadoop QA commented on YARN-1185: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608545/YARN-1185.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2178//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2178//console This message is automatically generated. FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1185.1.patch FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Reopened] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface
[ https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reopened YARN-947: -- Need to add more history records Defining the history data classes for the implementation of the reading/writing interface - Key: YARN-947 URL: https://issues.apache.org/jira/browse/YARN-947 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-947.1.patch, YARN-947.2.patch We need to define the history data classes to have the exact fields to be stored. That way, the implementations don't need to duplicate the logic to extract the required information from RMApp, RMAppAttempt and RMContainer. We use protobuf to define these classes, such that they can be serialized/deserialized to/from bytes, which is easier for persistence. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-925) HistoryStorage Reader Interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795590#comment-13795590 ] Mayank Bansal commented on YARN-925: As discussed, closing it as there is no change needed here. Thanks, Mayank HistoryStorage Reader Interface for Application History Server -- Key: YARN-925 URL: https://issues.apache.org/jira/browse/YARN-925 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, YARN-925-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-925) HistoryStorage Reader Interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal resolved YARN-925. Resolution: Fixed HistoryStorage Reader Interface for Application History Server -- Key: YARN-925 URL: https://issues.apache.org/jira/browse/YARN-925 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, YARN-925-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795597#comment-13795597 ] Hitesh Shah commented on YARN-1289: --- I believe this jira should be considered invalid. YARN does not and should not have any implicit dependencies on mapreduce. *If* anyone wants to run MapReduce jobs on a YARN cluster, configuration of the mapreduce shuffle service is mandatory - however this does not hold true for cases where someone is using YARN to run non-MR applications. If someone wanted to change the implementation of the shuffle service to a potentially better/faster implementation, defining a default would create a problem. Also, in terms of future-proofing, what is the expectation if MR in later versions switches to use a different service? Is the expectation that default services will keep on changing over time based on MR implementation changes? Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value is not configured in yarn-site.xml; it would be better to configure a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795604#comment-13795604 ] Hitesh Shah commented on YARN-1289: --- [~wenwu] Can you confirm whether your yarn-site.xml has no aux-services configured or the mapreduce shuffle service was mis-configured using mapreduce.shuffle instead of mapreduce_shuffle? Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value is not configured in yarn-site.xml; it would be better to configure a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795610#comment-13795610 ] Bikas Saha commented on YARN-1068: -- Looks good to me. Will give a day or so for some other committers to take a look. There isn't any need for this to wrap the IOException in another exception. The base AdminService protocol signature already supports throwing IOException (ResourceManagerAdministrationProtocol). If it's small enough, we could fix it here or do it in a separate jira. {code}
 private UserGroupInformation checkAcls(String method) throws YarnException {
-    UserGroupInformation user;
     try {
-      user = UserGroupInformation.getCurrentUser();
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
     } catch (IOException ioe) {
-      LOG.warn("Couldn't get current user", ioe);
-
-      RMAuditLogger.logFailure("UNKNOWN", method,
-          adminAcl.toString(), "AdminService",
-          "Couldn't get current user");
       throw RPCUtil.getRemoteException(ioe);
     }
{code} Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795611#comment-13795611 ] Karthik Kambatla commented on YARN-1289: +1 to not adding an MR-specific config value by default to YARN. Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value is not configured in yarn-site.xml; it would be better to configure a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1308) set default value for nodemanager aux service
[ https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795605#comment-13795605 ] Hitesh Shah commented on YARN-1308: --- [~arpitgupta] Can you confirm whether your yarn-site.xml has no aux-services configured or the mapreduce shuffle service was mis-configured using mapreduce.shuffle instead of mapreduce_shuffle? This clarification will help get to the underlying issue. set default value for nodemanager aux service - Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently in order to get the nodemanagers to start you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException
Karthik Kambatla created YARN-1309: -- Summary: AdminService unnecessarily wraps an IOException into a YarnException Key: YARN-1309 URL: https://issues.apache.org/jira/browse/YARN-1309 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla ResourceManagerAdministrationProtocol allows methods to throw an IOException. Still, AdminService wraps IOExceptions as YarnExceptions before throwing them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795629#comment-13795629 ] Karthik Kambatla commented on YARN-1068: Thanks Bikas. bq. There isn't any need for this to wrap the IOException in another exception. The base AdminService protocol signature already supports throwing IOException. Agree. I did consider leaving it as IOE. However, there are several places in AdminService where an IOE is being wrapped into a YarnException. We should probably address all of them together in another JIRA. Created YARN-1309. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy errors
[ https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795669#comment-13795669 ] Hudson commented on YARN-1295: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4609 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4609/]) YARN-1295. In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy errors. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532532) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy errors - Key: YARN-1295 URL: https://issues.apache.org/jira/browse/YARN-1295 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1295.patch I missed this when working on YARN-1271. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException
[ https://issues.apache.org/jira/browse/YARN-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1309. --- Resolution: Invalid All YARN protocols only allow IOException for the sake of RPC layer exceptions. Exceptions coming from the application layer (as opposed to RPC layer) should all be YarnExceptions. See the discussion at YARN-142. AdminService unnecessarily wraps an IOException into a YarnException Key: YARN-1309 URL: https://issues.apache.org/jira/browse/YARN-1309 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla ResourceManagerAdministrationProtocol allows methods to throw an IOException. Still, AdminService wraps IOExceptions as YarnExceptions before throwing them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-202) Log Aggregation generates a storm of fsync() for namenode
[ https://issues.apache.org/jira/browse/YARN-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-202: - Issue Type: Sub-task (was: Bug) Parent: YARN-431 Log Aggregation generates a storm of fsync() for namenode - Key: YARN-202 URL: https://issues.apache.org/jira/browse/YARN-202 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: yarn-202.patch When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a lot of fsync() calls for the namenode. We have seen a 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files. -- This message was sent by Atlassian JIRA (v6.1#6144)
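[Editorial note] A minimal sketch of one way to damp such an fsync storm: flush after a batch of writes rather than after every record. The wrapper and batch size are illustrative assumptions, not the actual yarn-202.patch.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

class BatchedFlushSketch {
  private final FSDataOutputStream out;
  private final int flushEvery;
  private int writesSinceFlush;

  BatchedFlushSketch(FSDataOutputStream out, int flushEvery) {
    this.out = out;
    this.flushEvery = flushEvery;
  }

  void write(byte[] record) throws IOException {
    out.write(record);
    if (++writesSinceFlush >= flushEvery) {
      out.hflush(); // one flush per batch instead of one per record
      writesSinceFlush = 0;
    }
  }
}
{code}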
[jira] [Updated] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1303: Attachment: YARN-1303.1.patch In this patch, the Client code checks whether the client gives multiple --shell_command or --shell_script options; if so, ds outputs a message saying something like "Do not support it, please create a shell_script for them". It also checks whether the --shell_command contains ';' or '|'; if they exist, ds outputs a message saying something like "Do not support multiple shell commands and command pipeline, please create a shell_script for them". Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve it to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
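[Editorial note] A minimal sketch of the client-side checks the patch description outlines; the option handling and the surrounding method are illustrative, not copied from YARN-1303.1.patch.
{code}
class ShellCommandValidationSketch {
  static void validate(String[] shellCommandOpts, String shellCommand) {
    // Reject repeated --shell_command options.
    if (shellCommandOpts != null && shellCommandOpts.length > 1) {
      throw new IllegalArgumentException(
          "DistributedShell does not support multiple shell commands. "
              + "Please create a shell script and use --shell_script option.");
    }
    // Reject ';' and '|' inside a single --shell_command value.
    if (shellCommand != null
        && (shellCommand.contains(";") || shellCommand.contains("|"))) {
      throw new IllegalArgumentException(
          "DistributedShell does not support multiple commands or command "
              + "pipeline. Please create a shell script for them and use "
              + "--shell_script option.");
    }
  }
}
{code}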
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795696#comment-13795696 ] Xuan Gong commented on YARN-1303: - Did the test on a single cluster: Input: {code} hadoop jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar --shell_command pwd --help {code} Output: {code}
-shell_command <arg>  Shell command to be executed by the Application Master. Does not support multiple --shell_command options, multiple shell commands and command pipeline. For multiple shell commands or command pipeline, please create a shell script and use --shell_script option
-shell_script <arg>   Location of the shell script to be executed. Support only one --shell_script option. For multiple shell scripts, combine them into one shell script
{code} Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve it to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795698#comment-13795698 ] Xuan Gong commented on YARN-1303: - Input: {code} hadoop-3.0.0-SNAPSHOT/bin/hadoop jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar --shell_command pwd --shell_command ls {code} part of output: {code} INFO distributedshell.Client: Initializing Client DistributedShell does not support multiple shell commands. Please create a shell script and use --shell_script option. {code} Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException
[ https://issues.apache.org/jira/browse/YARN-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795702#comment-13795702 ] Karthik Kambatla commented on YARN-1309: Thanks [~vinodkv]. Makes sense. IIUC, on YARN-1068, even the RMHAProtocolService should throw YarnExceptions and not IOException. AdminService unnecessarily wraps an IOException into a YarnException Key: YARN-1309 URL: https://issues.apache.org/jira/browse/YARN-1309 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla ResourceManagerAdministrationProtocol allows methods to throw an IOException. Still, AdminService wraps IOExceptions as YarnExceptions before throwing them. -- This message was sent by Atlassian JIRA (v6.1#6144)
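For illustration, a minimal, self-contained sketch of the wrapping pattern under discussion; the method and message below are stand-ins, not the actual AdminService code:
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class WrappingExample {
  // Stand-in for the real admin operation; may fail with IOException.
  static void doRefresh() throws IOException {
    throw new IOException("acls file not found");
  }

  // The pattern YARN-1309 removes: the protocol already declares
  // IOException, so wrapping it in YarnException only hides the
  // original exception type from clients.
  static void refreshQueuesWrapped() throws YarnException, IOException {
    try {
      doRefresh();
    } catch (IOException ioe) {
      throw new YarnException(ioe);  // unnecessary wrapping
    }
  }

  // The fix: let the IOException propagate as-is.
  static void refreshQueuesDirect() throws YarnException, IOException {
    doRefresh();
  }
}
{code}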
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795701#comment-13795701 ] Xuan Gong commented on YARN-1303: - Input: {code} hadoop-3.0.0-SNAPSHOT/bin/hadoop jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar --shell_command ls|pwd {code} part of output: {code} 13/10/15 14:37:41 INFO distributedshell.Client: Initializing Client DistributedShell does not support multiple commands or command pipeline. Please create a shell script for them and use --shell_script option {code} Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1303: Attachment: YARN-1303.2.patch Fix a typo Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795708#comment-13795708 ] Hitesh Shah commented on YARN-1303: --- [~xgong] Could you clarify why ls;ls or ls | grep foo does not work in the first place? Is there a bug in the implementation that needs to be fixed to address this basic functionality? Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795719#comment-13795719 ] Hadoop QA commented on YARN-1303: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608581/YARN-1303.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2179//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2179//console This message is automatically generated. Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795723#comment-13795723 ] Karthik Kambatla commented on YARN-1068: Per discussion on YARN-1309 and YARN-142, looks like we should throw YarnException and not IOException. However, the actual exceptions to be thrown are defined in HAServiceProtocol which doesn't have YarnException listed. So, I guess we will have to leave the RMHAProtocolService as is. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795727#comment-13795727 ] Hadoop QA commented on YARN-1303: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608583/YARN-1303.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2180//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2180//console This message is automatically generated. Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-261) Ability to kill AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-261: - Attachment: YARN-261--n5.patch Jason, thanks for the review. All your points make sense to me. Attaching a patch with the fixes. Ability to kill AM attempts --- Key: YARN-261 URL: https://issues.apache.org/jira/browse/YARN-261 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.0.3-alpha Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, YARN-261.patch It would be nice if clients could ask for an AM attempt to be killed. This is analogous to the task attempt kill support provided by MapReduce. This feature would be useful in a scenario where AM retries are enabled, the AM supports recovery, and a particular AM attempt is stuck. Currently if this occurs the user's only recourse is to kill the entire application, requiring them to resubmit a new application and potentially breaking downstream dependent jobs if it's part of a bigger workflow. Killing the attempt would allow a new attempt to be started by the RM without killing the entire application, and if the AM supports recovery it could potentially save a lot of work. It could also be useful in workflow scenarios where the failure of the entire application kills the workflow, but the ability to kill an attempt can keep the workflow going if the subsequent attempt succeeds. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1172: - Attachment: YARN-1172.1.patch This patch is for verifying which strategy is better: converting *SecretManagers to services with a composite pattern, or converting SecretManager to be an AbstractService. In this patch, *SecretManagers are converted to services with the composite pattern. Note that this implementation has a lot of code duplication. The code is also a bit tricky, because we need to implement the Service interface to compose in an instance of AbstractService. If this change is not acceptable, we should convert SecretManager to be an AbstractService in HADOOP-10043. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795743#comment-13795743 ] Xuan Gong commented on YARN-1303: - [~hitesh] bq. Could you clarify why ls;ls or ls | grep foo does not work in the first place? Is there a bug in the implementation that needs to be fixed to address this basic functionality? I am not sure whether this can be counted as an implementation bug. The reason those commands do not work is how bash reads them. For example: if I give --shell_command ls;pwd (a command pipeline has the same issue), the script used to launch the ApplicationMaster has something like this: {code} exec /bin/bash -c $JAVA_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --shell_command ls;pwd 1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stdout 2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stderr {code} Bash treats that as two separate commands. The first one is {code} $JAVA_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --shell_command ls {code} and all the containers will execute the shell_command ls. Verify it by checking the shell script for a container: {code} exec /bin/bash -c ls 1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_02/stdout 2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_02/stderr {code} The second one is: {code} pwd 1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stdout 2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stderr {code} In AppMaster.stdout, we can only find this message: {code} /Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-localDir-nm-0_0/usercache/xuan/appcache/application_1381875664135_0001/container_1381875664135_0001_01_01 {code} which is the result of running pwd. Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795753#comment-13795753 ] Andrey Klochkov commented on YARN-445: -- Vinod, accepting a mapping of arbitrary commands is indeed the most powerful approach. However, it would require lots of changes in YARN, as well as additional complexity for app writers. At the same time, are we sure that this flexibility is needed, and that it won't be over-engineering and possibly an abstraction leak in the YARN framework? By the latter I mean that we would give app writers the ability to run arbitrary commands on any node at any point in time; is that within YARN's responsibilities? I'm not a YARN expert, so I'm just asking. Anyway, the scope of what I have proposed with the patch is much smaller and solves the task stated in the initial description of this JIRA - troubleshooting timed-out containers by dumping jstack. This would be useful for many YARN use cases, so I thought it may make sense to implement it this way now and extend it in the future if there is demand. I agree that the way it is exposed in the API could be changed to a signal value in the stopContainers request instead of a separate call, which is indeed a bit confusing. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
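For concreteness, one possible shape of the "signal value in the stop request" alternative being debated; this is purely a hypothetical sketch, and none of these types exist in YARN at this point:
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical sketch only: a signal carried in the stop request rather
// than a separate NM call.
enum ContainerSignal {
  SIGTERM,  // graceful stop (roughly today's behavior)
  SIGKILL,  // forceful stop
  SIGQUIT   // ask the JVM for a thread dump, e.g. on task timeout
}

interface StopContainersRequestSketch {
  List<ContainerId> getContainerIds();
  // Optional signal; a null value would keep today's stop semantics.
  ContainerSignal getSignal();
  void setSignal(ContainerSignal signal);
}
{code}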
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795755#comment-13795755 ] Junping Du commented on YARN-1289: -- OK. I am good with removing the other MR-related configurations from YARN and agree that this (decoupling MR and YARN) is the right direction. Will file a JIRA soon. Thanks for sharing the vision, [~hitesh]! Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value was not configured in yarn-site.xml; it would be better to provide a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795762#comment-13795762 ] Hitesh Shah commented on YARN-1303: --- [~xgong] From what you mention, it seems like there is a bug in the client code, which is not escaping and quoting the command line args for the ApplicationMaster correctly. I.e., it should be doing something like: org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --shell_command 'ls;pwd' Allow multiple commands separating with ; - Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1303.1.patch, YARN-1303.2.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
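To make the escaping idea concrete, a hedged sketch: assuming the Client concatenates the AM command line as a flat string (as the launch script shown earlier suggests), single-quoting the user's value keeps bash from splitting it at the launch-script level. This is illustrative, not the actual Client code:
{code}
public class QuotingSketch {
  public static void main(String[] args) {
    // Single-quote the user-supplied command before embedding it in the
    // AM launch command, so the NM's "exec /bin/bash -c ..." line sees
    // "ls;pwd" as one argument instead of a ';'-separated command list.
    // Embedded single quotes use the POSIX '\'' idiom.
    String userShellCommand = "ls;pwd";
    String quoted = "'" + userShellCommand.replace("'", "'\\''") + "'";
    String amCmd = "$JAVA_HOME/bin/java -Xmx512m "
        + "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster"
        + " --shell_command " + quoted;
    System.out.println(amCmd);  // ends with: --shell_command 'ls;pwd'
  }
}
{code}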
[jira] [Created] (YARN-1310) Get rid of MR settings in YARN configuration
Junping Du created YARN-1310: - Summary: Get rid of MR settings in YARN configuration Key: YARN-1310 URL: https://issues.apache.org/jira/browse/YARN-1310 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Per discussion in YARN-1289, we should get rid of MR settings (like the ones below) and their default values in the YARN configuration, which introduce an unnecessary dependency of YARN on MR. {code}
<!-- Map Reduce configuration -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>mapreduce.job.jar</name>
  <value/>
</property>
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>${fs.defaultFS}</value>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1289: - Assignee: (was: Junping Du) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value was not configured in yarn-site.xml; it would be better to provide a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795770#comment-13795770 ] Junping Du commented on YARN-1289: -- Filed YARN-1310 to track removing MR settings from the YARN configuration. If nobody objects, I will mark this JIRA as invalid later. Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value was not configured in yarn-site.xml; it would be better to provide a default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov resolved YARN-677. -- Resolution: Won't Fix Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1310) Get rid of MR settings in YARN configuration
[ https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1310: --- Hadoop Flags: Incompatible change Marking this as an incompatible change as it requires user action. Get rid of MR settings in YARN configuration Key: YARN-1310 URL: https://issues.apache.org/jira/browse/YARN-1310 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Junping Du Per discussion in YARN-1289, we should get rid of MR settings (like the ones below) and their default values in the YARN configuration, which introduce an unnecessary dependency of YARN on MR. {code}
<!-- Map Reduce configuration -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>mapreduce.job.jar</name>
  <value/>
</property>
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>${fs.defaultFS}</value>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1310) Get rid of MR settings in YARN configuration
[ https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1310: --- Affects Version/s: 2.2.0 Get rid of MR settings in YARN configuration Key: YARN-1310 URL: https://issues.apache.org/jira/browse/YARN-1310 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Junping Du Per discussion in YARN-1289, we should get rid of MR settings (like the ones below) and their default values in the YARN configuration, which introduce an unnecessary dependency of YARN on MR. {code}
<!-- Map Reduce configuration -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>mapreduce.job.jar</name>
  <value/>
</property>
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>${fs.defaultFS}</value>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796209#comment-13796209 ] Karthik Kambatla commented on YARN-1172: Copying the code from AbstractService doesn't seem like a good idea. I think we should avoid it if possible. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796212#comment-13796212 ] Karthik Kambatla commented on YARN-1172: Just thinking out loud. Along the lines of Suresh's suggestion, how about creating a Service for each of the YARN-related SecretManagers, which actually instantiates, starts, and stops the SecretManager? For instance, there could be an RMContainerTokenSecretManagerService that has an instance of RMContainerTokenSecretManager: it creates it on init(), starts it on start(), and stops it on stop(). Thoughts? Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-261) Ability to kill AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796225#comment-13796225 ] Hadoop QA commented on YARN-261: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608589/YARN-261--n5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestJobCleanup The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestUberAM The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2181//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2181//console This message is automatically generated. Ability to kill AM attempts --- Key: YARN-261 URL: https://issues.apache.org/jira/browse/YARN-261 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.0.3-alpha Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, YARN-261.patch It would be nice if clients could ask for an AM attempt to be killed. This is analogous to the task attempt kill support provided by MapReduce. This feature would be useful in a scenario where AM retries are enabled, the AM supports recovery, and a particular AM attempt is stuck. Currently if this occurs the user's only recourse is to kill the entire application, requiring them to resubmit a new application and potentially breaking downstream dependent jobs if it's part of a bigger workflow. Killing the attempt would allow a new attempt to be started by the RM without killing the entire application, and if the AM supports recovery it could potentially save a lot of work. 
It could also be useful in workflow scenarios where the failure of the entire application kills the workflow, but the ability to kill an attempt can keep the workflow going if the subsequent attempt succeeds. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1308) set default value for nodemanager aux service
[ https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796235#comment-13796235 ] Arpit Gupta commented on YARN-1308: --- [~hitesh] Confirmed that when yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class are not defined, the nodemanager comes up and registers with the RM. set default value for nodemanager aux service - Key: YARN-1308 URL: https://issues.apache.org/jira/browse/YARN-1308 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Currently, in order to get the nodemanagers to start, you have to define yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class. We should set these as defaults. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
Vinod Kumar Vavilapalli created YARN-1311: - Summary: Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1311: -- Attachment: YARN-1311-20131015.txt Straightforward patch with event renaming. No change to any logic. Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Attachments: YARN-1311-20131015.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
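The renaming is mechanical; a rough before/after sketch, where the new constant names are inferred from the discussion and may not match the patch exactly:
{code}
// Before: APP_ADDED / APP_REMOVED, despite the payload being an attempt.
// After, the shape this patch proposes:
enum SchedulerEventTypeSketch {
  APP_ATTEMPT_ADDED,
  APP_ATTEMPT_REMOVED
  // leaves APP_ADDED / APP_REMOVED free for true app-level events,
  // e.g. for work-preserving RM restart
}
{code}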
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796248#comment-13796248 ] Tsuyoshi OZAWA commented on YARN-1172: -- I came up with creating SecretManagerService<T> as a base class for the YARN-related SecretManagers, like this: https://gist.github.com/oza/7000796. This may be better than creating a Service for each of the YARN-related SecretManagers, because we can avoid code duplication between the *SecretManagerServices. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
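Since the gist link may rot, here is a rough reconstruction of what such a base class could look like. This is an assumption-laden sketch, not the linked code: it assumes each concrete manager exposes its own start/stop hooks (as RMContainerTokenSecretManager does for key rolling), and it builds on the Hadoop 2.x AbstractService life cycle:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Sketch of a shared base class: each concrete subclass only supplies
// the factory and start/stop hooks, and the service life cycle drives
// the wrapped SecretManager.
abstract class SecretManagerService<M> extends AbstractService {
  private M secretManager;

  protected SecretManagerService(String name) {
    super(name);
  }

  // Subclasses create their specific *SecretManager here.
  protected abstract M createSecretManager(Configuration conf);

  // Subclasses map service start/stop onto the manager's own hooks,
  // e.g. RMContainerTokenSecretManager.start()/stop().
  protected abstract void startSecretManager(M manager) throws Exception;
  protected abstract void stopSecretManager(M manager) throws Exception;

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    secretManager = createSecretManager(conf);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    startSecretManager(secretManager);
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (secretManager != null) {
      stopSecretManager(secretManager);
    }
    super.serviceStop();
  }

  public M getSecretManager() {
    return secretManager;
  }
}
{code}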
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796275#comment-13796275 ] Hadoop QA commented on YARN-1311: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608614/YARN-1311-20131015.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2182//console This message is automatically generated. Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Attachments: YARN-1311-20131015.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1172: - Attachment: YARN-1172.2.patch Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1181) Implement MiniYARNHACluster
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796292#comment-13796292 ] Karthik Kambatla commented on YARN-1181: Have worked on this some. I think the best way to do this is to actually augment MiniYARNCluster to allow creating a cluster with multiple RMs, instead of duplicating the code in another class. Implement MiniYARNHACluster --- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1181) Implement MiniYARNHACluster
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1181: --- Attachment: yarn-1181-1.patch First-cut patch that adds the functionality. Implement MiniYARNHACluster --- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
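Hypothetically, a test against the augmented cluster might look like this; the extra constructor argument for the number of RMs and the HA flag are guesses based on the comments above, not the patch's actual API:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterHASketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Assumed HA switch from the YARN-1027 line of work.
    conf.setBoolean(YarnConfiguration.RM_HA_ENABLED, true);
    // Assumed signature: test name, #RMs, #NMs, #localDirs, #logDirs.
    MiniYARNCluster cluster =
        new MiniYARNCluster("testRMFailover", 2, 1, 1, 1);
    cluster.init(conf);
    cluster.start();
    // ... transition the standby RM to active, assert clients fail over ...
    cluster.stop();
  }
}
{code}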
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796321#comment-13796321 ] Hadoop QA commented on YARN-1172: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608627/YARN-1172.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2183//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2183//console This message is automatically generated. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1312) Job History server queue attribute incorrectly reports default when username is actually used for queue at runtime
Philip Zeyliger created YARN-1312: - Summary: Job History server queue attribute incorrectly reports default when username is actually used for queue at runtime Key: YARN-1312 URL: https://issues.apache.org/jira/browse/YARN-1312 Project: Hadoop YARN Issue Type: Bug Reporter: Philip Zeyliger If you run a MapReduce job with the fair scheduler and you query the JobHistory server for its metadata, you might see something like the following at http://jh_host:19888/ws/v1/history/mapreduce/jobs/job_1381878638171_0001/ {code}
<job>
  <startTime>1381890132608</startTime>
  <finishTime>1381890141988</finishTime>
  <id>job_1381878638171_0001</id>
  <name>TeraGen</name>
  <queue>default</queue>
  <user>hdfs</user>
  ...
</job>
{code} The same is true if you query the RM while it's running via http://rm_host:8088/ws/v1/cluster/apps/application_1381878638171_0002: {code}
<app>
  <id>application_1381878638171_0002</id>
  <user>hdfs</user>
  <name>TeraGen</name>
  <queue>default</queue>
  ...
</app>
{code} As it turns out, in both of these cases, the job is actually executing in root.hdfs and not in root.default because {{yarn.scheduler.fair.user-as-default-queue}} is set to true. This makes it hard to figure out after the fact (or during!) what queue the MR job was running under. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface
[ https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-947: - Attachment: YARN-947.3.patch Created a new incremental patch, which includes the following modifications: 1. The biggest change is to add another two sets of protobuf records in addition to the set of HistoryData: the set of StartData and that of FinishData. In fact, HistoryData = StartData + FinishData. The duplicated part is the Id, which serves as the key. StartData contains the fields that are determined when the object (RMApp, RMAppAttempt and RMContainer) starts, while FinishData contains the fields that are determined when the object finishes. With the separated records, we can redesign the writer interface to write part of the data when the object starts and the rest when the object finishes, therefore reducing the loss of information when the history data cannot be completely recorded (e.g., on RM crash). 2. Change all protobuf records from interfaces to abstract classes, and add the built-in newInstance method for users to call. 3. Improve toString() of the PBImpls here as well, which is filed in YARN-1066; therefore, I'll close that jira as a duplicate. 4. Fix a bug in ContainerHistoryDataPBImpl. 5. Instead of recording ContainerState, I changed to record ContainerExitStatus. The reason is stated in YARN-1123: https://issues.apache.org/jira/browse/YARN-1123?focusedCommentId=13793962page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13793962 ContainerState is always FINISHED for all the containers, which is meaningless. Instead, ContainerExitStatus, which is the exit code, can indicate the problems in the container. [~vinodkv], would you please review it again? Defining the history data classes for the implementation of the reading/writing interface - Key: YARN-947 URL: https://issues.apache.org/jira/browse/YARN-947 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch We need to define the history data classes that have the exact fields to be stored. Therefore, the implementations don't need to duplicate the logic to extract the required information from RMApp, RMAppAttempt and RMContainer. We use protobuf to define these classes, such that they can be serialized to and deserialized from bytes, which is easier for persistence. -- This message was sent by Atlassian JIRA (v6.1#6144)
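To make the StartData/FinishData split concrete, a guessed shape of the new records; this is illustrative only, since the actual fields live in the patch's protobuf definitions:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Rough shape of the split described above; the real records are
// protobuf-backed abstract classes with a newInstance() factory.
// Note the id is the duplicated part that serves as the key.
abstract class ApplicationStartDataSketch {
  abstract ApplicationId getApplicationId();  // key
  abstract String getApplicationName();       // known at start
  abstract String getUser();                  // known at start
  abstract long getSubmitTime();              // known at start
}

abstract class ApplicationFinishDataSketch {
  abstract ApplicationId getApplicationId();  // key
  abstract long getFinishTime();              // known only at finish
  abstract String getDiagnosticsInfo();       // known only at finish
}
{code}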
[jira] [Commented] (YARN-1066) Improve toString implementation for PBImpls for AHS
[ https://issues.apache.org/jira/browse/YARN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796392#comment-13796392 ] Zhijie Shen commented on YARN-1066: --- YARN-947 is reopened, so let's fix the issue together there. Closing this ticket as a duplicate. Improve toString implementation for PBImpls for AHS --- Key: YARN-1066 URL: https://issues.apache.org/jira/browse/YARN-1066 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-1045 improves toString implementation for PBImpls, AHS's PBImpls should be changed accordingly -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1066) Improve toString implementation for PBImpls for AHS
[ https://issues.apache.org/jira/browse/YARN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1066. --- Resolution: Duplicate Improve toString implementation for PBImpls for AHS --- Key: YARN-1066 URL: https://issues.apache.org/jira/browse/YARN-1066 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-1045 improves toString implementation for PBImpls, AHS's PBImpls should be changed accordingly -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface
[ https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796399#comment-13796399 ] Hadoop QA commented on YARN-947: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608648/YARN-947.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2184//console This message is automatically generated. Defining the history data classes for the implementation of the reading/writing interface - Key: YARN-947 URL: https://issues.apache.org/jira/browse/YARN-947 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch We need to define the history data classes that have the exact fields to be stored. Therefore, the implementations don't need to duplicate the logic to extract the required information from RMApp, RMAppAttempt and RMContainer. We use protobuf to define these classes, such that they can be serialized to and deserialized from bytes, which is easier for persistence. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-934) HistoryStorage writer interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-934: - Attachment: YARN-934.4.patch Talked to [~vinodkv], and we thought it's better to split each writing operation into two. One is executed when the object (RMApp, RMAppAttempt or RMContainer) is started, recording the information that is already available. The other is executed when the object reaches its finishing stage, recording the information that is finally determined. I uploaded a new incremental patch to draft the new writer interface. In addition, I modified ApplicationHistoryStore as well: I changed it from an interface to an abstract class, which extends AbstractService. Therefore, the implementations of it (e.g. FS storage, DB storage) can make use of the life cycle of a service, doing the necessary initialization and cleanup work in the corresponding stages. HistoryStorage writer interface for Application History Server -- Key: YARN-934 URL: https://issues.apache.org/jira/browse/YARN-934 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch, YARN-934.4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
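A hedged sketch of the two-phase writer this comment describes, reusing the StartData/FinishData shapes sketched under YARN-947 above; the method names are plausible guesses, not the patch's exact interface:
{code}
import java.io.IOException;

// One write when the object starts, one when it finishes, so an RM
// crash in between loses only the finish-time fields.
interface ApplicationHistoryWriterSketch {
  void applicationStarted(ApplicationStartDataSketch app) throws IOException;
  void applicationFinished(ApplicationFinishDataSketch app) throws IOException;
  // ... analogous started/finished pairs for app attempts and containers ...
}
{code}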
[jira] [Commented] (YARN-1002) Optimizing the reading/writing operations of FileSystemHistoryStorage
[ https://issues.apache.org/jira/browse/YARN-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796420#comment-13796420 ] Zhijie Shen commented on YARN-1002: --- Brainstormed with [~vinodkv] and [~mayank_bansal]. Since we've already made the proof-of-concept end-to-end AHS work, we should move on to making the production-ready AHS. That means we need to make sure FileSystemHistoryStorage performs well before merging AHS into trunk, so this optimization work will be done as part of YARN-975. Closing this ticket as a duplicate. Optimizing the reading/writing operations of FileSystemHistoryStorage - Key: YARN-1002 URL: https://issues.apache.org/jira/browse/YARN-1002 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Once the end-to-end system is done, we need to improve the performance of the reading/writing operations of FileSystemHistoryStorage. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1002) Optimizing the reading/writing operations of FileSystemHistoryStorage
[ https://issues.apache.org/jira/browse/YARN-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1002. --- Resolution: Duplicate Optimizing the reading/writing operations of FileSystemHistoryStorage - Key: YARN-1002 URL: https://issues.apache.org/jira/browse/YARN-1002 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Once the end-to-end system is done, we need to improve the performance of the reading/writing operations of FileSystemHistoryStorage. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-934) HistoryStorage writer interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796422#comment-13796422 ] Hadoop QA commented on YARN-934: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608654/YARN-934.4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2185//console This message is automatically generated. HistoryStorage writer interface for Application History Server -- Key: YARN-934 URL: https://issues.apache.org/jira/browse/YARN-934 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch, YARN-934.4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)