[jira] [Assigned] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov reassigned YARN-1551: --- Assignee: Gera Shegalov Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1551) Allow user-specified reason for killApplication
Gera Shegalov created YARN-1551: --- Summary: Allow user-specified reason for killApplication Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1551: Attachment: YARN-1551.v01.patch Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858803#comment-13858803 ] Hadoop QA commented on YARN-1551: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620843/YARN-1551.v01.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2757//console This message is automatically generated. Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858871#comment-13858871 ] Karthik Kambatla commented on YARN-1551: We should wrap this JIRA up before MAPREDUCE-5648. Just skimmed through the patch; looks reasonable at first glance. There are still MapReduce changes in the patch; can we move them to the MR JIRA? Also, we should add some YARN-specific tests. Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858993#comment-13858993 ] Zhijie Shen commented on YARN-1399: --- Hm... It makes sense. The only tag from oozie seems not to be enough to identify the apps of a workflow. For example, in a multi-tenant situation, two oozie workflows submitted by two different users may happen to have the same workflow ID. Then, killing the apps that have this workflow ID will disrupt both workflows. To solve the problem in YARN-1390, we may need more information to identify the apps belonging to a unique workflow: 1. Include a GUID in the workflow ID to prevent conflicting IDs in a YARN cluster (unless others intentionally attack). 2. Check the owner of the application as well, to avoid the case where attackers copy the workflow ID. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
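Zhijie's point (1) above — disambiguating otherwise-identical workflow IDs with a GUID — can be sketched in a few lines. This is illustrative only; the tag format, method name, and class are hypothetical, not from any patch on this JIRA:

```java
import java.util.UUID;

public class WorkflowTagDemo {
    // Point (1) above: make a workflow tag cluster-unique by appending a
    // random GUID, so two tenants reusing the same workflow id cannot
    // collide. Tag format and method name are hypothetical.
    static String uniqueWorkflowTag(String workflowId) {
        return workflowId + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String a = uniqueWorkflowTag("oozie-wf-42");
        String b = uniqueWorkflowTag("oozie-wf-42");
        System.out.println(a.equals(b)); // false: resubmissions never collide
    }
}
```

Point (2), checking the application owner, would still be needed on top of this, since a GUID only prevents accidental collisions, not deliberate copying.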
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859025#comment-13859025 ] Sandy Ryza commented on YARN-1461: -- The patch is looking good. A few comments:
{code}
+  @Private
+  @Unstable
+  public abstract Set<String> getTags();
{code}
Getters in GetApplicationsRequest should be Public/Stable.
{code}
+
+
+  /**
+   * Get tags for the application
{code}
Two lines not needed here.
{code}
+  public abstract void setTags(Set<String> tags) throws IllegalArgumentException;
{code}
IllegalArgumentException is a RuntimeException - we don't need a throws for it. I think using a checked exception here would require unnecessary try/catch for users who are doing it right. If they're using tags that violate the constraints, they should be changing their code, not handling the exception.
{code}
+    Set<String> appTags = parseQueries(tags, false);
+    if (!appTags.isEmpty()) {
+      checkAppTags = true;
+    }
{code}
Null check?
{code}
+    appInfo.getTags()).append("\",\"")
+        .append(StringEscapeUtils.escapeJavaScript(StringEscapeUtils.escapeHtml(
{code}
We need to add a column in the table header for this, right?
{code}
+  protected String tags = ""; // initialize to an empty string
{code}
Comment is unnecessary.
{code}
+    if (app.getTags() != null && !app.getTags().isEmpty()) {
+      for (String tag : app.getTags()) {
+        this.tags += tag + ",";
+      }
+      this.tags = this.tags.substring(0, tags.length() - 1);
+    }
{code}
Use Joiner.on(",")
{code}
+
+  @Override
+  public Set<String> getTags() { return null; }
{code}
Use multiple lines as is done for other methods that return null in the class. 
RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
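Sandy's Joiner suggestion above amounts to replacing the loop-and-substring concatenation with a single join call. A minimal sketch, using the JDK's String.join as a dependency-free stand-in for Guava's Joiner (class and method names here are illustrative, not from the patch):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TagJoinDemo {
    // Replaces the loop + substring(0, length - 1) pattern flagged in the
    // review with one join call. In the actual patch Guava's
    // Joiner.on(",").join(tags) would do the same job; String.join is the
    // JDK 8+ equivalent used here to keep the sketch self-contained.
    static String joinTags(Set<String> tags) {
        if (tags == null || tags.isEmpty()) {
            return "";
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        Set<String> tags = new LinkedHashSet<>();
        tags.add("etl");
        tags.add("nightly");
        System.out.println(joinTags(tags)); // etl,nightly
    }
}
```

Besides being shorter, the join call sidesteps the off-by-one risk of trimming the trailing comma by hand.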
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-8.patch Noticed occasional failures of TestRMFailover in some Jenkins builds (e.g. https://builds.apache.org/job/PreCommit-YARN-Build/2750//testReport/org.apache.hadoop.yarn.client/TestRMFailover/testExplicitFailover/). New patch that bumps up the timeout for NM-RM connection to 10 seconds (granularity remains 100 ms). Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859036#comment-13859036 ] Gera Shegalov commented on YARN-1551: - [~kkambatl], I find it difficult to separate the MR code out of this patch; the reason is that ResourceMgrDelegate extends the abstract YarnClient. You are right that the test is not YARN-specific, but it tests the end-to-end CLI and YARN API on an MR job. [~vinodkv], thanks for the suggestion. I'll work on incorporating a limit here and in MAPREDUCE-5648. Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
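The limit Gera mentions incorporating could look roughly like the truncation helper below. Everything here — the class, the method name, and the 1024-character cap — is hypothetical; no actual limit value had been settled on in YARN-1551 or MAPREDUCE-5648 at this point:

```java
public class KillReasonDemo {
    // Hypothetical cap on a user-supplied kill reason; the real value was
    // still under discussion on the JIRA.
    static final int MAX_REASON_LEN = 1024;

    // Normalize the reason before storing it in application diagnostics:
    // never null, never longer than the cap.
    static String sanitizeReason(String reason) {
        if (reason == null) {
            return "";
        }
        return reason.length() <= MAX_REASON_LEN
            ? reason
            : reason.substring(0, MAX_REASON_LEN);
    }

    public static void main(String[] args) {
        System.out.println(sanitizeReason("node maintenance")); // node maintenance
        System.out.println(sanitizeReason(null).isEmpty());     // true
    }
}
```

Capping the length server-side keeps an arbitrarily long client-supplied string from bloating the RM state store.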
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-6.patch RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1552) Promote GetApplicationsRequest APIs to Stable
Karthik Kambatla created YARN-1552: -- Summary: Promote GetApplicationsRequest APIs to Stable Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all APIs are Public/Unstable. I think it is time to graduate these to Stable APIs -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859044#comment-13859044 ] Sandy Ryza commented on YARN-1029: -- Looked over the minicluster changes. A couple tiny nits, otherwise LGTM:
* In MiniYarnCluster, failoverTimeout does not need to be initialized to 0 because it will always get set in serviceInit.
* In initResourceManager, index does not need to be final.
* In initResourceManager, having the open paren on the line after register looks a little weird, and new EventHandler<RMAppAttemptEvent>() should be at the same indentation level as RMAppAttemptEventType.class.
* The thread in startResourceManager should be given a name (including the index). Though if that's unrelated to this patch, leaving it how it is is fine.
* Why the added null check in getActiveRMIndex? When would one of the entries in the resourceManagers array be null?
Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859048#comment-13859048 ] Karthik Kambatla commented on YARN-1461: Thanks Sandy. Posted a patch that incorporates most of your suggestions. bq. Getters in GetApplicationsRequest should be Public/Stable. All other getters and setters in GetApplicationsRequest are Public/Unstable. I would like to leave it this way for this patch. Created YARN-1552 to graduate all of them to Public/Stable.
{quote}
{code}
+    Set<String> appTags = parseQueries(tags, false);
+    if (!appTags.isEmpty()) {
+      checkAppTags = true;
+    }
{code}
Null check?
{quote}
parseQueries always returns a non-null set. There are a couple of other uses without a null check; I'll leave it consistent with the others, if that is okay. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859047#comment-13859047 ] Sandy Ryza commented on YARN-1461: -- In GetApplicationsRequest, getTags still hasn't been changed to Public/Stable
{code}
-  public void setApplicationStates(EnumSet<YarnApplicationState> applicationStates) {
+  public void setApplicationStates(EnumSet <YarnApplicationState> applicationStates) {
{code}
RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859051#comment-13859051 ] Hadoop QA commented on YARN-1461: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620866/yarn-1461-6.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2759//console This message is automatically generated. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1552) Make all GetApplicationsRequest getter APIs Public/Stable
[ https://issues.apache.org/jira/browse/YARN-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1552: - Description: GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. was:GetApplicationsRequest is used to fetch applications from the RM. Currently, all APIs are Public/Unstable. I think it is time to graduate these to Stable APIs Summary: Make all GetApplicationsRequest getter APIs Public/Stable (was: Promote GetApplicationsRequest APIs to Stable) Make all GetApplicationsRequest getter APIs Public/Stable - Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859053#comment-13859053 ] Robert Kanter commented on YARN-1399: - {quote} Having GUID in the workflow ID to prevent conflicting IDs in a YARN cluster (unless others intentionally attack) {quote} Oozie workflow IDs are configurable, but by default they are {{job_number\-timestamp\-system_id-job_type}}. These should be unique, so we don't need a separate GUID. We'd actually use the action ID instead of the workflow ID, but that's just {{workflow_ID@action_name}}, which is also unique. For example: {{005-131218150050953-oozie-oozi-W@pig-node}} Unless the user configured multiple Oozie servers such that the IDs would be similar, the only way to get the same workflow or action ID would be to start the two servers at exactly the same time so they'd have the same timestamp, which I don't think we need to worry about. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
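Robert's format above implies the workflow ID can be recovered from an action ID by splitting on the '@'. A small illustrative sketch (names are hypothetical, and this is not Oozie's actual parsing code):

```java
public class ActionIdDemo {
    // An Oozie action id is workflow_ID@action_name per the comment above;
    // splitting on the last '@' recovers the workflow id. Illustrative
    // only, not Oozie's real implementation.
    static String workflowIdOf(String actionId) {
        int at = actionId.lastIndexOf('@');
        return at < 0 ? actionId : actionId.substring(0, at);
    }

    public static void main(String[] args) {
        System.out.println(workflowIdOf("005-131218150050953-oozie-oozi-W@pig-node"));
        // prints 005-131218150050953-oozie-oozi-W
    }
}
```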
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-6.patch Resubmitting same patch to see if Jenkins likes it better. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859062#comment-13859062 ] Hadoop QA commented on YARN-1029: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620865/yarn-1029-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.ha.TestZKFailoverController {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2758//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2758//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2758//console This message is automatically generated. 
Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859083#comment-13859083 ] Hadoop QA commented on YARN-1461: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620867/yarn-1461-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2760//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2760//console This message is automatically generated. 
RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1522) TestApplicationCleanup.testAppCleanup occasionally fails
[ https://issues.apache.org/jira/browse/YARN-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1522: -- Attachment: YARN-1522-2.txt Same patch as before, for appeasing Jenkins. TestApplicationCleanup.testAppCleanup occasionally fails Key: YARN-1522 URL: https://issues.apache.org/jira/browse/YARN-1522 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: YARN-1522-1.diff, YARN-1522-2.txt, YARN-1522-2.txt TestApplicationCleanup is occasionally failing with the error:
{code}
---
Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.215 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
testAppCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup)  Time elapsed: 5.555 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<1> but was:<0>
	at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup.testAppCleanup(TestApplicationCleanup.java:119)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-7.patch Removed one false change in GetApplicationsRequestPBImpl. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859103#comment-13859103 ] Sandy Ryza commented on YARN-1461: -- +1 RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859130#comment-13859130 ] Zhijie Shen commented on YARN-1461: --- [~kkambatl], given Vinod's concern about using tags to identify the applications of an oozie workflow (see YARN-1399), would you mind holding off your patches for a while, so that we can think more about it? RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1522) TestApplicationCleanup.testAppCleanup occasionally fails
[ https://issues.apache.org/jira/browse/YARN-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859140#comment-13859140 ] Hudson commented on YARN-1522: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4940 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4940/]) YARN-1522. Fixed a race condition in the test TestApplicationCleanup that was causing it to randomly fail. Contributed by Liyin Liang. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1554328) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java TestApplicationCleanup.testAppCleanup occasionally fails Key: YARN-1522 URL: https://issues.apache.org/jira/browse/YARN-1522 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 2.4.0 Attachments: YARN-1522-1.diff, YARN-1522-2.txt, YARN-1522-2.txt TestApplicationCleanup is occasionally failing with the error:
{code}
---
Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.215 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
testAppCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup)  Time elapsed: 5.555 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<1> but was:<0>
	at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup.testAppCleanup(TestApplicationCleanup.java:119)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1121: -- Attachment: YARN-1121.11.patch RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.11.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859141#comment-13859141 ] Jian He commented on YARN-1121: --- Agreed. Made the change to check the blockNewEvents flag before acquiring the lock and calling notify, and also added the comment. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.11.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
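The change Jian describes — checking the blockNewEvents flag before taking the lock and notifying — can be sketched as a minimal drain-on-stop queue. This is a toy model under stated assumptions, not the actual RMStateStore/AsyncDispatcher code; only the blockNewEvents name mirrors the discussion:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainDemo {
    // Toy model of the drain-on-stop idea in YARN-1121: on stop, reject new
    // events, then wait until the queue is empty before shutting down.
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean blockNewEvents = false;
    private final Object drainLock = new Object();

    // Returns false once the store is stopping and new events are rejected.
    public boolean offer(Runnable event) {
        if (blockNewEvents) {
            return false;
        }
        return queue.offer(event);
    }

    // Called by the (single) event-handling thread.
    public void processOne() {
        Runnable event = queue.poll();
        if (event != null) {
            event.run();
        }
        // Check the cheap volatile flag *before* taking the lock, so the
        // common running path pays no synchronization cost -- the point of
        // the change described in the comment above.
        if (blockNewEvents && queue.isEmpty()) {
            synchronized (drainLock) {
                drainLock.notifyAll();
            }
        }
    }

    // Stop: block new events, then wait for pending ones to drain.
    public void stopAndDrain() throws InterruptedException {
        blockNewEvents = true;
        synchronized (drainLock) {
            while (!queue.isEmpty()) {
                drainLock.wait(100);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DrainDemo store = new DrainDemo();
        store.offer(() -> System.out.println("stored event"));
        store.processOne();          // prints "stored event"
        store.stopAndDrain();        // queue already empty, returns at once
        System.out.println(store.offer(() -> {})); // false: stopping
    }
}
```

The timed wait keeps the stopping thread from sleeping forever if it loses a race with the handler thread's emptiness check.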
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859148#comment-13859148 ] Karthik Kambatla commented on YARN-1399: bq. Tags or the source/group originally proposed won't help the oozie case as described on YARN-1390. Or to be more accurate, they make it unwieldy. Let's say oozie uses a tag workflow_123_566 for all apps in a workflow, any other application from any other user SHOULD not set that tag. Or run the risk of getting killed by oozie. As Robert mentioned workflow@action is an involved string, and I don't think a user accidentally setting that tag should be a concern. Oozie will kill an application as the user who submitted the job. I agree Oozie will kill an app if the same user (who submitted the workflow) submits another app with a tag matching the action id. It doesn't sound likely, and I don't think it is really a concern. Even if it were a concern, I don't think it is YARN's responsibility to limit what tags can be set or not. bq. To avoid it, we'll need to depend on oozie to not kill as a privileged user. Oozie runs jobs as the user that submits the workflow. The same rules will apply to killing a job. Even if Oozie somehow becomes malicious, it is YARN's responsibility to not let it run/kill jobs as a privileged user. No? bq. Further, I could make any other user's application-search cumbersome by reusing his/her tags for my own applications. Seems like the tag-search should be linked to and limited by some other entity like user - search for apps matching a tag for a given user/queue etc. That is definitely a thought. I am open to enforcing specifying either a user/queue when searching for a tag. However, in principle, this could happen with application-types as well: a user could submit a number of random YARN applications with type MAPREDUCE. I thought the way we were restricting exposing these (tags/types) was through ACLs on a secure cluster. 
Only users with permissions to view someone else's apps should be able to view the tags. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag systems of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859151#comment-13859151 ] Karthik Kambatla commented on YARN-1399: The following blurb in {{ClientRMService#getApplications}} restricts view access: {code} boolean allowAccess = checkAccess(callerUGI, application.getUser(), ApplicationAccessType.VIEW_APP, application); reports.add(application.createAndGetApplicationReport( callerUGI.getUserName(), allowAccess)); {code} Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart
[ https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859156#comment-13859156 ] Zhijie Shen commented on YARN-1489: --- Thanks Vinod for the proposal. One thought came to mind when I read the following point. bq. In case of apps like MapReduce where containers need to communicate directly with AMs, the old running-containers don’t know where the new ApplicationMaster is running and how to reach it (service addresses). While the AM is restarting, containers may try to send messages to it, and those messages may get lost. Would it be good to buffer the outstanding messages and send them to the AM when rebinding? [Umbrella] Work-preserving ApplicationMaster restart Key: YARN-1489 URL: https://issues.apache.org/jira/browse/YARN-1489 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Work preserving AM restart.pdf Today if AMs go down, - RM kills all the containers of that ApplicationAttempt - New ApplicationAttempt doesn't know where the previous containers are running - Old running containers don't know where the new AM is running. We need to fix this to enable work-preserving AM restart. The latter two can potentially be done at the app level, but it is good to have a common solution for all apps wherever possible. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
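One way the buffering Zhijie Shen asks about could work, sketched with purely illustrative names (a hypothetical client-side shim, not anything from the attached design):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Sketch: hold task-to-AM messages while the AM address is unknown and
 *  replay them in order once the new AM's service address is learned. */
public class BufferingAmClient {
    private final Queue<String> pending = new ArrayDeque<>();
    private Consumer<String> transport; // null while the AM is unreachable

    public synchronized void send(String msg) {
        if (transport == null) {
            pending.add(msg);       // AM down or restarting: buffer
        } else {
            transport.accept(msg);  // normal path
        }
    }

    public synchronized void onAmLost() {
        transport = null;
    }

    /** Called once the new AM's address is discovered (rebinding). */
    public synchronized void onAmRebound(Consumer<String> newTransport) {
        transport = newTransport;
        while (!pending.isEmpty()) {
            transport.accept(pending.poll()); // replay buffered messages
        }
    }
}
```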
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859157#comment-13859157 ] Hadoop QA commented on YARN-1121: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620880/YARN-1121.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2763//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2763//console This message is automatically generated. 
RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.11.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-9.patch Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859160#comment-13859160 ] Karthik Kambatla commented on YARN-1029: Thanks Sandy. Posted a new patch that addresses most of your comments. bq. Why the added null check in getActiveRMIndex? When would one of the entries in the resourceManagers array be null? stopResourceManager(i) stops and nullifies resourceManagers[i]. restart() resets this and points to a new RM. Currently, we are forced to do this because a stopped service can't be restarted, and the only way to trigger a failover automatically is to kill the RM. Have marked these two methods @Private to limit their access to YARN. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1552) Make all GetApplicationsRequest getter APIs Public/Stable
[ https://issues.apache.org/jira/browse/YARN-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859167#comment-13859167 ] Zhijie Shen commented on YARN-1552: --- Having checked GetApplicationsRequest, I think the two issues may be treated independently: 1. Private - Public: the methods are supposed to be user-oriented. Not sure why they were Private before. 2. Unstable - Stable: I agree with Hitesh on not doing this until we figure out all the filters we require. It seems that the RM's webservices still provide more filtering options than RPC. In the future, we may add tags as well. Make all GetApplicationsRequest getter APIs Public/Stable - Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1552) Make all GetApplicationsRequest getter APIs Public/Stable
[ https://issues.apache.org/jira/browse/YARN-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859173#comment-13859173 ] Karthik Kambatla commented on YARN-1552: Makes sense to wait until history server work is done. [~zjshen] - can you add a blocker link on a related history server item? Make all GetApplicationsRequest getter APIs Public/Stable - Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1552) Make all GetApplicationsRequest getter APIs Public/Stable
[ https://issues.apache.org/jira/browse/YARN-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859178#comment-13859178 ] Hitesh Shah commented on YARN-1552: --- [~kkambatl] [~zjshen] Not sure that there should be a blocker history server task for this. What I meant to imply is that once the history server is implemented and merged in, a fresh look should be taken at this api and eventually be marked stable once the history server apis also reach stability. I doubt the history server apis will be stable when the work is ready to be merged into trunk. Make all GetApplicationsRequest getter APIs Public/Stable - Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert
[ https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1136: -- Attachment: yarn1136.patch Replace junit.framework.Assert with org.junit.Assert Key: YARN-1136 URL: https://issues.apache.org/jira/browse/YARN-1136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Chen He Labels: newbie, test Attachments: yarn1136.patch There are several places where we are using junit.framework.Assert instead of org.junit.Assert. {code}grep -rn junit.framework.Assert hadoop-yarn-project/ --include=*.java{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1552) Make all GetApplicationsRequest getter APIs Public/Stable
[ https://issues.apache.org/jira/browse/YARN-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859180#comment-13859180 ] Karthik Kambatla commented on YARN-1552: I should have been clearer. I meant to say, this JIRA should depend on (linked as - blocked by) one of the HistoryServer API JIRAs, so we know when to look into this again. Make all GetApplicationsRequest getter APIs Public/Stable - Key: YARN-1552 URL: https://issues.apache.org/jira/browse/YARN-1552 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla GetApplicationsRequest is used to fetch applications from the RM. Currently, all setters and some getters are Private/Unstable and some getters are Public/Stable. We should graduate all the getters to Public/Stable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859196#comment-13859196 ] Hadoop QA commented on YARN-1029: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620883/yarn-1029-9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2764//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2764//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2764//console This message is automatically generated. 
Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859199#comment-13859199 ] Karthik Kambatla commented on YARN-1410: Like the idea of calling createApplication() from within submitApplication() if the appId is not set in ASC. There should also be a version of ASC.newInstance() that doesn't require an applicationId and can be used by the clients. Another bizarre alternative would be to prematurely write the appId to the state-store on createApplication() even before an application is submitted. So, there could be appIds in the store that don't correspond to any applications - which is kind of weird and can lead to the store bloating up and other undesirable side-effects. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410.1.patch App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
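The fallback Karthik describes (submitApplication() calling createApplication() when the ASC carries no appId) can be sketched with stand-in types; none of these names are the real YARN client API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TwoStepSubmit {
    /** Stand-in for ApplicationSubmissionContext (illustrative only). */
    public static class Asc {
        public Integer appId; // null when the client skipped createApplication()
    }

    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Step 1: hand out a fresh application id (this RM's notion of it). */
    public int createApplication() {
        return nextId.getAndIncrement();
    }

    /** Step 2: if the context carries no id, fall back to step 1 internally,
     *  so an RM failover between the two steps cannot leave the client
     *  holding a stale id from the old RM. */
    public int submitApplication(Asc asc) {
        if (asc.appId == null) {
            asc.appId = createApplication();
        }
        return asc.appId;
    }
}
```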
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859206#comment-13859206 ] Karthik Kambatla commented on YARN-1521: I don't know the contracts on each of these APIs well enough (yet) to say whether they are idempotent - it would be nice for someone like [~vinodkv] to comment on this. I think allocate(), finishApplicationMaster(), getQueueInfo() are idempotent. If not already, get/renewDelegationToken(), getCluster*() should also be (made) idempotent. Other items in the list sound okay to me. Mark appropriate protocol methods with the idempotent annotation Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
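The effect of the idempotent annotation on failover retries can be sketched as follows; the marker annotation and retry helper are illustrative stand-ins, not the actual Hadoop RetryPolicy machinery:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.Supplier;

public class IdempotentRetry {
    /** Marker in the spirit of the annotation discussed above. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Idempotent {}

    /** Retry across failovers only when the operation is declared
     *  idempotent; a non-idempotent call gets exactly one attempt. */
    public static <T> T call(Supplier<T> op, boolean idempotent, int maxAttempts) {
        int attempts = idempotent ? maxAttempts : 1;
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a transient failure during RM failover
            }
        }
        throw last;
    }
}
```

The point mirrored here is that marking a method idempotent is what licenses the proxy to re-invoke it blindly after a failover.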
[jira] [Created] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
Haohui Mai created YARN-1553: Summary: Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai HDFS-5305 and related JIRAs decided that each individual project will have its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global static method that no longer fits this design. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1553: - Attachment: YARN-1553.000.patch Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Attachments: YARN-1553.000.patch HDFS-5305 and related JIRAs decided that each individual project will have its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global static method that no longer fits this design. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
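The direction proposed (each project reads its own HTTP policy from its own configuration rather than the global HttpConfig.isSecure()) might look roughly like this; the key name and types are assumptions for illustration:

```java
import java.util.Map;

public class YarnHttpPolicy {
    public enum Policy { HTTP_ONLY, HTTPS_ONLY }

    /** Assumed key for illustration; the real patch defines its own. */
    public static final String KEY = "yarn.http.policy";

    /** Resolve the policy from YARN's own configuration, defaulting to
     *  plain HTTP, instead of consulting a global static method. */
    public static Policy fromConf(Map<String, String> conf) {
        return Policy.valueOf(conf.getOrDefault(KEY, "HTTP_ONLY"));
    }
}
```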
[jira] [Updated] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1549: -- Attachment: YARN-1549.1.patch Same patch with better formatting, reordered logic for better readability. Will check it if Jenkins says okay.. TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859227#comment-13859227 ] Hudson commented on YARN-1121: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4941 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4941/]) YARN-1121. Addendum patch. Fixed AsyncDispatcher hang issue during stop due to a race condition caused by the previous patch. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554344) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.11.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859231#comment-13859231 ] Karthik Kambatla commented on YARN-1492: Comments on the design: # In the client protocol, if a cleaner instance (or run) starts after R2 and before R2', the client wouldn't know of this cleaner's existence. # Dangling cleaner locks: Using ZK here would probably make it easier to handle these dangling locks. If the Cleaner crashes, the corresponding connection to ZK is severed, and all locks are automatically cleaned up (if using ephemeral nodes). As others have mentioned earlier, I think it is okay to assume one running ZK quorum. For instance, RM HA requires this. # We should probably mandate running the CleanerService if the shared cache is enabled, and it should run periodically as part of the RM. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss the feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
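The ephemeral-node semantics Karthik relies on can be modeled in a few lines; this is a toy in-memory analogue of ZooKeeper's behaviour, not real ZK client code:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of ZooKeeper ephemeral-node locking: locks held by a session
 *  vanish automatically when the session dies, so a crashed Cleaner can
 *  never leave dangling locks behind. */
public class EphemeralLocks {
    // lock path -> owning session id
    private final Map<String, String> lockOwner = new HashMap<>();

    /** Acquire a lock; fails if some live session already holds it. */
    public synchronized boolean acquire(String session, String path) {
        return lockOwner.putIfAbsent(path, session) == null;
    }

    /** Session severed (process crash or connection loss): all of its
     *  "ephemeral" locks are released automatically. */
    public synchronized void sessionClosed(String session) {
        lockOwner.values().removeIf(session::equals);
    }

    public synchronized boolean isLocked(String path) {
        return lockOwner.containsKey(path);
    }
}
```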
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859236#comment-13859236 ] Hadoop QA commented on YARN-1549: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620892/YARN-1549.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2765//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2765//console This message is automatically generated. 
TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859276#comment-13859276 ] haosdent commented on YARN-1549: [~vinodkv] Thank you very much for your review. TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859336#comment-13859336 ]

Sunil G commented on YARN-1408:
---
Hi Devaraj, I have made the changes as per your comments.

1. Need to handle the invalid transition, and during the transition the container should be removed from ContainerAllocationExpirer to avoid the timeout.
[Sunil]: Removing the extra preempted container from newlyAllocatedContainers handles the invalid transition: when the next heartbeat arrives, the container is no longer in newlyAllocatedContainers, so no ACQUIRED event is fired at it.

2. The patch iterates over newlyAllocatedContainers to remove the container. It can be removed directly using java.util.List.remove(Object o) instead of iterating, checking, and then removing.
[Sunil]: Yes, I changed it to remove the container directly from the list.

3. Can you also add a test to demonstrate this case?
[Sunil]: The change only removes an element from newlyAllocatedContainers; no new methods were added. Verification was done by manual testing to ensure the removal is performed.

Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
--
Key: YARN-1408
URL: https://issues.apache.org/jira/browse/YARN-1408
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Fix For: 2.2.0
Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.patch

Capacity preemption is enabled as follows.
* yarn.resourcemanager.scheduler.monitor.enable=true
* yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Queues = a, b
Capacity of Queue A = 80%
Capacity of Queue B = 20%

Step 1: Submit a big jobA to queue a that uses the full cluster capacity.
Step 2: Submit a jobB to queue b that would use less than 20% of the cluster capacity.

A jobA task that uses queue b's capacity is preempted and killed. This caused the following problem:
1. A new container was allocated for jobA in queue a on a node update from an NM.
2. This container was immediately preempted. An ACQUIRED at KILLED invalid-state exception was then raised when the next AM heartbeat reached the RM:

ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED

This also caused the task to sit in a 30-minute timeout, since the container had already been killed by preemption:

attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
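The fix discussed in the comment above removes the preempted container from the newly-allocated list before the AM heartbeat can pull it, so no ACQUIRED event is ever fired at a KILLED container. A minimal standalone sketch of that idea follows; `PreemptedContainerSketch`, its `Container` class, and the method names are illustrative stand-ins for the real RM scheduler classes, not the actual YARN code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: if a container is preempted before the AM acquires it, drop it
// from newlyAllocatedContainers directly with List.remove(Object) so the
// next AM heartbeat never sees it.
public class PreemptedContainerSketch {
  static class Container {
    final String id;
    Container(String id) { this.id = id; }
    @Override public boolean equals(Object o) {
      return o instanceof Container && ((Container) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
  }

  private final List<Container> newlyAllocatedContainers = new ArrayList<>();

  void allocate(Container c) { newlyAllocatedContainers.add(c); }

  // On preemption, remove the container before it is reported to the AM;
  // this is the direct List.remove(Object) suggested in the review.
  void preempt(Container c) { newlyAllocatedContainers.remove(c); }

  // What an AM heartbeat would pull: only containers still pending.
  List<Container> pullNewlyAllocated() {
    List<Container> result = new ArrayList<>(newlyAllocatedContainers);
    newlyAllocatedContainers.clear();
    return result;
  }

  public static void main(String[] args) {
    PreemptedContainerSketch s = new PreemptedContainerSketch();
    s.allocate(new Container("container_1"));
    s.allocate(new Container("container_2"));
    s.preempt(new Container("container_2")); // killed before the heartbeat
    System.out.println(s.pullNewlyAllocated().size()); // prints 1
  }
}
```

Because the preempted container never reaches the AM, the AM cannot later send ACQUIRED for it, which is exactly the invalid ACQUIRED-at-KILLED transition the bug report describes.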
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-1408:
--
Attachment: Yarn-1408.2.patch

Updated as per comments.