[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-branch-2.8.1.txt

Updated patch fixing the javadoc issue and the checkstyle issues that can be addressed.

The unit test failures are unrelated, pass on my local box, and are tracked at YARN-5208 / HADOOP-12687.

> User facing aspects of 'AM blacklisting' feature need fixing
>
>                 Key: YARN-4837
>                 URL: https://issues.apache.org/jira/browse/YARN-4837
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>         Attachments: YARN-4837-20160515.txt, YARN-4837-20160520.1.txt,
> YARN-4837-20160520.txt, YARN-4837-20160527.txt, YARN-4837-20160604.txt,
> YARN-4837-branch-2.005.patch, YARN-4837-branch-2.8.1.txt,
> YARN-4837-branch-2.8.txt
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed
> before we release it in 2.8.0.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-branch-2.8.txt

Uploading a 2.8 patch - fixing conflicts, test-issues etc.
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4837:
    Attachment: YARN-4837-branch-2.005.patch

Attached patch to branch-2 to trigger Jenkins build.
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-20160604.txt

Updated patch against the latest trunk.

[~kasha]
bq. RMAppAttemptImpl#shouldCountTowardsNodeBlacklisting: For ContainerExitStatus.DISKS_FAILED, doesn't that mean at least one disk failed? And, the NM can continue running with remaining disks right? Is the idea that even if we schedule it to the same node, the NM wouldn't give the same local directory? If yes, should we clarify the comment accordingly?

No. A container marked with ContainerExitStatus.DISKS_FAILED means that the node has already been marked unhealthy because most of its disks failed, so no more containers will be scheduled on that node. Edited the comment to make this clearer.
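
For readers following the DISKS_FAILED discussion above, here is a minimal sketch of the kind of decision being described. It is not the code from the patch: `ExitStatus` is a hypothetical local enum standing in for YARN's `ContainerExitStatus` constants, and the treatment of every status other than DISKS_FAILED is an assumption made only for illustration.

```java
/**
 * Illustrative sketch only, not the RMAppAttemptImpl code from the patch.
 * ExitStatus is a hypothetical local enum standing in for YARN's
 * ContainerExitStatus constants, so this compiles without Hadoop.
 */
final class NodeBlacklistPolicySketch {

    enum ExitStatus { SUCCESS, PREEMPTED, ABORTED, DISKS_FAILED, OTHER_FAILURE }

    /** Returns true if an AM container exit should count against the node. */
    static boolean shouldCountTowardsNodeBlacklisting(ExitStatus status) {
        switch (status) {
            case SUCCESS:
                // Nothing failed, so there is nothing to hold against the node.
                return false;
            case PREEMPTED:
            case ABORTED:
                // Assumption for this sketch: exits caused by the system by
                // design are not the node's fault.
                return false;
            case DISKS_FAILED:
                // Per the comment above: the node is already marked unhealthy,
                // so no more containers will be scheduled there anyway.
                return false;
            default:
                // Assumption for this sketch: any other failure counts.
                return true;
        }
    }

    public static void main(String[] args) {
        for (ExitStatus s : ExitStatus.values()) {
            System.out.println(s + " -> " + shouldCountTowardsNodeBlacklisting(s));
        }
    }
}
```

Listing each status explicitly, rather than treating every non-zero exit the same way, is what lets a case such as DISKS_FAILED carry its own justification in a comment.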
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-20160527.txt

Updated patch fixing the conflicts. Can't fix the checkstyle warning (method length is 174 lines), and the test failures are unrelated.
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-20160520.1.txt

Updated patch fixing the checkstyle issues and the two tests broken by the patch - TestAppSchedulingInfo & TestFSAppAttempt.
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-20160520.txt

Here's an updated patch with fixes and a new test.

[~rohithsharma]
bq. One suggestion is can default value for threshold reduce to less than 50%?
bq. 1) Should we disable am-blacklisting by default?

I would like to tackle both of these as part of YARN-4685 so that others can see them too.

bq. 2. I am little bit confused with naming convention for blacklist with placesBlacklist. And is there any plan to support blacklist racks in the future?

Yes, that's the idea. In other parts of the code, this list gets passed along to filter both nodes and racks.

[~leftnoteasy]
Regarding renames, I've included the ones you pointed out. There are a lot more to be done, but I deliberately avoided them given the current size of the patch. Addressed the other comments.

bq. 6) ResourceBlacklistRequest -> (Resource)Place(ment)BlacklistRequest?

This is a public API and cannot be renamed now.

Created a new TestNodeBlacklistingOnAMFailures and moved the existing tests from TestAMRestart to this new class file. testAMBlacklistPreventsRestartOnSameNodeForMinicluster() is a bogus test, so I removed it.

[~sunilg]
bq. 1. yarn.resourcemanager.am-scheduling.node-blacklisting-enabled and yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold to be added in yarn-default.xml.

Again, I deliberately deleted them for now. I'd like to discuss their re-addition as part of the outcome of YARN-4685.
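
To make the threshold question in the message above concrete, here is a small sketch of how a disable threshold of this kind can work. The semantics are an assumption inferred from the property name `node-blacklisting-disable-threshold`: blacklisting stays in effect only while the blacklisted nodes are below a given fraction of the cluster. The class, method, and example values are hypothetical and are not taken from the patch.

```java
/**
 * Illustrative sketch only. Assumes the disable threshold is a fraction of the
 * cluster size beyond which AM node blacklisting is switched off; the names
 * and example values here are hypothetical.
 */
final class BlacklistThresholdSketch {

    /**
     * Blacklisting stays active only while the number of blacklisted nodes is
     * strictly below disableThreshold * clusterNodes.
     */
    static boolean isBlacklistingActive(int blacklistedNodes, int clusterNodes,
                                        double disableThreshold) {
        return blacklistedNodes < disableThreshold * clusterNodes;
    }

    public static void main(String[] args) {
        int clusterNodes = 10;
        double threshold = 0.5; // example value, not a documented default
        for (int blacklisted = 0; blacklisted <= clusterNodes; blacklisted += 2) {
            System.out.println(blacklisted + " blacklisted -> active="
                + isBlacklistingActive(blacklisted, clusterNodes, threshold));
        }
    }
}
```

Under this reading, a lower threshold turns blacklisting off sooner, which protects a small cluster from having most of its nodes excluded from AM placement.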
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Attachment: YARN-4837-20160515.txt

Here's a (longish) patch that addresses some of my proposal above:
 - This still doesn't solve the other bugs, like YARN-4685, that can happen with the blacklisting logic. There are a bunch of UI bugs too (callers of BlacklistManager.getBlacklistUpdates() only look at the additions), sigh.
 - Renamed the configuration names, but marked all the new configurations private until we figure out the rest of the story with the bugs.
 - Removed the user-facing APIs and related classes: AMBlackListingRequest, AMBlackListingRequestPBImpl.java, BlacklistUpdates.java (instead using ResourceBlacklistRequest directly), AMBlackListingRequestInfo.java
 - Removed the references to the above from ApplicationSubmissionContext, yarn_protos.proto, ApplicationSubmissionContextPBImpl.java, RMAppImpl.java, ApplicationSubmissionContextInfo.java
 - The above two essentially revert YARN-4389.
 - Changed RMAppAttemptImpl to explicitly encode container exit-statuses in deciding when to blacklist nodes.
 - Made a bunch of internal renames to better reflect what they are doing.
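
The remark above about callers looking only at the additions can be illustrated with a small sketch. It assumes a blacklist update carries both additions and removals; the `Update` class and both methods are hypothetical stand-ins, not YARN types, and are only meant to show how a view built from additions alone can go stale.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch only. Update is a hypothetical stand-in for a blacklist
 * update that carries both additions and removals.
 */
final class BlacklistUpdateSketch {

    static final class Update {
        final List<String> additions;
        final List<String> removals;

        Update(List<String> additions, List<String> removals) {
            this.additions = additions;
            this.removals = removals;
        }
    }

    /** Applies both halves of the update, keeping the view accurate. */
    static void apply(Set<String> view, Update update) {
        view.addAll(update.additions);
        view.removeAll(update.removals);
    }

    /** Applies only the additions, so removed nodes linger in the view. */
    static void applyAdditionsOnly(Set<String> view, Update update) {
        view.addAll(update.additions);
    }

    public static void main(String[] args) {
        Update first = new Update(Arrays.asList("node1", "node2"),
                                  Collections.<String>emptyList());
        Update second = new Update(Collections.<String>emptyList(),
                                   Arrays.asList("node1"));

        Set<String> accurate = new LinkedHashSet<>();
        Set<String> stale = new LinkedHashSet<>();
        for (Update u : Arrays.asList(first, second)) {
            apply(accurate, u);
            applyAdditionsOnly(stale, u);
        }
        System.out.println("both halves applied: " + accurate);
        System.out.println("additions only:      " + stale);
    }
}
```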
[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing
[ https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-4837:
    Target Version/s: 2.8.0
    Priority: Critical (was: Major)

This must go into 2.8.0; marking it so.