[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925147#comment-13925147 ]

Tsuyoshi OZAWA commented on YARN-1591:
--------------------------------------

[~vinodkv], can you check it?

> TestResourceTrackerService fails randomly on trunk
> ---------------------------------------------------
>
> Key: YARN-1591
> URL: https://issues.apache.org/jira/browse/YARN-1591
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1591.1.patch
>
> As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Assigned] (YARN-1789) ApplicationSummary does not escape newlines in the app name
[ https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA reassigned YARN-1789:
------------------------------------

Assignee: Tsuyoshi OZAWA

> ApplicationSummary does not escape newlines in the app name
> ------------------------------------------------------------
>
> Key: YARN-1789
> URL: https://issues.apache.org/jira/browse/YARN-1789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Akira AJISAKA
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Labels: newbie
>
> YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs
[ https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1793:
-----------------------------------

Attachment: yarn-1793-3.patch

Thanks Jian and Bikas. Here is an updated patch that incorporates your suggestions.

> yarn application -kill doesn't kill UnmanagedAMs
> -------------------------------------------------
>
> Key: YARN-1793
> URL: https://issues.apache.org/jira/browse/YARN-1793
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
> Attachments: yarn-1793-0.patch, yarn-1793-1.patch, yarn-1793-2.patch, yarn-1793-3.patch
>
> Trying to kill an Unmanaged AM through the CLI (yarn application -kill <id>) logs a success, but doesn't actually kill the AM or reclaim the containers allocated to it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
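For context, the kill path under test: yarn application -kill <id> resolves to YarnClient#killApplication on the client side. A minimal sketch using the public YarnClient API (the wrapper class and argument handling are illustrative, not part of the patch); before this fix, for an unmanaged AM this call reported success without the AM actually being killed:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Illustrative wrapper, not YARN code: the CLI's -kill option boils down
// to this YarnClient#killApplication call.
public class KillApp {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // e.g. application_1394000000000_0001, as printed by -list
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      client.killApplication(appId);
    } finally {
      client.stop();
    }
  }
}
{code}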
[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs
[ https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925349#comment-13925349 ]

Hadoop QA commented on YARN-1793:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633609/yarn-1793-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3306//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3306//console

This message is automatically generated.

> yarn application -kill doesn't kill UnmanagedAMs
> -------------------------------------------------
>
> Key: YARN-1793
> URL: https://issues.apache.org/jira/browse/YARN-1793
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
> Attachments: yarn-1793-0.patch, yarn-1793-1.patch, yarn-1793-2.patch, yarn-1793-3.patch
>
> Trying to kill an Unmanaged AM through the CLI (yarn application -kill <id>) logs a success, but doesn't actually kill the AM or reclaim the containers allocated to it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1619) Add cli to kill yarn container
[ https://issues.apache.org/jira/browse/YARN-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925389#comment-13925389 ]

Xuan Gong commented on YARN-1619:
---------------------------------

https://issues.apache.org/jira/browse/YARN-445 is the parent ticket for adding the ability to signal containers. Let us start from there. Closing this as a duplicate.

> Add cli to kill yarn container
> ------------------------------
>
> Key: YARN-1619
> URL: https://issues.apache.org/jira/browse/YARN-1619
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Ramya Sunil
> Assignee: Xuan Gong
> Fix For: 2.4.0
>
> It will be useful to have a generic cli tool to kill containers.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Resolved] (YARN-1619) Add cli to kill yarn container
[ https://issues.apache.org/jira/browse/YARN-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong resolved YARN-1619.
-----------------------------

Resolution: Duplicate

> Add cli to kill yarn container
> ------------------------------
>
> Key: YARN-1619
> URL: https://issues.apache.org/jira/browse/YARN-1619
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Ramya Sunil
> Assignee: Xuan Gong
> Fix For: 2.4.0
>
> It will be useful to have a generic cli tool to kill containers.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1764:
----------------------------

Attachment: YARN-1764.2.patch

> Handle RM fail overs after the submitApplication call.
> -------------------------------------------------------
>
> Key: YARN-1764
> URL: https://issues.apache.org/jira/browse/YARN-1764
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-1764.1.patch, YARN-1764.2.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925427#comment-13925427 ]

Xuan Gong commented on YARN-1764:
---------------------------------

bq. Can you add a log in YarnClientImpl when we retry the submission?
DONE

bq. Can you improve the documentation of the submitApp() API in ApplicationClientProtocol about the clients needing to retry when the specified exception happens?
ADDED

bq. Also add the exception to the documentation of the base protocol.
ADDED

bq. Document in YarnClient's submit API that we automatically retry when this issue happens.
ADDED

bq. All the new files added in the patch have some formatting issues.
FIXED

bq. In both the test-cases, after the fail-over, we assert for the states that are not expected (assertFalse). Can we explicitly test for the cases that we expect (assertTrue)?
CHANGED

bq. I think we should also mark getApplicationReport() to be idempotent in this patch itself, as the RM can fail over after submitApplication() returned but during a getApplicationReport(). We will need to add some tests for this too.
ADDED

> Handle RM fail overs after the submitApplication call.
> -------------------------------------------------------
>
> Key: YARN-1764
> URL: https://issues.apache.org/jira/browse/YARN-1764
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-1764.1.patch, YARN-1764.2.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
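To illustrate the retry semantics discussed above, a minimal sketch using the public YarnClient API (the wrapper class is illustrative; only submitApplication and getApplicationReport are real calls). Per this patch, submission retries happen inside the client, and getApplicationReport() is safe to repeat across a fail-over:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitAcrossFailover {
  public static ApplicationId submit(ApplicationSubmissionContext ctx)
      throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // With this patch, submitApplication() itself retries until the app
      // is visible on the active RM, so a fail-over during submission does
      // not surface to the caller.
      ApplicationId appId = client.submitApplication(ctx);
      // getApplicationReport() is idempotent: repeating it after a
      // fail-over returns the app's state from the new active RM.
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println(appId + " -> " + report.getYarnApplicationState());
      return appId;
    } finally {
      client.stop();
    }
  }
}
{code}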
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925445#comment-13925445 ]

Hadoop QA commented on YARN-1764:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633623/YARN-1764.2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3307//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3307//console

This message is automatically generated.

> Handle RM fail overs after the submitApplication call.
> -------------------------------------------------------
>
> Key: YARN-1764
> URL: https://issues.apache.org/jira/browse/YARN-1764
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-1764.1.patch, YARN-1764.2.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1804) Signal container request delivery from client to resourcemanager
[ https://issues.apache.org/jira/browse/YARN-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1804:
----------------------------

Description:
It could include the following work items:
1. Define the OS-independent SignalContainerCMD enum commands. We will start with known requirements such as KILL. We can expand the list later.
2. Add a new method signalContainer to ApplicationClientProtocol. signalContainerRequest will include containerId as well as SignalContainerCMD.
3. Add a signalContainer method to YarnClient and YarnClientImpl.
4. RM will deliver the request to the RMNode object that owns the container.
5. RM needs to have the proper authorization for the signal request.

was:
It could include the following work items:
1. Define the OS-independent SignalContainerCMD enum commands. We will start with known requirements such as QUIT. We can expand the list later.
2. Add a new method signalContainer to ApplicationClientProtocol. signalContainerRequest will include containerId as well as SignalContainerCMD.
3. Add a signalContainer method to YarnClient and YarnClientImpl.
4. RM will deliver the request to the RMNode object that owns the container.
5. RM needs to have the proper authorization for the signal request.

> Signal container request delivery from client to resourcemanager
> -----------------------------------------------------------------
>
> Key: YARN-1804
> URL: https://issues.apache.org/jira/browse/YARN-1804
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Ming Ma
> Assignee: Xuan Gong
>
> It could include the following work items:
> 1. Define the OS-independent SignalContainerCMD enum commands. We will start with known requirements such as KILL. We can expand the list later.
> 2. Add a new method signalContainer to ApplicationClientProtocol. signalContainerRequest will include containerId as well as SignalContainerCMD.
> 3. Add a signalContainer method to YarnClient and YarnClientImpl.
> 4. RM will deliver the request to the RMNode object that owns the container.
> 5. RM needs to have the proper authorization for the signal request.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
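A hypothetical sketch of the API shape these work items describe; SignalContainerCMD, SignalContainerRequest and signalContainer() are the proposal in this JIRA, not an interface that exists in YARN today:

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class SignalContainerSketch {

  // Work item 1: OS-independent commands, starting with KILL only.
  enum SignalContainerCMD {
    KILL
  }

  // Work item 2: the request carries the target container and the command.
  interface SignalContainerRequest {
    ContainerId getContainerId();
    SignalContainerCMD getCommand();
  }

  // Work items 2-3: the new client-facing method, to be exposed through
  // ApplicationClientProtocol and YarnClient. The RM side (work items 4-5)
  // authorizes the caller and routes the request to the RMNode that owns
  // the container.
  interface SignalContainerProtocol {
    void signalContainer(SignalContainerRequest request);
  }
}
{code}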
[jira] [Assigned] (YARN-893) Capacity scheduler allocates vcores to containers but does not report it in headroom
[ https://issues.apache.org/jira/browse/YARN-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenji Kikushima reassigned YARN-893:
------------------------------------

Assignee: Kenji Kikushima

> Capacity scheduler allocates vcores to containers but does not report it in headroom
> -------------------------------------------------------------------------------------
>
> Key: YARN-893
> URL: https://issues.apache.org/jira/browse/YARN-893
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Bikas Saha
> Assignee: Kenji Kikushima
>
> In non-DRF mode, it reports 0 vcores in the headroom but it allocates 1 vcore to containers.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-893) Capacity scheduler allocates vcores to containers but does not report it in headroom
[ https://issues.apache.org/jira/browse/YARN-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenji Kikushima updated YARN-893:
---------------------------------

Attachment: YARN-893.patch

DefaultResourceCalculator#computeAvailableContainers depends on memory only. I added a check for available vcores and modified the other methods to report vcore consumption in the headroom.

> Capacity scheduler allocates vcores to containers but does not report it in headroom
> -------------------------------------------------------------------------------------
>
> Key: YARN-893
> URL: https://issues.apache.org/jira/browse/YARN-893
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Bikas Saha
> Assignee: Kenji Kikushima
> Attachments: YARN-893.patch
>
> In non-DRF mode, it reports 0 vcores in the headroom but it allocates 1 vcore to containers.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
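An illustrative sketch of the idea behind the patch (the class name and numbers are made up; this is not the patch code): available-container math should take the minimum across every tracked dimension rather than memory alone:

{code:java}
public class AvailableContainers {

  // Memory-only math can report headroom even when vcores are exhausted;
  // taking the min across dimensions makes headroom reflect vcores too.
  public static int compute(long availMemMB, int availVcores,
      long reqMemMB, int reqVcores) {
    int byMemory = (int) (availMemMB / reqMemMB);
    int byVcores = availVcores / reqVcores;
    return Math.min(byMemory, byVcores);
  }

  public static void main(String[] args) {
    // 8GB and 4 vcores free; each container wants 1GB and 2 vcores.
    // Memory-only logic says 8 containers fit; the vcore limit says 2.
    System.out.println(compute(8192, 4, 1024, 2)); // prints 2
  }
}
{code}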
[jira] [Commented] (YARN-1799) Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff
[ https://issues.apache.org/jira/browse/YARN-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925465#comment-13925465 ]

Sunil G commented on YARN-1799:
-------------------------------

Yes, Karthik. The existing logic does not account for space already promised to earlier requests against the same local directory. The problem shows up when many tasks ask for a path at the same time. For example, Task1 may be granted 100MB and start writing; within a fraction of a second, Task2 asks for 80MB. Task2 is given the same path based on the free space at that instant, but Task1 may have written only 10MB so far, with 90MB still to come. That outstanding 90MB is not considered when allocating space for Task2.

A better approach might be to 'reserve' disk space for a duration of time. The complication with this approach is estimating the disk write speed and, from it, the time remaining for a task's local write, i.e. how much has been written and how much remains at a given moment. Given the disk write speed as a configuration (based on disk type, rpm, etc.), these factors can be derived, and the space allotted to a task can be taken into account. This would largely solve the problem. Please share your thoughts on this approach.

> Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff
> ----------------------------------------------------------------------
>
> Key: YARN-1799
> URL: https://issues.apache.org/jira/browse/YARN-1799
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Sunil G
>
> LocalDirAllocator provides paths to all tasks for their local writes. It considers the good list of directories selected by the health-check mechanism in LocalDirsHandlerService. getLocalPathForWrite() checks whether the requested size can be met by the capacity of the last-accessed directory. If many tasks ask LocalDirAllocator for a path, the allocation is made based on the disk space available at that instant. But the same path may already have been given to other tasks that are still writing sequentially. It would be better to check against an upper cutoff for disk utilization.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
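A hypothetical sketch of the 'reserve disk space' idea above (the class and method names are illustrative, not NodeManager code): track the bytes promised to in-flight writers per directory and subtract them from the measured free space before granting a new request:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ReservingDirAllocator {

  // Bytes promised to earlier writers but possibly not yet on disk
  // (Task1's outstanding 90MB in the example above).
  private final Map<String, Long> reservedBytes = new HashMap<String, Long>();

  /** Grant a directory only if free space minus reservations covers it. */
  public synchronized boolean tryReserve(String dir, long freeBytes,
      long requestBytes) {
    long outstanding = reservedBytes.containsKey(dir)
        ? reservedBytes.get(dir) : 0L;
    if (freeBytes - outstanding < requestBytes) {
      return false; // the space is spoken for by in-flight writes
    }
    reservedBytes.put(dir, outstanding + requestBytes);
    return true;
  }

  /** Release reservation as a task reports progress or finishes. */
  public synchronized void release(String dir, long writtenBytes) {
    long outstanding = reservedBytes.containsKey(dir)
        ? reservedBytes.get(dir) : 0L;
    reservedBytes.put(dir, Math.max(0L, outstanding - writtenBytes));
  }
}
{code}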
[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name
[ https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925471#comment-13925471 ]

Tsuyoshi OZAWA commented on YARN-1789:
--------------------------------------

[~ajisakaa] and [~jlowe], I checked the code around *Summary. IIUC, ApplicationSummary doesn't exist on the YARN side because it's an application-side class. On the other hand, LogAggregationService logs application information, but it uses the ApplicationId (not the app name), so it doesn't break the one-app-per-line output format and there is nothing to fix. Please correct me if I'm wrong. Thanks.

> ApplicationSummary does not escape newlines in the app name
> ------------------------------------------------------------
>
> Key: YARN-1789
> URL: https://issues.apache.org/jira/browse/YARN-1789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Akira AJISAKA
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Labels: newbie
>
> YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
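For reference, a minimal sketch of the kind of escaping MAPREDUCE-5778 applies on the MapReduce side (an assumed shape, not the verbatim patch): replace CR/LF in the name so each summary entry stays on one log line:

{code:java}
public class SummaryEscaper {

  // Escape rather than strip, so the original name stays recoverable.
  public static String escapeNewlines(String name) {
    if (name == null) {
      return null;
    }
    return name.replace("\r", "\\r").replace("\n", "\\n");
  }

  public static void main(String[] args) {
    // A name containing a newline no longer breaks one-app-per-line logs.
    System.out.println("appName=" + escapeNewlines("my\napp"));
  }
}
{code}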