[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-1449:
-----------------------------

    Attachment: yarn-1449.4.patch

[~sandyr], I've updated my code according to your comments (thanks again!). But 
there are some places I haven't changed.

{quote}
{code}
+    // first add all decreased containers to failed, we will support it in the
+    // future
{code}
What is it that we will support in the future?
{quote}

Originally, I planned not to support container decrease, so I put all decrease 
requests directly into the failed list. This patch adds support for decrease 
requests, and such decreased containers will be added to the next node 
heartbeat sent to the RM.
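
The intended flow is roughly the sketch below (a minimal illustration, not the 
patch itself; DecreasedContainer and recentlyDecreasedContainers are 
hypothetical stand-ins for the new NodeStatus field):

{code}
// Minimal sketch of the intended flow, NOT the actual patch code.
import java.util.ArrayList;
import java.util.List;

class DecreasedContainer {
  final String containerId;
  final int memoryMB;

  DecreasedContainer(String containerId, int memoryMB) {
    this.containerId = containerId;
    this.memoryMB = memoryMB;
  }
}

class NodeStatusUpdaterSketch {
  // Decrease requests that succeeded locally, waiting to be reported to the RM.
  private final List<DecreasedContainer> recentlyDecreasedContainers =
      new ArrayList<DecreasedContainer>();

  synchronized void onContainerDecreased(String containerId, int newMemoryMB) {
    recentlyDecreasedContainers.add(
        new DecreasedContainer(containerId, newMemoryMB));
  }

  // Called while building the next node heartbeat: drain the queue so the RM
  // learns about each decrease exactly once.
  synchronized List<DecreasedContainer> drainForHeartbeat() {
    List<DecreasedContainer> toSend =
        new ArrayList<DecreasedContainer>(recentlyDecreasedContainers);
    recentlyDecreasedContainers.clear();
    return toSend;
  }
}
{code}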

{quote}
{code}
+public class ContainerChangeMonitoringEvent extends ContainersMonitorEvent {
+  private final long vmemLimit;
+  private final long pmemLimit;
{code}
No need to calculate / include the vmem here. The containers monitor should 
know about it.
{quote}

I think we need them. Originally these limits are computed by 
ContainerImpl.LaunchTransition; ContainersMonitor does not calculate them 
itself, they are passed in to it by ContainerStartMonitoringEvent. I just 
followed the same approach that ContainerStartMonitoringEvent uses.
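
For reference, the shape I followed mirrors ContainerStartMonitoringEvent; 
here is a minimal self-contained sketch (ContainersMonitorEvent and 
ContainerId below are simplified stand-ins for the real YARN types so that it 
compiles on its own):

{code}
// Simplified stand-ins for the real YARN types, so the sketch compiles alone.
class ContainerId { }

abstract class ContainersMonitorEvent {
  private final ContainerId containerId;

  ContainersMonitorEvent(ContainerId containerId) {
    this.containerId = containerId;
  }

  ContainerId getContainerId() {
    return containerId;
  }
}

// The event carries both limits because the ContainerImpl transition computes
// them; the monitor just applies whatever limits it is handed.
class ContainerChangeMonitoringEvent extends ContainersMonitorEvent {
  private final long vmemLimit;
  private final long pmemLimit;

  ContainerChangeMonitoringEvent(ContainerId id, long vmemLimit,
      long pmemLimit) {
    super(id);
    this.vmemLimit = vmemLimit;
    this.pmemLimit = pmemLimit;
  }

  long getVmemLimit() { return vmemLimit; }
  long getPmemLimit() { return pmemLimit; }
}
{code}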

{quote}
Unnecessary change
{code}
-  private static class ProcessTreeInfo {
+  static class ProcessTreeInfo {
{code}
{quote}

This is used for testing. The easiest way to test ContainersMonitor is to 
change pmem/vmem by setting fields in ResourceCalculatorProcessTree. We can 
get the ProcessTreeInfo from trackingContainers of ContainersMonitor, but we 
cannot use this class because it's private. I don't know if I should propose 
another JIRA for this, but I think this change makes it easier to write tests 
for ContainersMonitor.
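
Concretely, what I want to do in tests is something like this hypothetical 
sketch (FakeProcessTree and its accessors are illustrative, not the real 
ResourceCalculatorProcessTree API):

{code}
// FakeProcessTree is an illustrative stand-in: a process tree whose reported
// memory usage the test controls directly.
class FakeProcessTree {
  private volatile long pmemBytes;
  private volatile long vmemBytes;

  void setMemoryUsage(long pmemBytes, long vmemBytes) {
    this.pmemBytes = pmemBytes;
    this.vmemBytes = vmemBytes;
  }

  long getPmemBytes() { return pmemBytes; }
  long getVmemBytes() { return vmemBytes; }
}

// With ProcessTreeInfo package-private, a test in the same package could
// install the fake tree into trackingContainers and then assert on how the
// monitor reacts, e.g.:
//
//   FakeProcessTree tree = new FakeProcessTree();
//   tree.setMemoryUsage(2L << 30, 4L << 30);  // 2 GB pmem, 4 GB vmem
//   // swap `tree` into the container's ProcessTreeInfo ...
//   // assert the monitor kills (or keeps) the container as expected.
{code}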

{quote}
{code}
+  public void testResourceChange() throws Exception {
{code}
It looks like this test relies on sleeps to wait for events to be handled. This 
both makes the test run longer and can cause flaky failures. I think there is a 
way to actually wait for the events to be completed. I don't know it off the 
top of my head but can look it up for you if that would be helpful.
{quote}

I've thought about this but haven't come up with a better way to do it. This 
will fail only in some extremely bad cases (like very low resource 
availability). I've set YarnConfiguration.NM_CONTAINER_MON_INTERVAL_MS to 20ms 
in the test, and 200ms seems to be enough for ContainersMonitor to react to 
bad containers. Please let me know if you have any other ideas!
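
If the fixed sleep turns out to be too flaky, one fallback would be a bounded 
poll instead of a single sleep; a minimal hand-rolled sketch (waitFor and 
isContainerKilled below are assumptions, not existing YARN utilities):

{code}
import java.util.concurrent.TimeoutException;

// Hand-rolled poll-with-timeout helper: re-check a condition every few
// milliseconds instead of sleeping a fixed 200ms and hoping the monitor
// (running at NM_CONTAINER_MON_INTERVAL_MS = 20ms) has already reacted.
final class WaitUtil {
  interface Condition {
    boolean isMet();
  }

  static void waitFor(Condition condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.isMet()) {
        return;
      }
      Thread.sleep(intervalMs);
    }
    throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
  }
}

// Usage in the test, instead of a bare Thread.sleep(200):
//
//   WaitUtil.waitFor(new WaitUtil.Condition() {
//     public boolean isMet() {
//       return monitor.isContainerKilled(containerId);  // hypothetical accessor
//     }
//   }, 20, 2000);
{code}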

And any thoughts about this patch?

> Protocol changes and implementations in NM side to support change container 
> resource
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-1449
>                 URL: https://issues.apache.org/jira/browse/YARN-1449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch
>
>
> As described in YARN-1197, we need to add API/implementation changes:
> 1) Add a "changeContainersResources" method to ContainerManagementProtocol
> 2) Return succeeded/failed increased/decreased containers in the response of 
> "changeContainersResources"
> 3) Add a "new decreased containers" field to NodeStatus so the NM can notify 
> the RM of such changes
> 4) Add a changeContainersResources implementation in ContainerManagerImpl
> 5) Add changes in ContainersMonitorImpl to support changing the resource 
> limits of containers



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
