[ https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310276#comment-16310276 ]

Miklos Szegedi commented on YARN-7693:
--------------------------------------

Thank you for the reply [~yangjiandan].
+0 on the approach of adding a separate monitor class for this. I think it is 
useful to be able to change the monitor.
In terms of the feature you described, I have some suggestions you may want to 
consider.
First of all, please consider using the JIRA sub-task feature for your project 
and making this a sub-task. How about doing this as part of YARN-1747, or even 
better, YARN-1011?
You may want to leverage the option to simply turn off the current cgroups 
memory enforcement using the configuration added in YARN-7064. It also handles 
monitoring resource utilization using cgroups.
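The configurable-monitor idea could look roughly like the sketch below. The actual patch is Java; this is an illustrative Python sketch of the same pattern (resolve an implementation class from a configuration key, the way Hadoop's ReflectionUtils does), and the key name "yarn.nodemanager.containers-monitor.class" is hypothetical.

```python
import importlib

class ContainersMonitor:
    """Minimal stand-in for the monitor interface."""
    def monitor(self, containers):
        raise NotImplementedError

class DefaultContainersMonitor(ContainersMonitor):
    def monitor(self, containers):
        # Pretend every container is within its limits.
        return {c: "ok" for c in containers}

def load_monitor(conf):
    # Resolve "module.ClassName" from a (hypothetical) config key,
    # falling back to the default implementation.
    path = conf.get("yarn.nodemanager.containers-monitor.class",
                    f"{__name__}.DefaultContainersMonitor")
    module_name, _, class_name = path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not issubclass(cls, ContainersMonitor):
        raise TypeError(f"{path} does not implement ContainersMonitor")
    return cls()

print(load_monitor({}).monitor(["container_01"]))
```

Swapping in a different monitor is then just a configuration change, with no edits to ContainerManagerImpl.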
bq. 1) Separate containers into two different group Opportunistic_Group and 
Guaranteed_Group under hadoop-yarn
The reason a single hadoop-yarn cgroup for all containers is useful is that you 
can apply a single logic and control the OOM killer in one place. I would be 
happy to look at the actual code, but adjusting two different cgroups may add 
too much complexity. It is especially problematic in the case of promotion. 
When an opportunistic container is promoted to guaranteed, you need to move it 
to the other cgroup, but this requires heavy lifting from the kernel that takes 
significant time. See 
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt for details.
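To make the promotion cost concrete, here is a runnable Python sketch of what promotion would require under the proposed two-cgroup layout. On a real node the root would be /sys/fs/cgroup/memory/hadoop-yarn; a temporary directory stands in here so the sketch runs without root. Writing the pid into the destination's cgroup.procs is the real mechanism; with memory.move_charge_at_immigrate enabled, the kernel then walks and recharges every page the task owns, which is the expensive step cgroup-v1/memory.txt warns about.

```python
import os
import tempfile

def promote(cgroup_root, container_pid):
    # Migrate a promoted container from the opportunistic cgroup to the
    # guaranteed one by writing its pid into the destination's
    # cgroup.procs. If memory.move_charge_at_immigrate is set on the
    # destination, the kernel recharges all of the task's pages -- the
    # slow part of promotion under a two-cgroup layout.
    dst = os.path.join(cgroup_root, "Guaranteed_Group")
    with open(os.path.join(dst, "cgroup.procs"), "a") as f:
        f.write(f"{container_pid}\n")

# Demo against a stand-in directory tree instead of a real cgroup mount:
root = tempfile.mkdtemp()
for group in ("Opportunistic_Group", "Guaranteed_Group"):
    os.makedirs(os.path.join(root, group))
    open(os.path.join(root, group, "cgroup.procs"), "w").close()
promote(root, 4242)
print(open(os.path.join(root, "Guaranteed_Group", "cgroup.procs")).read())
```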
bq. 2) Monitor system resource utilization and dynamically adjust resource of 
Opportunistic_Group
The concern here is that dynamic adjustment does not work in the current 
implementation either, because it responds too slowly in extreme cases. Please 
check out YARN-6677, YARN-4599 and YARN-1014. The idea there is to disable the 
OOM killer on hadoop-yarn, as you also suggested, so that the kernel notifies 
us when the system runs low on memory. YARN can then decide which container to 
preempt, or adjust the soft limit, while the containers are paused. The 
preemption unblocks the containers. Please let us know if you have time and 
would like to contribute.
bq. 3) Kill container only when adjust resource fail for given times
I absolutely agree with this. A sudden spike in CPU usage should not trigger 
immediate preemption. In the case of memory, though, I am not sure how much you 
can adjust. My understanding is that the basic design of opportunistic 
containers is that they never affect the performance of guaranteed ones, but 
using IO for swapping would do exactly that. How would you reduce memory usage 
without preempting?

> ContainersMonitor support configurable
> --------------------------------------
>
>                 Key: YARN-7693
>                 URL: https://issues.apache.org/jira/browse/YARN-7693
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>            Priority: Minor
>         Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation, 
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to 
> monitor system metrics and even dynamically adjust Opportunistic and 
> Guaranteed resources in the cgroup, so another ContainersMonitor 
> implementation may be needed.
> Currently ContainerManagerImpl instantiates ContainersMonitorImpl directly 
> with new, so ContainersMonitor needs to be configurable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
