Peter Szucs created YARN-11733:
----------------------------------

             Summary: Fix the order of updating CPU controls with cgroup v1
                 Key: YARN-11733
                 URL: https://issues.apache.org/jira/browse/YARN-11733
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
            Reporter: Peter Szucs
            Assignee: Peter Szucs


After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2 
support), the order in which the cpu.cfs_period_us and cpu.cfs_quota_us 
controls are updated has changed, which can cause the error below when 
launching containers with CPU limits on cgroup v1:
 
{code:java}
PrintWriter unable to write to 
/var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
 with value: 112500{code}
 

*Reproduction:*

I set the following CPU limits for cgroups in yarn-site.xml:
{code:java}
yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true{code}
After that, the limits were applied to the hadoop-yarn root hierarchy:

 
{code:java}
[root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
1000000
[root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
900000
{code}
 

When I tried to launch a container, it failed with the following error:
{code:java}
PrintWriter unable to write to 
/var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
 with value: 112500{code}
This happens because the container's cfs_quota_us value of 112 500 would 
exceed the limit defined at the higher level. If I create a test cgroup 
manually and try to update this control, it likewise only accepts values up 
to 90 000:
{code:java}
[root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
100000
[root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
-bash: echo: write error: Invalid argument
[root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us{code}
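
The 90 000 boundary seen above is consistent with the child's quota/period ratio being capped at the parent's ratio. A minimal sketch of that arithmetic follows. It assumes the limit is a plain ratio comparison; max_child_quota is a hypothetical helper written for illustration, not a YARN or kernel function:

```python
from fractions import Fraction


def max_child_quota(parent_quota_us, parent_period_us, child_period_us):
    """Largest child quota whose quota/period ratio stays within the parent's.

    Assumption: the limit is a pure ratio comparison between child and parent.
    """
    parent_ratio = Fraction(parent_quota_us, parent_period_us)
    return int(parent_ratio * child_period_us)


# Values from the report: parent quota 900000, parent period 1000000,
# and the default period of 100000 on the freshly created test cgroup.
print(max_child_quota(900_000, 1_000_000, 100_000))  # 90000
```

With the default period of 100 000, a quota of 112 500 is above this boundary, which matches the failed write.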
 

*Solution:*

The cause of this issue is that the cfs_period_us control gets the default 
value of 100 000 when a new cgroup is created, but YARN uses 1 000 000 when 
it calculates the limit. Because of this, we need to update 
cpu.cfs_period_us before cpu.cfs_quota_us, to keep the ratio between the two 
values and not exceed the limit defined at the parent level.
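
The effect of the write order can be illustrated with a small simulation. This is a simplified model, not the kernel's or YARN's actual code: SimulatedCpuCgroup and try_update are hypothetical names, and the check is assumed to be a plain comparison of quota/period ratios against the parent, applied on every write:

```python
import errno

DEFAULT_PERIOD_US = 100_000  # period a new cgroup starts with, per the report
UNLIMITED = -1               # cgroup v1 value meaning "no quota set"


class SimulatedCpuCgroup:
    """Toy model of a child cgroup whose CPU ratio may not exceed its parent's."""

    def __init__(self, parent_quota_us, parent_period_us):
        self.parent_quota_us = parent_quota_us
        self.parent_period_us = parent_period_us
        self.period_us = DEFAULT_PERIOD_US
        self.quota_us = UNLIMITED

    def _validate(self, quota_us, period_us):
        if quota_us == UNLIMITED:
            return
        # Cross-multiply to compare quota/period ratios without floats.
        if quota_us * self.parent_period_us > self.parent_quota_us * period_us:
            raise OSError(errno.EINVAL, "Invalid argument")

    def write_period(self, period_us):
        self._validate(self.quota_us, period_us)
        self.period_us = period_us

    def write_quota(self, quota_us):
        self._validate(quota_us, self.period_us)
        self.quota_us = quota_us


def try_update(order, quota_us=112_500, period_us=1_000_000):
    """Apply the two writes in the given order; return True if both succeed."""
    cgroup = SimulatedCpuCgroup(parent_quota_us=900_000, parent_period_us=1_000_000)
    try:
        for control in order:
            if control == "period":
                cgroup.write_period(period_us)
            else:
                cgroup.write_quota(quota_us)
    except OSError:
        return False
    return True


print(try_update(["quota", "period"]))  # False: 112500/100000 exceeds 900000/1000000
print(try_update(["period", "quota"]))  # True: 112500/1000000 is within the parent ratio
```

In this model, writing the quota first is evaluated against the default period and rejected, while writing the period first lets the same quota pass.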

 

 


