[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707359#comment-16707359 ]
Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:27 PM:
-------------------------------------------------------------

[~leftnoteasy], let's first look at the bug (YARN-9073) in the current implementations such as GPU/FPGA.

{code:java}
Scenario:
One host has GPUs 1,2,3,4,5,6, and "GPU.allowed = 1,2,3" is configured in c-e.cfg.
But yarn-site.xml is configured with "auto", which means 1,2,3,4,5,6 are all allowed.
An application requests 4 GPUs and the scheduler allocates 1,2,4,5, so --excluded-gpus is "3".
c-e checks that 3 is in the allowed list (1,2,3) and then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect, because the application can now access 4,5,6.
{code}

At first glance, passing the allowed devices from the Java layer (1,2,4,5) and checking them against "GPU.allowed" (1,2,3) should solve this issue. In this case it does: 4 and 5 are not in (1,2,3), so c-e will throw an error.

But another bug still exists. As an example, assume one host has (1,2,3,4,5,6), "GPU.allowed=1,2,3,4" is configured in c-e.cfg, and yarn-site.xml indicates that devices (1,2,3) can be scheduled. An application requests 2 devices, the Java layer's allowed devices are (1), and the denied devices will be (2,3). Both (1) and (2,3) are within the configured allowed devices, so the check passes, but the application can actually consume (4,5,6).

*The root cause of these bugs* is that c-e cannot determine the exact set of devices to deny from "GPU.allowed" and the GPUs excluded by the Java layer.

To avoid the above bugs, we can use the solution below. The configuration in c-e.cfg is as follows. We use "denied-numbers" to let the administrator define exactly what is not permitted. The original "devices.allowed-numbers" can stay, but it is unnecessary once we use denied-numbers, so it is better to remove it.

{code:java}
[devices]
module.enabled=true
devices.allowed-numbers=8:32 # this will be unnecessary.
devices.denied-numbers=8:48,8:16 # comma-separated major:minor. Empty means allow the default devices reported by the device plugin.
{code}

The CLI options are as below:

{code:java}
c-e --module-devices \
--excluded_devices b-8:32-rwm \
--allowed_devices 8:16,8:48 \
--container_id container_x_y
{code}

The "devices.denied-numbers" value in c-e.cfg is a blacklist that will be added (without duplicate updates) to the cgroup "devices.deny" file, just like the "--excluded_devices" values.

In the above example, the value of "--allowed_devices" passed from the Java layer is checked against "devices.denied-numbers" to see whether any device wanted by the Java layer is invalid. If one is found, c-e reports an error.

Without this "--allowed_devices" check and the error it raises, a bug would exist. Suppose all devices are (1,2,3), "devices.denied-numbers" is 3, an app requests 2 devices, and the scheduler allocates (1,3). The value of "--excluded_devices" is 2, so (2,3) are written to cgroups, and the app can use only 1 device, which is fewer than expected. With "--allowed_devices", c-e sees that (1,3) contains the denied value 3 configured in c-e.cfg and reports an error, which avoids the bug.

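The check proposed in the comment can be sketched roughly as follows. This is only an illustration in Python, not the container-executor code (which is written in C); the function names `parse_devices`, `devices_to_deny`, and `usable_devices` are hypothetical, and devices are represented as plain strings such as "1" or "8:16" rather than the "b-8:32-rwm" form used on the command line.

```python
def parse_devices(value):
    """Parse a comma-separated device list into a set; an empty string means no entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def devices_to_deny(allowed_from_java, excluded_from_java, denied_in_cfg):
    """Return the set of devices to write to the cgroup deny list.

    Hypothetical helper: raises ValueError if the Java layer asks for a
    device that the administrator has denied in c-e.cfg.
    """
    conflicts = allowed_from_java & denied_in_cfg
    if conflicts:
        raise ValueError(
            "requested devices are denied in c-e.cfg: " + ", ".join(sorted(conflicts))
        )
    # Union of both sources, so a device listed twice is only denied once.
    return excluded_from_java | denied_in_cfg


def usable_devices(allocated, deny_set):
    """Devices the container can really use once the deny set is applied."""
    return allocated - deny_set


if __name__ == "__main__":
    denied_cfg = parse_devices("3")

    # Without the check: the scheduler allocated (1,3), but 3 is denied anyway,
    # so the container silently ends up with fewer devices than it requested.
    silent_deny = parse_devices("2") | denied_cfg
    print(sorted(usable_devices(parse_devices("1,3"), silent_deny)))

    # With the check: the same request is rejected up front.
    try:
        devices_to_deny(parse_devices("1,3"), parse_devices("2"), denied_cfg)
    except ValueError as err:
        print("error:", err)
```

The sketch fails fast on a conflict instead of quietly denying the device, which matches the comment's reasoning that a silent deny would leave the application with fewer devices than the scheduler promised.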
> [YARN-8851] Phase 1 - Support device isolation in native container-executor
> ---------------------------------------------------------------------------
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
> Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch
>
>
> Due to the cgroups v1 implementation policy in linux kernel, we cannot update
> the value of the device cgroups controller unless we have the root permission
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
> So we need to support this in container-executor for Java layer to invoke.