[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

Bibin A Chundatt (JIRA) Mon, 04 Jun 2018 02:21:26 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499938#comment-16499938
 ]


Bibin A Chundatt commented on YARN-8320:
----------------------------------------

Thank you [~cheersyang]/[~yangjiandan] for the design doc.

{quote}
When a container’s cpu_share_mode is EXCLUSIVE/RESERVED, the number of allocated
processors is allocateProcessorNum = container_vcore / Vcore_Ratio; the request
will be rejected if allocateProcessorNum <= 0.
{quote}
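To make the quoted rule concrete, here is a minimal sketch. It assumes Vcore_Ratio means "vcores per physical processor" and that the division is integer division; the class and method names are hypothetical and only mirror the wording of the design doc, not actual YARN code.

```java
// Hypothetical helper; names follow the quoted design-doc text, not YARN code.
final class ProcessorCount {
    private ProcessorCount() { }

    // Assumption: vcoreRatio is the number of vcores per physical processor,
    // and the division truncates (integer division).
    static int allocateProcessorNum(int containerVcore, int vcoreRatio) {
        if (vcoreRatio <= 0) {
            throw new IllegalArgumentException("vcoreRatio must be positive");
        }
        return containerVcore / vcoreRatio;
    }

    // The quoted rule: reject when no whole processor can be allocated.
    static boolean isRejected(int containerVcore, int vcoreRatio) {
        return allocateProcessorNum(containerVcore, vcoreRatio) <= 0;
    }
}
```

Under these assumptions, a container asking for 1 vcore with a ratio of 2 gets 0 processors and is rejected, while 4 vcores with the same ratio gets 2 processors.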
# IIUC, if there are no slots to bind the container to, the container start 
request will be rejected, and that will be considered a failure. Couldn't the 
scheduler then allocate a container to the same NodeManager again?
# When NM processors / NM vcores < 1 and share mode is used, have you considered 
*strictness per container*, i.e. using the CFS period and quota along with the 
cpuset assignment? If no other process is using the CPU, the process will 
consume more than it is supposed to, right?
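A rough sketch of what the per-container strictness in the second point could look like: derive a CFS quota from the container's vcores so that a container bound to a cpuset still cannot exceed its share. The class name, method name and the meaning of vcoreRatio (vcores per processor) are assumptions for illustration; the 100 ms default period and the 1 ms minimum quota are the Linux CFS bandwidth-control values.

```java
// Hypothetical helper: computes a cpu.cfs_quota_us value for a container.
final class CfsQuota {
    // Kernel default for cpu.cfs_period_us (100 ms).
    static final long DEFAULT_PERIOD_US = 100_000L;
    // The kernel does not accept a quota below 1 ms.
    static final long MIN_QUOTA_US = 1_000L;

    private CfsQuota() { }

    // Assumption: vcoreRatio is the number of vcores per physical processor,
    // so containerVcore / vcoreRatio is the CPU fraction the container may use.
    static long quotaUs(int containerVcore, int vcoreRatio, long periodUs) {
        if (containerVcore <= 0 || vcoreRatio <= 0 || periodUs <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        // Multiply before dividing to keep the fractional share.
        long quota = periodUs * containerVcore / vcoreRatio;
        return Math.max(quota, MIN_QUOTA_US);
    }
}
```

With these assumptions, a container with 1 vcore and a ratio of 2 would get a quota of 50000 us per 100000 us period, i.e. half a processor even when the rest of the cpuset is idle.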

Any thoughts on having CpuBindHandlerImpl include two Allocators for the cgroup 
subgroups, one for cpu and another for cpuset?

Could you also consider the following in the design:

# Using a fixed set of folders for assignment in the Allocator (this reduces the 
overhead of creating and deleting folders per container).
# Resource calculation could go wrong in case of container preemption, right? 
The kill/reject could get processed after the container start.
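The two suggestions above can be sketched together as a fixed pool of pre-created cgroup folders. This is only an illustration: the class name, the "slot-N" folder names and the in-memory bookkeeping are hypothetical, and no real cgroup files are touched. Release is idempotent so that a kill processed after (or twice around) a container start does not corrupt the accounting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical pool of pre-created cgroup folders, reused across containers.
final class FixedCgroupSlotPool {
    private final Deque<String> free = new ArrayDeque<>();
    private final Map<String, String> inUse = new HashMap<>();

    FixedCgroupSlotPool(int slots) {
        // Folder names are placeholders; a real NM would create them once at startup.
        for (int i = 0; i < slots; i++) {
            free.addLast("slot-" + i);
        }
    }

    // Returns the folder bound to the container, or empty if no slot is free.
    synchronized Optional<String> acquire(String containerId) {
        String existing = inUse.get(containerId);
        if (existing != null) {
            return Optional.of(existing); // repeated start keeps the same slot
        }
        String slot = free.pollFirst();
        if (slot == null) {
            return Optional.empty();
        }
        inUse.put(containerId, slot);
        return Optional.of(slot);
    }

    // Idempotent: releasing an unknown or already released container is a no-op.
    synchronized boolean release(String containerId) {
        String slot = inUse.remove(containerId);
        if (slot == null) {
            return false;
        }
        free.addLast(slot);
        return true;
    }
}
```

Because slots are only returned to the free list when the container actually held one, a late or duplicate kill cannot inflate the number of available slots.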



> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot 
> afford in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
