[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212
 ] 

Weiwei Yang commented on YARN-8320:
-----------------------------------

Hi [~miklos.szeg...@cloudera.com]

What I mean is that letting users set both #vcore and #cpu in their resource 
request is too complex, even for phase 1 where only EXCLUSIVE mode is 
supported. For example:

{noformat}
NM:
  #vcore: 100
  #cpu: 10
{noformat}

A user who wants exclusive mode must shape the request like this:

{noformat}
Request1:
  #vcore: 10 * N
  #cpu: N (0<N<=10)
{noformat}

If {{#vcore < 10 * N}}, some CPU capacity is wasted. Suppose the user sets the 
request to

{noformat}
Request2:
  #vcore: 80
  #cpu: 9
{noformat}

After allocation, the remaining NM capacity is

{noformat}
NM:
  #vcore: 20
  #cpu: 1
{noformat}

Now, when a #vcore=20 container lands on this node, it can only get 10% of the 
CPU time (instead of 20%), because 9 CPUs are already occupied by Request2. This 
is not expected. RESERVED/SHARED mode makes it even more complex: users will not 
be able to know how many CPUs to specify in their request to achieve 
RESERVED/SHARED CPU sharing.
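
To make the arithmetic concrete, here is a rough Python sketch of the example 
above. It is only a model of the numbers in this comment, not YARN code: the 
{{Node}} class and the helper functions are hypothetical names, and it assumes 
vcores map evenly onto CPUs (10 vcores per CPU, as in the NM above).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical capacity record: not a YARN class."""
    vcores: int
    cpus: int


def allocate_exclusive(node: Node, req_vcores: int, req_cpus: int) -> Node:
    """Return the capacity left after an EXCLUSIVE request pins req_cpus."""
    if req_vcores > node.vcores or req_cpus > node.cpus:
        raise ValueError("request exceeds node capacity")
    return Node(node.vcores - req_vcores, node.cpus - req_cpus)


def wasted_vcores(total: Node, req_vcores: int, req_cpus: int) -> int:
    """Vcores tied up by the pinned CPUs but not asked for (#vcore < 10 * N)."""
    vcores_per_cpu = total.vcores // total.cpus
    return max(0, vcores_per_cpu * req_cpus - req_vcores)


def expected_share(total: Node, vcores: int) -> float:
    """CPU-time fraction the user expects from the vcore count alone."""
    return vcores / total.vcores


def actual_share(total: Node, remaining: Node, vcores: int) -> float:
    """CPU-time fraction actually reachable: capped by the unpinned CPUs."""
    return min(vcores / total.vcores, remaining.cpus / total.cpus)


nm = Node(vcores=100, cpus=10)
left = allocate_exclusive(nm, req_vcores=80, req_cpus=9)  # Request2

print(left)                          # Node(vcores=20, cpus=1)
print(wasted_vcores(nm, 80, 9))      # 10
print(expected_share(nm, 20))        # 0.2
print(actual_share(nm, left, 20))    # 0.1
```

The gap between the last two values is the problem: the scheduler still 
advertises 20 vcores, but only one CPU's worth of time is left.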

Does this make sense?
Thanks



 

> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors. This is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

