[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212 ]
Weiwei Yang commented on YARN-8320:
-----------------------------------

Hi [~miklos.szeg...@cloudera.com]

What I mean is that letting users set both #vcore and #cpus in their resource request is too complex. Even in phase 1, where only EXCLUSIVE mode is supported, consider this example:
{noformat}
NM:
  #vcore: 100
  #cpu: 10
{noformat}
A user who wants exclusive cpus must send a request like
{noformat}
Request1:
  #vcore: 10 * N
  #cpu: N (0<N<=10)
{noformat}
If {{#vcore < 10 * N}}, some cpu is wasted. If the user sets the request to
{noformat}
Request2:
  #vcore: 80
  #cpu: 9
{noformat}
then after allocation the remaining NM capacity is
{noformat}
NM:
  #vcore: 20
  #cpu: 1
{noformat}
Now when a #vcore=20 container lands on this node, it can only get 10% of the cputime (instead of 20%), because 9 cpus are already occupied by Request2. This is not expected.

RESERVED/SHARED mode would be even more complex: users would not be able to know how many cpus to specify in their request to achieve RESERVED/SHARED cpu sharing.

Does this make sense?

Thanks

> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf,
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and
> “cpu.shares” to isolate cpu resources. However,
> * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented scheduler
> with no support for differentiated latency.
> * The request latency of services running in containers may fluctuate
> frequently when all containers share cpus, which latency-sensitive services
> in our production environment cannot tolerate.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to
> different processors. This is inspired by the isolation technique in the [Borg
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
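
The Request2 arithmetic in the comment above can be checked with a minimal Python sketch. The function names (`exclusive_leftover`, `cputime_share`) are hypothetical and not part of YARN. The sketch assumes that a container confined to the remaining free cpus can use at most `free_cpus / nm_cpus` of the node's total cputime, which is how the 10% figure in the comment appears to be derived.

```python
from fractions import Fraction


def exclusive_leftover(nm_vcores, nm_cpus, req_vcores, req_cpus):
    """Return (vcores, cpus) left on the node after an EXCLUSIVE allocation.

    Hypothetical helper for illustration only; not a YARN API.
    """
    if not 0 < req_cpus <= nm_cpus:
        raise ValueError("requested cpus must satisfy 0 < N <= node cpus")
    if not 0 <= req_vcores <= nm_vcores:
        raise ValueError("requested vcores exceed node capacity")
    return nm_vcores - req_vcores, nm_cpus - req_cpus


def cputime_share(container_vcores, nm_vcores, nm_cpus, free_cpus):
    """Return (expected, actual) fraction of node cputime for a container.

    expected: the share implied by vcores alone.
    actual:   the expected share, capped by the cpus not held exclusively
              (assumption: the container can only run on the free cpus).
    """
    expected = Fraction(container_vcores, nm_vcores)
    cap = Fraction(free_cpus, nm_cpus)
    return expected, min(expected, cap)


# Numbers from the comment: NM has 100 vcores / 10 cpus, Request2 takes 80 / 9.
left_vcores, left_cpus = exclusive_leftover(100, 10, 80, 9)
expected, actual = cputime_share(left_vcores, 100, 10, left_cpus)
print(left_vcores, left_cpus)  # remaining NM capacity
print(expected, actual)        # share implied by vcores vs. share obtainable
```

With the values from the comment, the leftover capacity is 20 vcores and 1 cpu, so a 20-vcore container expects 1/5 of the node's cputime but is capped at 1/10. The mismatch disappears only when the request follows the `#vcore: 10 * N, #cpu: N` pattern of Request1.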