[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516290#comment-16516290 ] Miklos Szegedi commented on YARN-8320: -- [~cheersyang], thank you for the response. Indeed, I was assuming users will set some typical values for vcores(cores) that translate to cpu.shares and cpuset. These are like 0.2, 0.5 for shareable and 1.0 or 2.0 for exclusive. Values like 4.5 are too hard to understand indeed. Please go ahead with your approach. It would be useful to keep in mind that the system needs to be able to set both cpu.shares and cpuset dimensions some time in the future. > [Umbrella] Support CPU isolation for latency-sensitive (LS) service > --- > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate cpu resource. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > no support for differentiated latency > * Request latency of services running on container may be frequent shake > when all containers share cpus, and latency-sensitive services can not afford > in our production environment. > So we need more fine-grained cpu isolation. > Here we propose a solution using cgroup cpuset to binds containers to > different processors, this is inspired by the isolation technique in [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505950#comment-16505950 ]

Weiwei Yang commented on YARN-8320:

Hi [~miklos.szeg...@cloudera.com]

The solution you proposed looks quite interesting, especially the idea about cpu.shares. Please see my comments below.

If we use the "simple" approach you shared:

1. A request such as
{noformat}
Request1:
  #vcores: 9*1024
{noformat}
breaks the basic semantics of vcore that we have been using for years; this is incompatible at the core API level.

2. For the formula you gave,
{noformat}
#cpu.shares: (#vcores + 1023) % 1024 + 1 = 1024
{noformat}
what if I specify #vcore=8*512 (= 4 * 1024)? What will #cpu.shares and #cpuset be? I don't think you can get two output variables from one input variable.

If we consider #cpu.shares and #cpuset as resources:

# NM and RM need to know all the info about physical processors, including their (virtual) shares; this introduces extra complexity for both NM and RM. Moreover, the current resource API doesn't support such a multidimensional resource.
# Precise weighting sounds like an interesting idea, but I doubt we really need that much. On our online systems, we don't control CPU at such a fine granularity.
# It will be hard for users to set {{cpu.shares}}: how would a user know what value is meaningful? And what if some user sets a value that is too big or too small in their request? They will get unpredictable results.

We can have some more offline chats about this. Thanks for bringing up the idea.
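The objection in point 2 can be checked with a few lines of arithmetic. The sketch below only evaluates the formula quoted in the comment; the class and method names are hypothetical and not part of YARN, and it makes no claim about how the original proposal meant the result to be used.

```java
// Hypothetical helper for illustration only; not part of YARN.
final class CpuSharesFormula {
    // The formula quoted in the comment: (#vcores + 1023) % 1024 + 1
    static int cpuShares(int vcores) {
        return (vcores + 1023) % 1024 + 1;
    }

    public static void main(String[] args) {
        int eightHalfShares = 8 * 512;   // 4096
        int fourFullShares = 4 * 1024;   // 4096, the same scalar value
        // Both requests arrive as the same number, so any function of that
        // single number must map them to the same result.
        System.out.println(cpuShares(eightHalfShares) == cpuShares(fourFullShares));
        System.out.println(cpuShares(9 * 1024));
    }
}
```

Because the two requests collapse to the same integer before the formula is applied, the intended split between shares and processor count cannot be recovered from the vcore value alone.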
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503850#comment-16503850 ]

Miklos Szegedi commented on YARN-8320:

[~cheersyang], it makes sense in general. I think what you are missing here is that if you use non-strict mode with simple weights ({{cpu.shares}}), those are applied per thread. Indeed, there is a design issue in the existing {{CGroupsCPUResourcesHandler}}: it applies {{cpu.shares=vcores}} to each thread of a guaranteed container. Given this request from your example above, you will actually get 10 * N * N cpu time with the current code, which is wrong.

{code:java} Request1: #vcore: 10 * N #cpu: N (0
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212 ]

Weiwei Yang commented on YARN-8320:

Hi [~miklos.szeg...@cloudera.com]

I mean that letting users set both #vcore and #cpus in their resource request is too complex. Even for phase 1, if only EXCLUSIVE mode is supported, take for example:
{noformat}
NM:
  #vcore: 100
  #cpu: 10
{noformat}
A user who wants exclusive mode must then write a request like

{noformat} Request1: #vcore: 10 * N #cpu: N (0
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502128#comment-16502128 ]

Miklos Szegedi commented on YARN-8320:

Thanks [~cheersyang] for the reply.

bq. It's not straightforward to set it as a resource in request, just like the example you pointed out.

I am not sure what you mean. I actually pointed out the opposite: it is not straightforward to express it as a single vcore resource request.
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501927#comment-16501927 ]

Weiwei Yang commented on YARN-8320:

Hi [~miklos.szeg...@cloudera.com]/[~bibinchundatt]

Thanks for the comments, here are my thoughts.

{quote}I think is that cpu and cpuset are not the same resource in cgroups. {quote}
Actually we don't want to say *cpuset* is a resource; it is an isolation technique applied to a certain amount of cpu resource. It's not straightforward to set it as a resource in a request, just like the example you pointed out. The purpose of this Jira is to give users a way to isolate or partially share cpu resources between containers. For the first step, when we support the exclusive type, once the container is started we will set cpu quota/shares to limit its resource usage, and bind the container to the same number of processors for isolation. So we still prefer the "simplified" path; the "resource" path doesn't seem clear to me.

{quote}If we don't have slots to bind container will be rejecting the container start request. Which will be considered as failed. Scheduler could again allocate container to same nodemanager rt ?? {quote}
Well, when a container fails to start like this, in most cases it will not be able to start on any node, because it means the #vcore in the request is too small. We are trying to add a pre-check for such conditions so that the request fails fast; that should help.

{quote}When nm processors/ nm vcores < 1 and share mode have you considered strictness per containers ?? ie using the periods and quota also along with Cpuset assignment ?? If no other process is using cpu then process will be consuming more than what its supposed to rt ?? {quote}
Yes, cpu quota and shares will also be set for containers; that is what we depend on to limit the actual cpu usage of each container. We just need to make sure the values set for them are reasonable once the containers are bound to certain processors. Thanks for pointing this out, a very good concern.

{quote}Using fixed set of folders for assignment in Allocator (Reduce overload of creation and deletion on containers. {quote}
That makes sense.

{quote}Resource calculation could go wrong incase of preemption of containers rt . kill reject could get processed after container start. {quote}
Resource calculation won't be wrong, since we don't count cpuset as a resource. But when a container is killed, we will need to make sure this is handled and the cgroups get updated correctly. Thanks for pointing this out, I will add more details to the next version of the design doc.

Thanks
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499938#comment-16499938 ]

Bibin A Chundatt commented on YARN-8320:

Thank you [~cheersyang]/[~yangjiandan] for the design doc.

{quote} When a container’s cpu_share_mode is EXCLUSIVE/RESERVED, the number of allocated processor allocateProcessorNum = container_vcore / Vcore_Ratio, request will be rejected if allocateProcessorNum <= 0; {quote}
# IIUC, if we don't have slots to bind the container, we will reject the container start request, which will be considered a failure. Couldn't the scheduler then allocate the container to the same NodeManager again?
# When nm processors / nm vcores < 1 in share mode, have you considered *strictness per container*, i.e. using the periods and quota along with the cpuset assignment? If no other process is using the cpu, the process will consume more than it is supposed to, right?

What are your thoughts on having CpuBindHandlerImpl include two Allocators for cgroups subgroups, one for cpu and another for cpuset?

Could you also consider the following in the design:
# Using a fixed set of folders for assignment in the Allocator (reduces the overhead of creating and deleting them per container).
# Resource calculation could go wrong in case of container preemption, right? A kill/reject could get processed after container start.
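The rule quoted from the design doc can be sketched as follows. This is only an illustration under stated assumptions: integer division is assumed, {{Vcore_Ratio}} is assumed to mean NM vcores per physical processor, and the class and method names are hypothetical rather than taken from the attached patch.

```java
// Hypothetical sketch of the quoted rule; not code from YARN-8320.001.patch.
final class ProcessorCount {
    /**
     * allocateProcessorNum = container_vcore / Vcore_Ratio (integer division
     * assumed). The request is rejected when the result is <= 0.
     */
    static int allocateProcessorNum(int containerVcores, int vcoreRatio) {
        int processors = containerVcores / vcoreRatio;
        if (processors <= 0) {
            throw new IllegalArgumentException(
                "request too small for EXCLUSIVE/RESERVED binding: "
                    + containerVcores + " vcores at ratio " + vcoreRatio);
        }
        return processors;
    }

    public static void main(String[] args) {
        // Assumed ratio of 10 vcores per processor.
        System.out.println(allocateProcessorNum(30, 10));
    }
}
```

Under these assumptions, a container asking for fewer vcores than one processor's worth is rejected on every node with the same ratio, which is the retry concern raised in the first question above.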
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494656#comment-16494656 ]

Miklos Szegedi commented on YARN-8320:

Thank you [~cheersyang] for the detailed response.

The only thing that you are missing, I think, is that cpu and cpuset are not the same resource in cgroups. They are actually two dimensions of the CPU space. cpu,cpuacct controls in general how much time is allocated (one dimension) and cpuset controls how many physical devices are allocated (second dimension). cpu,cpuacct is a compressible, flexible resource: more of it will almost always proportionally reduce the runtime of a CPU-bound job. cpuset is not flexible; it depends on the thread factor of the container. Just to use your example above:
{code:java}
I have a NM with capacity:
  memory: 10gb
  vcore: 10
  cpus: 10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Request with just vcore number (the container runs a single process and single thread !):
  memory: 1gb
  vcore: 5

After allocation, my NM capacity updates to
  memory: 9gb
  vcore: 5
  cpus: 5 (0, 1, 2, 3, 4)
WRONG(!) The process is single threaded, 4 cores are wasted.

Request with both vcore number and cpus:
  memory: 1gb
  vcore(cputime): 5
  cpuset: 1

After allocation, my NM capacity updates to
  memory: 9gb
  vcore(cputime): 5
  cpus: 9 (0, 1, 2, 3, 4, 5, 6, 7, 8)
GOOD The process is single threaded.
{code}
I understand that you would like to simplify the configuration. However, as the example above shows, if the current design is chosen, YARN will never be able to solve this situation later because of backward compatibility. That being said, if you still would like to follow the simplified path, please go ahead; I just wanted to elaborate on my concerns.
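The two-dimensional accounting argued for above can be sketched as a small model. This is a hypothetical illustration, not YARN code: the class name, the choice to hand out the highest-numbered free processors first, and the absence of memory accounting are all assumptions made to keep the example short.

```java
import java.util.TreeSet;

// Hypothetical model of a node that tracks CPU time (vcores) and physical
// processors (cpuset) as two independent dimensions. Not part of YARN.
final class TwoDimensionalCpuCapacity {
    private int freeVcores;
    private final TreeSet<Integer> freeCpus = new TreeSet<>();

    TwoDimensionalCpuCapacity(int vcores, int cpus) {
        this.freeVcores = vcores;
        for (int cpu = 0; cpu < cpus; cpu++) {
            freeCpus.add(cpu);
        }
    }

    /** Returns the processors bound to the container, or null if it does not fit. */
    TreeSet<Integer> allocate(int vcores, int cpus) {
        if (vcores > freeVcores || cpus > freeCpus.size()) {
            return null;
        }
        freeVcores -= vcores;
        TreeSet<Integer> bound = new TreeSet<>();
        for (int i = 0; i < cpus; i++) {
            bound.add(freeCpus.pollLast()); // assumption: highest ids first
        }
        return bound;
    }

    int freeVcores() {
        return freeVcores;
    }

    TreeSet<Integer> freeCpus() {
        return new TreeSet<>(freeCpus);
    }

    public static void main(String[] args) {
        TwoDimensionalCpuCapacity nm = new TwoDimensionalCpuCapacity(10, 10);
        // Single-threaded container: 5 vcores of CPU time, 1 processor.
        System.out.println(nm.allocate(5, 1));
        System.out.println(nm.freeVcores() + " vcores, cpus " + nm.freeCpus());
    }
}
```

With the second request from the example (5 vcores, 1 processor), nine processors stay free for other containers instead of five.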
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493538#comment-16493538 ]

Weiwei Yang commented on YARN-8320:

Thanks [~miklos.szeg...@cloudera.com] for sharing your idea.

You were right that the original idea was to make this easy to use. That is, users don't need to know which set of cpus their containers will run on, or how they are configured. They just give us a cpu_share_mode, and we do all the tricks underneath without exposing too many details. My concerns about the approach you suggested are:
# It might be complex for users.
# It should be able to support 2 modes, but it is not very straightforward to support all 4 modes.

Allow me to take an example like the following:
{noformat}
I have a NM with capacity:
  memory: 10gb
  vcore: 10
  cpus: 10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Request with just cpu number:
  memory: 1gb
  vcore: 5
  cpuset: 5

After allocation, my NM capacity updates to
  memory: 9gb
  vcore: 5
  cpus: 5 (0, 1, 2, 3, 4)
{noformat}
There are a few problems with such an approach:
# Users might be confused about how many cpus to ask for in the resource request. Vcore as of today is already a difficult thing to set; adding a new type of resource might make this harder.
# When #vcore is not the same as #processor on the NM, users will need to do some calculation to set a reasonable cpuset value in order not to over- or under-use cpu resources, and this is hard for RM to check as it doesn't have all the info that NM has.
# It is difficult to support all 4 modes under the current resource APIs.

Please let me know if there is anything wrong in this example and the comments. I agree we can start by supporting EXCLUSIVE+ANY mode in phase 1, but I still want to make sure the design can be extended to support the other modes (because RESERVED/SHARE modes are very useful for improving utilization).

I will consolidate all the comments from you and [~leftnoteasy] and come up with a new version of the design doc next week. I always look forward to your comments.

Thanks
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493112#comment-16493112 ]

Miklos Szegedi commented on YARN-8320:

Thank you, [~cheersyang], for the responses. They make sense to me in general.

{quote}how many cpuset resource on a NM and how a AM to request? {quote}
In general, this is adapter code, passing a cgroup functionality on to another API. As such it can do two things: one is being transparent, the other is making the original API easier to use. You try to do the latter in your design, which makes sense. Being transparent, however, would mean letting the AM choose cpu resources (controlling cpu,cpuacct) and cpuset resources (controlling cpuset) separately. I would prefer the former, since being transparent keeps all functionality without restrictions and makes any future design easier to implement. cpuset would have as many processors as are available in cpuset.cpus of the container root cgroup, which is usually {{hadoop-yarn}}. Individual CPUs are chosen by NM based on the number of cpuset cpus granted by RM. However, I do not have a strong opinion about this.
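Counting the processors available in {{cpuset.cpus}} means parsing the kernel's list format, where ids are comma-separated and ranges are written as {{lo-hi}} (for example {{0-3,8}}). A minimal sketch, assuming the file content has already been read into a string; the class name is hypothetical and no YARN API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the cpuset list format, e.g. "0-3,8". Not YARN code.
final class CpusetList {
    static List<Integer> parse(String content) {
        List<Integer> cpus = new ArrayList<>();
        String trimmed = content.trim();
        if (trimmed.isEmpty()) {
            return cpus; // an empty cpuset.cpus means no processors assigned
        }
        for (String part : trimmed.split(",")) {
            String[] bounds = part.trim().split("-");
            int low = Integer.parseInt(bounds[0]);
            int high = bounds.length > 1 ? Integer.parseInt(bounds[1]) : low;
            for (int cpu = low; cpu <= high; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) {
        // The size of the parsed list is the cpuset capacity NM could report.
        System.out.println(parse("0-3,8\n"));
    }
}
```

The number of entries returned would be the node's cpuset capacity in the transparent model, and NM would pick concrete ids from this list once RM grants a count.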
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492301#comment-16492301 ]

Weiwei Yang commented on YARN-8320:

[~leftnoteasy], your suggestion makes sense to me; let us see if we can split this into phases. For basic user scenarios, exclusive mode might be good enough to start with. I will also take a look at YARN-7481 and cover these points in the next design doc. Thanks for the comments!
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491177#comment-16491177 ]

Wangda Tan commented on YARN-8320:

Thanks [~cheersyang],

To me the benefit of exclusive mode is obvious and straightforward. As for SHARED/RESERVED mode, if we plan to do this in phases, could we do the EXCLUSIVE mode first (harden the API, refactor, etc.) instead of working on too many changes at once? What's your opinion on this?

In addition to the JIRA I mentioned, there's an effort to add support for GPU affinity: YARN-7481. I don't quite like the proposed approach (a bitmap to represent resources). But it might be good if we can build a common layer to support such use cases, since for CPU exclusive usage etc. we also want ACLs, resource accounting, and so on.
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490165#comment-16490165 ]

Weiwei Yang commented on YARN-8320:
---

Hi [~miklos.szeg...@cloudera.com]

Thanks for the comments, please see my points below.

bq. 1) and 3) it might make sense to use a separate resource type for this feature

Extending resource types might not be straightforward for cpuset. Under your suggestion, how would you define how much cpuset resource an NM has, and how would an AM request it? It is not a numeric value. The problem is that cpuset works on physical cores (processors) while YARN manages vcores, and a processor can be shared by multiple containers. Hence we can hardly define "values" if we treat it as a resource.

bq. 2) users might not need the RESERVED/SHARED modes

This was my first impression too. But after I talked with [~yangjiandan] and some other folks who manage LS services, I changed my mind. RESERVED/SHARE helps improve utilization and is key for a mixed-workload environment, where batch tasks run alongside services. It helps to resolve problems like the ones you mentioned in #6 and #7.

bq. 4) The design lets the AM do a delayed exclusive request directly to the NM avoiding the RM. I think it would be more robust to request from the RM in the container launch context and just forward this to the NM. The RM has the chance to decline or delay the request in this case in the future.

I agree. We have not yet figured out a way to let the RM play its role here; we will think harder about this.

bq. 6) Let me mention that this feature negatively affects YARN-1011 and oversubscription.

That's why we have the RESERVED/SHARED modes: they allow an LS service to share its CPU with other tasks, including O containers (O containers will be using ANY mode). But if we set a container to EXCLUSIVE mode, then yes, it will occupy the CPU. This is the only way to ensure that such highly sensitive tasks run completely isolated. Most of our existing online services use RESERVED or SHARE mode in order to improve utilization (a typical mixed-workload scenario).

bq. 5) how can you make sure a parent cgroup does not interfere with a cgroup marked as cpuset.cpu_exclusive=1? What if a system service wakes up?

We are not going to set cpuset.cpu_exclusive=1, at least not in this version of the design. We are trying to solve the problem of containers competing with each other for CPU resources, not competing with system services.

bq. 7) Also, latency sensitive applications get exclusive protection but can only be assigned to their cpuset disallowing bursts to other CPUs when needed. I do not know how to solve this though.

Use SHARE mode. We have a lot of online services running under this mode, which allows a service to use all processors except those assigned to EXCLUSIVE and RESERVED containers.

bq. 8) mean that other container cgroups need to be changed and reduced every time a reserved container starts

Correct. When we assign a processor to a container using RESERVED or EXCLUSIVE, we need to remove it from the cgroups of the remaining containers. This is briefly introduced in section 3.5 of the design doc.

Hope it makes sense, looking forward to hearing your feedback.

Thanks
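The bookkeeping described in the answers to 7) and 8) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the design doc or the attached patch: the names `share_pool` and `pin`, the container ids, and the 8-processor node are all hypothetical, and the sketch only models which processors remain available to SHARE-mode containers after a processor is assigned to a RESERVED or EXCLUSIVE container.

```python
from typing import Dict, Set, Tuple


def share_pool(all_cpus: Set[int], pinned: Dict[str, Set[int]]) -> Set[int]:
    """Processors left for SHARE-mode containers: every processor that is
    not assigned to a RESERVED or EXCLUSIVE container (hypothetical helper)."""
    taken: Set[int] = set()
    for cpus in pinned.values():
        taken |= cpus
    return all_cpus - taken


def pin(all_cpus: Set[int], pinned: Dict[str, Set[int]],
        container_id: str, cpus: Set[int]) -> Tuple[Dict[str, Set[int]], Set[int]]:
    """Assign processors to a RESERVED/EXCLUSIVE container and return the
    updated assignment map plus the shrunken pool that the remaining
    containers' cgroups would have to be rewritten with."""
    if not cpus <= share_pool(all_cpus, pinned):
        raise ValueError("requested processors are already pinned or do not exist")
    updated = dict(pinned)
    updated[container_id] = set(cpus)
    return updated, share_pool(all_cpus, updated)


# Hypothetical node with 8 processors.
node_cpus = set(range(8))
pinned, pool = pin(node_cpus, {}, "container_a", {0, 1})
pinned, pool = pin(node_cpus, pinned, "container_b", {2})
print(sorted(pool))
```

Every call to `pin` shrinks the pool, which is why the cgroups of all other containers need to be updated each time such a container starts.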
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489398#comment-16489398 ]

Miklos Szegedi commented on YARN-8320:
--

[~cheersyang] / [~yangjiandan], thank you for raising this, this would be a very useful feature. Thank you [~leftnoteasy] for the comments.

1) I agree with [~leftnoteasy] about the special considerations regarding rounding. Because of this it might make sense to use a separate resource type for this feature. See my other comments regarding this below.

2) Like [~leftnoteasy], I also think that users might not need the RESERVED/SHARED modes. They add complexity, which reduces the number of users who would use the feature. On the other hand, I admit the distinction maps nicely to cpuset.cpu_exclusive=0/1.

3) I definitely agree with [~leftnoteasy] on the use of resource types. It might be straightforward to have a cpuset resource type that the AMs can request, with the cgroups shared accordingly. This would also make the configuration more standard. The levels might not even be needed in this case: if an application does not request cpuset, it is shared, otherwise it is exclusive. The current suggestion would work, but please consider using resource types.

4) The design lets the AM do a delayed exclusive request directly to the NM, avoiding the RM. I think it would be more robust to request from the RM in the container launch context and just forward this to the NM. In that case the RM would have the chance to decline or delay the request in the future.

5) [~yangjiandan], how can you make sure a parent cgroup does not interfere with a cgroup marked as {{cpuset.cpu_exclusive=1}}? What if a system service wakes up?

6) Let me mention that this feature negatively affects YARN-1011 and oversubscription. An exclusive CPU with leftover capacity cannot be used by any other container and remains idle. This reduces overall cluster utilization.

7) Also, latency-sensitive applications get exclusive protection but can only be assigned to their cpuset, disallowing bursts to other CPUs when needed. I do not know how to solve this though.

8) If a cpuset is not exclusive, cgroups treat it as a limit, not a reserve. The feature uses it as a reserve, which in practice would mean that other container cgroups need to be changed and reduced every time a reserved container starts. Am I correct?
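As a small illustration of what "changing and reducing" another container's cgroup in point 8) involves, the sketch below formats a set of processor ids into the comma-separated range list that the kernel's cpuset.cpus file accepts (for example "0-2,5"). The helper name `format_cpuset` is hypothetical and is not taken from the patch; actually writing the value into a cgroup file is left out.

```python
from typing import Iterable


def format_cpuset(cpus: Iterable[int]) -> str:
    """Render processor ids as a cpuset list string such as "0-2,5".

    Consecutive ids are collapsed into "start-end" ranges; isolated ids
    are emitted on their own.
    """
    ids = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(ids):
        start = ids[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


# A shared container that loses processors 3 and 4 to a reserved container
# would have its allowed list rewritten from "0-7" to "0-2,5-7".
print(format_cpuset(set(range(8)) - {3, 4}))
```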