[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-18 Thread Miklos Szegedi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516290#comment-16516290
 ] 

Miklos Szegedi commented on YARN-8320:
--

[~cheersyang], thank you for the response. Indeed, I was assuming users would 
set some typical values for vcores (cores) that translate to cpu.shares and 
cpuset: values like 0.2 or 0.5 for shareable, and 1.0 or 2.0 for exclusive. 
Values like 4.5 are indeed too hard to understand.

Please go ahead with your approach. It would be useful to keep in mind that the 
system needs to be able to set both cpu.shares and cpuset dimensions some time 
in the future.

 

> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> ---
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resources. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
>  * The request latency of services running in containers may fluctuate 
> frequently when all containers share cpus, which latency-sensitive services 
> cannot afford in our production environment.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-08 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505950#comment-16505950
 ] 

Weiwei Yang commented on YARN-8320:
---

Hi [~miklos.szeg...@cloudera.com]

The solution you proposed looks quite interesting, especially the idea 
about cpu.shares. Please see my comments below.

If we use the "simple" approach you shared,

1. Such request
{noformat}
Request1:
  #vcores: 9*1024
{noformat}
This breaks the basic vcore semantics we have been using for years; it is an 
incompatible change at the core API level.

2. For the formula you gave,
{noformat}
  #cpu.shares: (#vcores + 1023) % 1024 + 1 = 1024
{noformat}
What if I specify #vcore=8*512 (=4 * 1024)? What would #cpu.shares and #cpuset 
be? I don't think you can derive two output variables from one input variable.
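
To make the question concrete, the quoted expression can be evaluated for a 
few inputs. This is only a sketch: the function name {{cpu_shares}} is 
hypothetical, and how #cpuset would be derived is not stated in this thread, 
so it is left out.

```python
def cpu_shares(vcores: int) -> int:
    """Evaluate the quoted formula: (#vcores + 1023) % 1024 + 1."""
    return (vcores + 1023) % 1024 + 1


# Inputs taken from the discussion, plus one fractional-core value (512).
for vcores in (512, 9 * 1024, 8 * 512):
    print(vcores, cpu_shares(vcores))
```

Both 9*1024 and 8*512 map to the same cpu.shares value, so any processor 
count would have to come from a second, separate calculation on the same 
input.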

If we consider #cpu.shares and #cpuset as resources,
 # NM and RM need to know all the info about physical processors, including 
their (virtual) shares; this introduces extra complexity for both NM and RM. 
Moreover, the current resource API doesn't support such a multidimensional 
resource.
 # Precise weighting sounds like an interesting idea, but I doubt we really 
need that much. On our online systems, we don't really control things at such 
a fine granularity.
 # It will be hard for users to set {{cpu.shares}}: how would a user know what 
value is meaningful? And what if some user sets a value that is too big or too 
small in their request? They will get unpredictable results.

We can have some more offline chats about this, thanks for bringing up the idea.
 Thanks.



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-06 Thread Miklos Szegedi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503850#comment-16503850
 ] 

Miklos Szegedi commented on YARN-8320:
--

[~cheersyang], it makes sense in general. I think what you are missing here is 
that, if you use non-strict mode with simple {{cpu.shares}} weights, those are 
applied per thread. Indeed, there is a design issue in the existing 
{{CGroupsCPUResourcesHandler}}: it applies {{cpu.shares=vcores}} to each 
thread of a guaranteed container.

Given the request in your example above, you will actually get 10 * N * N cpu 
time with the current code, which is wrong.
{code:java}
Request1:
  #vcore: 10 * N
  #cpu: N (0


[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-06 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212
 ] 

Weiwei Yang commented on YARN-8320:
---

Hi [~miklos.szeg...@cloudera.com]

I mean that letting users set both #vcore and #cpus in their resource request 
is too complex. Even for phase 1, if only EXCLUSIVE mode is supported, for example:

{noformat}
NM:
  #vcore: 100
  #cpu: 10
{noformat}

If a user wants exclusive mode, the request must look like:

{noformat}
Request1:
  #vcore: 10 * N
  #cpu: N (0


[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-05 Thread Miklos Szegedi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502128#comment-16502128
 ] 

Miklos Szegedi commented on YARN-8320:
--

Thanks [~cheersyang] for the reply.
bq. It's not straightforward to set it as a resource in request, just like the 
example you pointed out. 
I am not sure what you mean. I actually pointed out the opposite: it is not 
straightforward to set it as a single vcore resource request.





[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-05 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501927#comment-16501927
 ] 

Weiwei Yang commented on YARN-8320:
---

Hi [~miklos.szeg...@cloudera.com]/[~bibinchundatt]

Thanks for the comments, here are my thoughts
{quote}I think is that cpu and cpuset are not the same resource in cgroups.
{quote}
Actually we don't want to say *cpuset* is a resource; it is an isolation 
technique applied to a certain amount of cpu resource. It's not straightforward 
to set it as a resource in a request, just like the example you pointed out. 
The purpose of this Jira is to give users a way to isolate or partially share 
cpu resources between containers. As the 1st step, when we support the 
exclusive type, once a container is started we will set cpu quota/shares to 
limit its resource usage, and bind the container to the same amount of 
processors for isolation. So we still prefer the "simplified" path; the 
"resource" path does not seem clear to me.
{quote}If we don't have slots to bind container will be rejecting the container 
start request. Which will be considered as failed. Scheduler could again 
allocate container to same nodemanager rt ??
{quote}
Well, when a container fails to start like this, in most cases it will not be 
able to start on any node, because this means the #vcore in the request is 
too small. We are trying to add a pre-check for such conditions to provide a 
fail-fast approach; that should help.
{quote}When nm processors/ nm vcores < 1 and share mode have you considered 
strictness per containers ?? ie using the periods and quota also along with 
Cpuset assignment ?? If no other process is using cpu then process will be 
consuming more than what its supposed to rt ??
{quote}
Yes, cpu quota and shares will also be set for containers; that is what we 
depend on to limit the actual cpu usage of each container. We just need to 
make sure the values set for them are reasonable when the containers are bound 
to certain processors. Thanks for pointing this out, a very good concern.
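
As an illustration of what a "reasonable" quota could look like, here is a 
minimal sketch. It assumes the container should be capped at the full 
capacity of the processors it is bound to, which is not stated in the design 
discussion; the function name is hypothetical. It relies only on the CFS rule 
that cpu.cfs_quota_us / cpu.cfs_period_us is the number of CPUs' worth of 
time a cgroup may consume per period.

```python
# Kernel default for cpu.cfs_period_us (100 ms).
CFS_PERIOD_US = 100_000


def cfs_quota_us(bound_processors: int, period_us: int = CFS_PERIOD_US) -> int:
    """Quota that caps a cgroup at `bound_processors` CPUs' worth of time."""
    if bound_processors <= 0:
        raise ValueError("a container must be bound to at least one processor")
    return bound_processors * period_us


# A container bound to 2 processors may run 200 ms of CPU time per 100 ms period.
quota = cfs_quota_us(2)
```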
{quote}Using fixed set of folders for assignment in Allocator (Reduce overload 
of creation and deletion on containers.
{quote}
That makes sense.
{quote}Resource calculation could go wrong incase of preemption of containers 
rt . kill reject could get processed after container start.
{quote}
Resource calculation won't be wrong since we don't count cpuset as a resource. 
But when a container is killed, we will need to make sure this is handled and 
the cgroups get updated correctly. Thanks for pointing this out, I will add 
more details to the next version of the design doc.

Thanks



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-06-04 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499938#comment-16499938
 ] 

Bibin A Chundatt commented on YARN-8320:


Thank you [~cheersyang]/[~yangjiandan] for design doc

{quote}
When a container’s cpu_share_mode is EXCLUSIVE/RESERVED, the number of allocated
processors allocateProcessorNum = container_vcore / Vcore_Ratio; the request 
will be rejected if allocateProcessorNum <= 0;
{quote}
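
The rejection rule quoted above can be sketched as follows. This is a sketch 
under assumptions: Vcore_Ratio is taken to be NM vcores divided by NM 
processors, the division is taken to round down, and the function name is 
hypothetical.

```python
def allocate_processor_num(container_vcores: int,
                           nm_vcores: int,
                           nm_processors: int) -> int:
    """Return the number of processors to bind, or raise if the request is rejected.

    Assumption: Vcore_Ratio = nm_vcores / nm_processors. The integer form below
    computes floor(container_vcores / Vcore_Ratio) without floating point.
    """
    if nm_vcores <= 0 or nm_processors <= 0:
        raise ValueError("NM capacity must be positive")
    processors = container_vcores * nm_processors // nm_vcores
    if processors <= 0:
        raise ValueError("request rejected: vcores too small for one processor")
    return processors


# NM with 100 vcores on 10 processors: a 30-vcore request binds 3 processors.
bound = allocate_processor_num(30, 100, 10)
```

With the same NM, a 5-vcore request yields 0 processors and is rejected, 
which is the failure case discussed in the questions below.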
# IIUC, if we don't have slots to bind the container, we will reject the 
container start request, which will be considered a failure. Couldn't the 
scheduler then allocate the container to the same NodeManager again?
# When nm processors / nm vcores < 1 in share mode, have you considered 
*strictness per container*, i.e. using the periods and quota along with the 
cpuset assignment? If no other process is using the cpu, the process will 
consume more than it is supposed to, right?

What are your thoughts on having CpuBindHandlerImpl include 2 Allocators for 
cgroups subgroups, one for cpu and another for cpuset?

Could you also consider the following in the design:

# Using a fixed set of folders for assignment in the Allocator (reduces the 
overhead of creation and deletion per container).
# Resource calculation could go wrong in case of container preemption: a 
kill/reject could get processed after container start.





[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-29 Thread Miklos Szegedi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494656#comment-16494656
 ] 

Miklos Szegedi commented on YARN-8320:
--

Thank you [~cheersyang] for the detailed response. The only thing that I think 
you are missing is that cpu and cpuset are not the same resource in cgroups. 
They are actually two dimensions of the CPU space: cpu,cpuacct controls in 
general how much time is allocated (one dimension), and cpuset controls how 
many physical devices are allocated (the second dimension). cpu,cpuacct is a 
compressible, flexible resource: more of it will almost always proportionally 
reduce the runtime of a CPU-bound workload. cpuset is not flexible; it depends 
on the thread factor of the container.

Just to use your example above:
{code:java}
I have a NM with capacity:
  memory: 10gb
  vcore: 10
  cpus: 10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Request with just vcore number (the container runs a single process and single 
thread !):
  memory: 1gb
  vcore: 5

After allocation, my NM capacity updates to 
  memory: 9gb
  vcore: 5
  cpus: 5 (0, 1, 2, 3, 4) WRONG(!) The process is single threaded, 4 cores are 
wasted.

Request with both vcore number and cpus:
  memory: 1gb
  vcore(cputime): 5
  cpuset: 1

After allocation, my NM capacity updates to 
  memory: 9gb
  vcore(cputime): 5
  cpus: 9 (0, 1, 2, 3, 4, 5, 6, 7, 8) GOOD The process is single threaded.
{code}
I understand that you would like to simplify the configuration. However, as 
the example above shows, if the current design is chosen, YARN will never be 
able to handle this situation later, because of backward compatibility.
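
The two-dimensional accounting in the example can be sketched as below. This 
is an illustration only: the {{NodeCapacity}} class and its fields are 
hypothetical and do not correspond to any YARN API, and handing out the 
highest-numbered free processors is an arbitrary choice made to match the 
example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeCapacity:
    memory_gb: int
    vcores: int        # CPU-time dimension (cpu,cpuacct)
    cpus: List[int]    # free physical processors (cpuset dimension)

    def allocate(self, memory_gb: int, vcores: int, cpuset_count: int) -> List[int]:
        """Deduct both CPU dimensions independently; return the granted processors."""
        if (memory_gb > self.memory_gb or vcores > self.vcores
                or cpuset_count > len(self.cpus)):
            raise ValueError("insufficient capacity")
        self.memory_gb -= memory_gb
        self.vcores -= vcores
        split = len(self.cpus) - cpuset_count
        granted = self.cpus[split:]
        self.cpus = self.cpus[:split]
        return granted


# Single-threaded container: 5 vcores of CPU time, but only 1 processor.
nm = NodeCapacity(memory_gb=10, vcores=10, cpus=list(range(10)))
granted = nm.allocate(memory_gb=1, vcores=5, cpuset_count=1)
```

After this allocation the node still has 9 free processors, whereas deriving 
the processor count from vcores alone would have taken 5.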

That being said, if you still would like to follow the simplified path, please 
go ahead, I just wanted to elaborate my concerns.



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-29 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493538#comment-16493538
 ] 

Weiwei Yang commented on YARN-8320:
---

Thanks [~miklos.szeg...@cloudera.com] for sharing your idea. You are right 
that the original idea was to make this easy to use. That is, users do not 
need to know which set of cpus their containers will run on, or how they are 
configured. They just give us a cpu_share_mode, and we do all the tricks 
underneath without exposing too many details.

My concerns about the approach you suggested are:
 # It might be complex for users to use
 # It should be able to support 2 modes, but supporting 4 modes is not very 
straightforward

Allow me to take an example like the following:

{noformat}
I have a NM with capacity:
  memory: 10gb
  vcore: 10
  cpus: 10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Request with just cpu number:
  memory: 1gb
  vcore: 5
  cpuset: 5

After allocation, my NM capacity updates to 
  memory: 9gb
  vcore: 5
  cpus: 5 (0, 1, 2, 3, 4)
{noformat}

There are a few problems with such an approach:
 # Users might get confused about how many cpus to ask for in the resource 
request. Vcore is already difficult to set today; adding a new type of 
resource might make this harder.
 # When #vcore is not the same as #processor on the NM, users will need to do 
some calculation to set a reasonable cpuset value in order not to over- or 
under-use cpu resources, and this is hard for the RM to check as it doesn't 
have all the info the NM has.
 # It is difficult to support all 4 modes under the current resource APIs.

Please let me know if there is anything wrong in this example and the comments.
I agree we can start by supporting EXCLUSIVE+ANY mode in phase 1, but I still 
want to make sure the design can be extended to support the RESERVED/SHARE 
modes as well, because they are very useful for improving utilization. I will 
consolidate all the comments from you and [~leftnoteasy] and come up with a new 
version of the design doc next week. I always look forward to your comments.

Thanks



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-28 Thread Miklos Szegedi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493112#comment-16493112
 ] 

Miklos Szegedi commented on YARN-8320:
--

Thank you, [~cheersyang], for the responses. They make sense to me in general.
{quote}how many cpuset resource on a NM and how a AM to request?
{quote}
In general, this is adapter code, passing on cgroup functionality to 
another API. As such it can do one of two things: be transparent, or make the 
original API easier to use. You try to do the latter in your design, which 
makes sense. Being transparent, however, would mean letting the AM choose cpu 
resources controlling cpu,cpuacct and cpuset resources controlling cpuset 
separately. I would prefer the transparent option, since it keeps all 
functionality without restrictions and makes any future design easier to 
implement. cpuset would have as many processors as there are available in 
cpuset.cpus of the container root cgroup, which is usually {{hadoop-yarn}}. 
Individual CPUs are chosen by the NM based on the number of cpuset cpus 
granted by the RM.

However, I do not have a strong opinion about this.
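
The NM-side selection described above can be sketched as follows. This is 
only an illustration: both function names are hypothetical, and the selection 
policy (lowest-numbered free CPUs first) is an assumption. The parser accepts 
the kernel's cpuset list format, such as "0-3,8,10-11", which is what a file 
like cpuset.cpus under the container root cgroup contains.

```python
from typing import Iterable, List, Set


def parse_cpuset(spec: str) -> List[int]:
    """Expand a cpuset list string such as '0-3,8,10-11' into CPU ids."""
    cpus: List[int] = []
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pick_cpus(available: Iterable[int], in_use: Set[int], count: int) -> List[int]:
    """Choose `count` free CPUs, lowest ids first (policy is an assumption)."""
    free = [cpu for cpu in available if cpu not in in_use]
    if count > len(free):
        raise ValueError("not enough free processors")
    return free[:count]


# The RM granted 2 cpuset cpus; the NM picks concrete processors.
root_cpus = parse_cpuset("0-3,8,10-11\n")
chosen = pick_cpus(root_cpus, in_use={0, 1}, count=2)
```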



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-27 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492301#comment-16492301
 ] 

Weiwei Yang commented on YARN-8320:
---

[~leftnoteasy], your suggestion makes sense to me; let us see if we can split 
this into phases. For basic user scenarios, exclusive mode might be good 
enough to start with. I will also take a look at YARN-7481 and cover these in 
the next design doc. Thanks for the comments!



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491177#comment-16491177
 ] 

Wangda Tan commented on YARN-8320:
--

Thanks [~cheersyang], 

To me the benefit of exclusive mode is obvious and straightforward. For 
SHARED/RESERVED mode, if we plan to do this in phases, could we do the 
EXCLUSIVE mode first (harden the API, refactor, etc.) instead of working on 
too many changes at once? What's your opinion on this?

In addition to the JIRA I mentioned, there's an effort to add support for GPU 
affinity: YARN-7481. I don't quite like the proposed approach (a bitmap to 
represent resources), but it might be good if we can build a common layer to 
support such use cases, since for CPU exclusive usage etc. we also want ACLs, 
resource accounting, etc.



[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-24 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490165#comment-16490165
 ] 

Weiwei Yang commented on YARN-8320:
---

Hi [~miklos.szeg...@cloudera.com]

Thanks for the comments. Please see my points below.

bq. 1) and 3) it might make sense to use a separate resource type for this 
feature

Extending resource types might not be straightforward for cpuset. With your 
suggestion, how would you define how many cpuset resources an NM has, and how 
would an AM request them? It's not a numeric value. The problem is that cpuset 
works on physical cores (processors) but YARN manages vcores, and a processor 
can be shared by multiple containers. Hence we can hardly define "values" if 
we treat it as a resource. 

bq. 2) users might not need the RESERVED/SHARED modes

This was my first impression too. But after I talked with [~yangjiandan] and 
some other folks who manage LS services, I changed my mind. RESERVED/SHARED 
helps to improve utilization and is key for mixed-workload environments, where 
batch tasks run alongside services. It helps to resolve problems like the ones 
you mentioned in #6 and #7.

bq. 4) The design lets the AM do a delayed exclusive request directly to the NM 
avoiding the RM. I think it would be more robust to request from the RM in the 
container launch context and just forward this to the NM. The RM has the chance 
to decline or delay the request in this case in the future.

I agree. We have not yet figured out a way to let the RM play its role here; we 
will think harder about this.

bq. 6) Let me mention that this feature negatively affects YARN-1011 and 
oversubscription.

That's why we have RESERVED/SHARED mode: it allows an LS service to share its 
CPU with other tasks, including O (opportunistic) containers, which will use 
ANY mode. But if we set a container to EXCLUSIVE mode, then yes, it will 
occupy the CPU; this is the only way to ensure that such highly sensitive 
tasks run completely isolated. Most of our existing online services use 
RESERVED or SHARED mode in order to improve utilization (a typical 
mixed-workload scenario).

bq. 5)  how can you make sure a parent cgroup does not interfere with a cgroup 
marked as cpuset.cpu_exclusive=1? What if a system service wakes up?

We are not going to set cpuset.cpu_exclusive=1, at least not in this version of 
the design. We are trying to solve the problem of containers competing with 
each other for CPU resources, not with system services.

bq. 7) Also, latency sensitive applications get exclusive protection but can 
only be assigned to their cpuset disallowing bursts to other CPUs when needed. 
I do not know how to solve this though.

Use SHARED mode. We have a lot of online services running under this mode; it 
allows a service to use all processors except those assigned to EXCLUSIVE and 
RESERVED containers.
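
As a rough illustration only (not taken from the patch or the design doc), the 
set of processors left for such a container could be computed as below. The 
class and method names are hypothetical; the output string follows the 
comma-separated range format that the kernel accepts in cpuset.cpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of YARN.
public class SharedCpuset {

    /** All processors 0..processorCount-1 except the pinned (EXCLUSIVE/RESERVED) ones. */
    public static SortedSet<Integer> sharedProcessors(int processorCount,
                                                      SortedSet<Integer> pinned) {
        SortedSet<Integer> shared = new TreeSet<>();
        for (int cpu = 0; cpu < processorCount; cpu++) {
            if (!pinned.contains(cpu)) {
                shared.add(cpu);
            }
        }
        return shared;
    }

    /** Renders a sorted set of processor ids as a cpuset list, e.g. "0-1,4-7". */
    public static String toCpusetList(SortedSet<Integer> cpus) {
        List<String> parts = new ArrayList<>();
        Integer start = null;
        Integer prev = null;
        for (int cpu : cpus) {
            if (start == null) {
                start = cpu;                      // first id opens the first range
            } else if (cpu != prev + 1) {
                parts.add(range(start, prev));    // gap found: close the current range
                start = cpu;
            }
            prev = cpu;
        }
        if (start != null) {
            parts.add(range(start, prev));        // close the last open range
        }
        return String.join(",", parts);
    }

    private static String range(int start, int end) {
        return start == end ? Integer.toString(start) : start + "-" + end;
    }

    public static void main(String[] args) {
        // Assumed node: 8 processors, with 2 and 3 pinned to EXCLUSIVE/RESERVED containers.
        SortedSet<Integer> pinned = new TreeSet<>(Arrays.asList(2, 3));
        System.out.println(toCpusetList(sharedProcessors(8, pinned))); // prints 0-1,4-7
    }
}
```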

bq. 8) mean that other container cgroups need to be changed and reduced every 
time a reserved container starts

Correct. When we assign a processor to a container using RESERVED or EXCLUSIVE 
mode, we need to remove it from the cgroups of the remaining containers. This 
is briefly introduced in section 3.5 of the design doc.
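
To make that bookkeeping concrete, here is a minimal in-memory sketch. It is an 
assumption-laden illustration, not the logic of the patch: the class and method 
names are hypothetical, RESERVED and EXCLUSIVE are treated identically, and a 
real NM would also have to write each changed set to the container's 
cpuset.cpus file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical tracker, not part of YARN.
public class CpusetRebalancer {

    // container id -> processors currently in that container's cpuset
    private final Map<String, Set<Integer>> cpusets = new HashMap<>();

    /** Registers a container that may run on the given processors. */
    public void addContainer(String id, Set<Integer> cpus) {
        cpusets.put(id, new TreeSet<>(cpus));
    }

    /** Pins cpus to container id and removes them from every other container. */
    public void pin(String id, Set<Integer> cpus) {
        for (Map.Entry<String, Set<Integer>> entry : cpusets.entrySet()) {
            if (!entry.getKey().equals(id)) {
                entry.getValue().removeAll(cpus);
            }
        }
        cpusets.put(id, new TreeSet<>(cpus));
    }

    public Set<Integer> cpusOf(String id) {
        return cpusets.get(id);
    }

    public static void main(String[] args) {
        CpusetRebalancer node = new CpusetRebalancer();
        node.addContainer("batch-1", new TreeSet<>(Arrays.asList(0, 1, 2, 3)));
        node.addContainer("batch-2", new TreeSet<>(Arrays.asList(0, 1, 2, 3)));

        // A latency-sensitive container starts and takes processors 0 and 1.
        node.pin("ls-1", new TreeSet<>(Arrays.asList(0, 1)));

        System.out.println(node.cpusOf("batch-1")); // prints [2, 3]
        System.out.println(node.cpusOf("ls-1"));    // prints [0, 1]
    }
}
```

The sketch also shows the cost raised in point 8: every start of a pinned 
container touches the cpuset of every other container on the node.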

Hope it makes sense, looking forward to hearing your feedback.
Thanks




[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

2018-05-24 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489398#comment-16489398
 ] 

Miklos Szegedi commented on YARN-8320:
--

[~cheersyang] / [~yangjiandan], thank you for raising this; it would be a 
very useful feature.

Thank you [~leftnoteasy] for the comments.

1) I agree with [~leftnoteasy] about the special considerations regarding 
rounding. Because of this, it might make sense to use a separate resource type 
for this feature. See my other comments regarding this below.

2) Like [~leftnoteasy], I also think that users might not need the 
RESERVED/SHARED modes. They add complexity, which reduces the number of users 
who would use the feature. On the other hand, I admit that they map nicely to 
cpuset.cpu_exclusive=0/1.

3) I definitely agree with [~leftnoteasy] on the use of resource types. It 
might be straightforward to have a cpuset resource type that AMs can 
request, with the cgroups shared accordingly. This would also make the 
configuration more standard. The levels might not even be needed in this case: 
if an application does not request cpuset, it is shared, otherwise it is 
exclusive. The current suggestion would work, but please consider using 
resource types.

4) The design lets the AM do a delayed exclusive request directly to the NM 
avoiding the RM. I think it would be more robust to request from the RM in the 
container launch context and just forward this to the NM. The RM has the chance 
to decline or delay the request in this case in the future.

5) [~yangjiandan], how can you make sure a parent cgroup does not interfere 
with a cgroup marked as {{cpuset.cpu_exclusive=1}}? What if a system service 
wakes up?

6) Let me mention that this feature negatively affects YARN-1011 and 
oversubscription. An exclusive CPU with leftover capacity cannot be used by any 
other container and remains idle. This reduces overall cluster utilization.

7) Also, latency-sensitive applications get exclusive protection but can only 
be assigned to their cpuset, which disallows bursts to other CPUs when needed. 
I do not know how to solve this though.

8) If a cpuset is not exclusive, cgroups treats it as a limit, not a 
reservation. The feature uses it as a reservation, which in practice would mean 
that other container cgroups need to be changed and reduced every time a 
reserved container starts. Am I correct?
