[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-22 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483813#comment-16483813
 ] 

Weiwei Yang commented on YARN-8320:
---

An update: I am working with [~yangjiandan] on polishing the design doc, and 
we will add more details and explanations this week. Please feel free to 
comment and share your thoughts.

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However:
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
>  * The request latency of services running in containers can fluctuate 
> frequently when all containers share CPUs, which latency-sensitive services 
> in our production environment cannot tolerate.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors. This is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
> I will upload a detailed design doc later.
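
As a rough illustration of the cpuset idea in the description above (not taken from the attached patch or design doc), the sketch below hands out disjoint core sets to containers and renders each set in the list syntax accepted by a cgroup's cpuset.cpus file, such as "0-2,5". The function names, the container IDs, and the simple first-come block policy are all hypothetical.

```python
from typing import Dict, List


def format_cpuset(cpus: List[int]) -> str:
    """Render core IDs in cpuset list syntax, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    if not cpus:
        return ""
    ordered = sorted(set(cpus))
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            # Still inside a contiguous run of cores.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def assign_exclusive_cpus(requests: Dict[str, int], total_cpus: int) -> Dict[str, str]:
    """Give each container its own disjoint block of cores (hypothetical policy)."""
    if sum(requests.values()) > total_cpus:
        raise ValueError("not enough cores for exclusive assignment")
    assignment = {}
    next_cpu = 0
    for container_id, count in requests.items():
        cores = list(range(next_cpu, next_cpu + count))
        assignment[container_id] = format_cpuset(cores)
        next_cpu += count
    return assignment
```

Writing each resulting string into the container's cpuset.cpus file (together with a valid cpuset.mems) is the step that would actually pin the container; it is omitted here because it requires a mounted cgroup hierarchy and root privileges.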



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482336#comment-16482336
 ] 

Jiandan Yang  commented on YARN-8320:
-

Uploaded the v1 patch to initiate discussion.




[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-19 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481841#comment-16481841
 ] 

Weiwei Yang commented on YARN-8320:
---

Thanks [~yangjiandan] for the proposal. This is very interesting and will help 
extend YARN to support online services, especially latency-sensitive services. 
I think we should have an umbrella JIRA to track this effort to support LS 
services. [~leftnoteasy], what do you think?

I took a look at the proposal. There are still some details to be figured 
out, but overall it is a good start. Some early comments:
 # Section 2 presents 4 modes, which is a bit complex. If possible, we should 
start by supporting only the exclusive and non-exclusive modes in the first 
phase.
 # The proposal needs more information about the RM-side changes. It's not 
clear to me whether the scheduler needs the CPU share mode for its scheduling 
decisions, or what the relationship is between vcores and the CPU share mode. 
Please add more info, with some examples.
 # Updating a container's CPU share mode might also be phase 2 work.

I will take a deeper look at this area next week and share more comments if I 
have any. We can discuss this further too.

Thanks

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However:
> * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
> * The request latency of services running in containers can fluctuate 
> frequently when all containers share CPUs, which latency-sensitive services 
> in our production environment cannot tolerate.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors, based on a [Google 
> presentation|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
> I will upload a detailed design doc later.
>  
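
For context on the existing knobs named in the description, the sketch below shows how a CPU allowance maps onto CFS settings. A cgroup may run for cpu.cfs_quota_us microseconds in every cpu.cfs_period_us window, so quota divided by period is the number of cores' worth of time it can consume, while cpu.shares is only a relative weight applied under contention. The helper names and the 1024-shares-per-core convention are illustrative assumptions, not NodeManager's actual computation.

```python
DEFAULT_PERIOD_US = 100_000  # 100 ms, the kernel default for cpu.cfs_period_us
SHARES_PER_CORE = 1024       # assumed convention: 1024 shares per requested core
MIN_SHARES = 2               # smallest value cgroup v1 accepts for cpu.shares


def cfs_quota_us(cores: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Quota that caps a cgroup at `cores` worth of CPU time per period."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return int(round(cores * period_us))


def cpu_shares(cores: float) -> int:
    """Relative weight proportional to the requested cores (hypothetical mapping)."""
    return max(MIN_SHARES, int(round(cores * SHARES_PER_CORE)))
```

Neither value reserves particular cores: a container capped at 1.5 cores can still be scheduled on any CPU and share it with others, which is the gap the cpuset proposal is meant to address.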






[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481002#comment-16481002
 ] 

Wangda Tan commented on YARN-8320:
--

Thanks [~yangjiandan]. This is definitely a very interesting feature proposal, 
which can benefit use cases like mixed deployment of online and offline jobs.

Adding several folks who might be interested in this: [~cheersyang], 
[~asuresh], [~miklos.szeg...@cloudera.com], [~haibo.chen], 
[~shaneku...@gmail.com].
